A COMPARISON OF SMALL AREA AND CALIBRATION ESTIMATORS VIA SIMULATION

Size: px

Start display at page:

Download "A COMPARISON OF SMALL AREA AND CALIBRATION ESTIMATORS VIA SIMULATION"

Antony Anthony
5 years ago
Views:

1 SAISICS IN RANSIION new series an SURVEY MEHODOLOGY 133 SAISICS IN RANSIION new series an SURVEY MEHODOLOGY Joint Issue: Small Area Estimation 014 Vol. 17, No. 1, pp A COMPARISON OF SMALL AREA AND CALIBRAION ESIMAORS VIA SIMULAION M. A. Hiiroglou 1, V. M. Estevao ABSRAC Domain estimates are typically obtaine using calibration estimators that are irect or moifie irect. hey are irect if they strictly use ata within the omain of interest. hey are moifie irect if they use both ata within an outsie the omain of interest. An alternative way of proucing these estimates is through small area proceures. In this article, we compare the performance of these two approaches via a simulation. he population is generate using a hierarchical moel that inclues both area effects an unit level ranom errors. he population is mae up of mutually exclusive omains of ifferent sizes, ranging from a small number of units to a large number of units. We select many inepenent simple ranom samples of fixe size from the population an compute various estimates for each sample using the available auxiliary information. he estimates compute for the simulation inclue the Horvitz-hompson estimator, the synthetic estimator (inirect estimate), calibration estimators, an unit level base estimators (small area estimate). he performance of these estimators is summarize base on their esign- base properties. Key wors: area level, unit level, calibration estimates, small area estimates, simulation. 1. Introuction Domain estimates at Statistics Canaa are typically obtaine using well-establishe methos base on calibration estimation. he calibration is irect or moifie irect. It is irect if it is base on ata within the omain of interest. It is moifie irect if it is base on ata within an outsie the omain of interest. hese methos can be viewe as esign-base proceures as the variance of the resulting estimators is evaluate uner the ranomization istribution. he 1 Michael A. Hiiroglou, Statistical Research an Innovation Division, Statistics Canaa, 16 D, R.-H.-Coats Builing, Ottawa, Ontario, Canaa, K1A hiirog@yahoo.ca. Victor M. Estevao, Statistical Research an Innovation Division, Statistics Canaa, 16 D, R.-H.-Coats Builing, Ottawa, Ontario, Canaa, K1A Victor.Estevao@statcan.gc.ca.

2 134 M. A. Hiiroglou, V. M. Estevao: A Comparison of small... ranomization istribution of an estimator is the istribution over all possible samples that coul be selecte from the target population of interest uner the sampling esign use to select the sample, with the population parameters consiere as fixe values. Another way of proucing these estimates is through small area methos. hese methos are particularly important when the sample size in the omains is small. hey can improve the reliability of the irect estimates provie that the variable of interest is well correlate with auxiliary variables x that are available from aministrative or other files. Small area estimation essentially combines irect estimates with moel-base estimates in an optimal manner. he moel-base estimates involve known population totals (auxiliary ata) an estimates of the regression between the variable of interest an the auxiliary ata across the small areas. In general, these moels are classifie into two groups: unit level moels an area level moels. Unit level moels are generally base on observation units (e.g., persons or companies) from the survey an auxiliary variables associate with each observation, whereas area level moels are base on irect survey estimates aggregate from the unit level ata an relate area level auxiliary variables; see Rao (003) for an overview of small area moels. he more recent literature that covers empirical assessment of the properties of various small area an omain estimators inclues Lehtonen an Veianen (009), Datta (009), an Pfeffermann (013). Lehtonen an Veianen (009) focuse on esign-base methos (calibration an regression) using auxiliary ata. hey reviewe the work on the extension of the linear form of the Generalize Regression Estimator (GREG) given in Särnal et al. (199) to inclue logistic, multinomial logistic an mixe moels for omain estimation. Datta (009) reviewe the evelopment of moel-base proceures to obtain small area estimates. Datta focusse in particular on the theoretical properties of the resulting estimators. Pfeffermann (013) reviewe both esign-base an moel-base proceures, as well as recent evelopments in these two proceures. Domain estimates are currently obtaine via esign-base proceures at Statistics Canaa. However, the increasing requirement for proucing estimates for "small omains" has encourage the nee to aopt moel-base proceures. A SAS-base prototype (Estevao et al. 014) has been recently evelope at Statistics Canaa to respon to these requirements. he prototype currently incorporates two well-known methos initially evelope by Fay an Herriot (1979) for area level estimation, an Battese, Harter, an Fuller (1988) for unit level estimation. Although the theoretical properties of the estimators inclue in the prototype are known, they were investigate via a simulation. In the simulations, we looke at the properties of estimators of omain totals. We compare moel-base small area estimators with traitional estimators through simulation. he latter inclue the Horvitz-hompson estimator, two calibration estimators, the moifie regression estimator an the synthetic estimator. he small area estimators are the EBLUP an Pseuo EBLUP estimators base on a unit level moel. More etails on all of these estimators are given in section.

3 SAISICS IN RANSIION new series an SURVEY MEHODOLOGY 135 he simulation setup an results are reporte in section 3. Section 4 provies a few conclusions from our finings.. Sample esign Large scale surveys are esigne to satisfy reliability requirements for some subsets (omains) of the population. Examples of these subsets inclue partitions below the level of the initial geographical / inustrial etail requeste by the client. If such subsets are require before the sample is selecte, then such omains are labelle as planne omains (Singh, Gambino an Mantel 1994). Such planne omains will have some of the sample allocate to them to obtain unbiase estimates with the require precision using irect estimation proceures. If these omains are ientifie after the sample has been selecte, they will be known as unplanne omains. Note that, in any event, unplanne omains will exist for most surveys. An example taken from business surveys is a change of inustry uring ata collection. A business initially classifie as inustry A becomes inustry B. Such a business woul be tabulate as part of the businesses of type B, but woul retain its original sampling weight. Another example, taken from househol surveys, woul be the arbitrary prouction of estimates below a geographical level that was not part of the allocation process of the sample. raitional or small area estimators can be use for either planne or unplanne omains. As omain estimation for most surveys at Statistics Canaa is mostly of the unplanne type, we have esigne our simulation to reflect this tenency: that is no units are allocate to them prior to sample selection. Domain estimates are prouce after sample selection, an the number of sample units falling in each omain is a ranom variable. Our simulation reflects this point, an we use the simplest sample esign to carry it out. We rew repeate samples s of size n from the population U of size N using simple ranom sampling without replacement. he weight associate with unit U is enote as w. Let s, 1,,..., D, be the portion of the sample s that overlaps with omain U (of known size N ). Let the realize sample size in omain U be n. he survey esign weight associate with a unit U is w. he ata in the population are enote as y, x for each element U. he y variable is the one of interest, while x is the vector of auxiliary ata. Computation of omain statistics can be obtaine using the operators (i.e.: mean an variance) in regular estimation via the following transformation. In omain U, we enote the variable of interest as y where y y if U an 0 otherwise. he associate vector of auxiliary variables is efine as x where x x if U an 0 otherwise.

4 136 M. A. Hiiroglou, V. M. Estevao: A Comparison of small... he obective of the present stuy is to compare the properties of moel-base small area estimators for omains with those traitionally use in survey estimation. We consiere seven estimators of the omain total Y U y : four are traitional estimators an three are small area estimators. We first present the traitional estimators..1. raitional estimators Horvitz-hompson: he Horvitz-hompson estimator Y ˆ H, 1,,..., D, uses no auxiliary information. It is efine as Y w y if n 0 an 0 ˆ H s otherwise. We set Y ˆ H to 0 if there are no sample units in the omain, ensuring unbiase estimation over all samples s rawn from U. Although this estimator is unbiase, it prouces inefficient estimates. Calibration Estimators: We consier two calibration estimators, Y ˆ CALU an Y ˆ CALU, that use auxiliary information at ifferent levels. hey are applications of calibration given in Deville an Särnal (199) aapte to omain estimation. he irect estimator Y ˆ CALU uses auxiliary information at the omain level, while the moifie irect estimatory ˆ CALU uses information at the population level. Estimator Y ˆ CALU is known to be more efficient than Y ˆ CALU. However, estimator Y ˆ CALU has some rawbacks. It is not always possible to obtain auxiliary information at the omain level. Even if this information is available, we cannot prouce estimates using Y ˆ CALU if there are no sample units in the omain. Furthermore, this estimator can prouce erratic values when there are only a few units in the omain. o prevent this, we nee to make sure that the number of units in the omain is larger than the number of auxiliary variables. As a minimal requirement, given that there are two auxiliary variables (intercept, x), Y ˆ CALU can be estimate only if there are 3 or more units in a omain. Otherwise, we cannot prouce a value, an we set it to missing. his means that we only work with a subset of all possible samples. If we set the value of Y ˆ CALU to 0 when there is an insufficient number of observations n in omainu, this woul result in a biase estimator. As for Y ˆ CALU, when there are no sample units in the omain, we set the value of this estimator to 0. his ensures that it is

5 SAISICS IN RANSIION new series an SURVEY MEHODOLOGY 137 approximately esign unbiase for the omain total. Estimator 1,,..., D, is given by: Y ˆ CALU, ( ˆ w ) ˆ y H CALU if n 3 ˆ X X β Y s CALU. (missing) if n 3 with X x, ˆ X H w x an U s 1 w ˆ w y β CALU x x s c x. s c Estimator Y ˆ CALU, 1,,..., D, is given by: with Yˆ CALU ( ˆ w ) ˆ y H CALU n 0 X X β if s 0 if n 0 X x, ˆ H U s X w x an 1 w ˆ x x w x y β CALU s c. s c Moifie Regression (REG): he moifie regression estimator Y ˆ REG, 1,,..., D, is ue to Wooruff (1966). It is of interest as it was use to prouce area breakowns of the monthly national estimates of US Census Bureau retail trae survey. Note that it is a moifie irect estimator. Singh an Mian (003) points out that it can be viewe as a calibration estimator s w y, where the calibration weight w is obtaine by minimizing the chi-square istance / x X : here a is s c w a w w, subect to the constraints s w the omain inicator variable. Estimator Y ˆ REG is esign-unbiase as the overall sample size increases. It is given by: Yˆ REG ( ˆ w ) ˆ y H REG if n 0 X X β s ˆ X βreg if n 0

6 138 M. A. Hiiroglou, V. M. Estevao: A Comparison of small... with X x, ˆ X H w x an U s 1 w ˆ x x w x y β REG s c. s c he c term, c 0, associate with the estimators that use auxiliary ata reflects that the error terms inepenently with mean zero an variance c... Small area estimators β ˆ e in the implie working moel are istribute he simplest small area estimator is the synthetic estimator (SYN), Y ˆ SYN, 1,,..., D. It is given by ˆ Y ˆ SYN X β SYN where X U x an SYN s Bias ( Yˆ ) 1 x x x w w y c s c SYN Y regression vector. X B, where e. his estimator is esign-biase, given by 1 x y x x B U c is the population U c he next two small area estimators are base on a hierarchical moel given by: y x β v e, (1) where ii v ii e v N(0, ), e N(0, c ), an caccounts for possible heterogeneity of the e resiuals. In our application of this moel, the areas are our omains of interest. he quantity xβ is the fixe effect which is assume to be a linear combination of the auxiliary variables x i. he resiuals v an e are respectively the ranom effect for the area an the ranom errors for unit in area. he term translates to a c in the various formulas that follow. c

7 SAISICS IN RANSIION new series an SURVEY MEHODOLOGY 139 Empirical Best Linear Unbiase Preictor (EBLUP): his estimator enote Y, 1,,..., D, is given in Rao (003, p.136). It is an extension of the as ˆ EBLUP Battese, Harter, an Fuller (1988) estimator when the error structure of the resiuals is not homogeneous. It is given by: Yˆ EBLUP { ˆ N ˆ ( ˆ X βeblup a ya xa βeblup )} if n 0 ˆ X βeblup if n 0 he terms making up Y ˆ EBLUP inclue N, terms are efine as follows: ˆ a ˆ v v ˆe a ˆ given by: y a where 1 a X, ˆa a y s s a s a ya, x a x a, an β ˆEBLUP. hese s a s a x, an. he estimate regression vector is 1 D D EBLUP a ( a a ) a ( a a ) y 1 s 1 s βˆ x ˆ x x x ˆ x. his estimator is not esign consistent, unless the sampling esign is selfweighting. Pseuo-EBLUP (PEBLUP): his estimator enote as Y ˆ PEBLUP, 1,,..., D, is an extension of the Pseuo-EBLUP estimator given in You an Rao (00). It accounts for the heterogeneity of the e resiuals in moel (1). It inclues the survey weights w, s, in the regression coefficient an the parameter estimate. { ˆ ˆ ( )} 0 ˆ N X βpeblup w yw xw yw if n Y PEBLUP ˆ X βpeblup if n 0 he terms making up Y ˆ PEBLUP inclue N, X, w hese terms are efine as follows: ˆ w ˆ v v ˆe w ˆ, where w y w s s w s w s a y, x w wy w,, y w, an β ˆPEBLUP. s w x x w, w for 1,,..., D. s

8 140 M. A. Hiiroglou, V. M. Estevao: A Comparison of small... he estimate regression vector is given by: 1 D D PEBLUP w a ( wa wa ) i w a ( wa wa ) y 1 s 1 s βˆ x ˆ x x x ˆ x s w a x ˆ with x wa an ˆ v wa w a ˆ ˆ where wa s s s ( wa ) a wa his estimator is esign consistent.. v e wa 3. Simulation Surveys prouce at Statistics Canaa can be as simple as stratifie one stage simple ranom sampling esigns typically use for business surveys to the more complex stratifie multi-stage esign with unequal selection probabilities at each stage typically use for househol surveys. We opte for a single stage simple ranom sample selecte from the population, as it is a simplification of the sample esigns use for business surveys. Ha we chosen a sampling esign with unequal weights, we woul have ha to account for the possible impact of informative sampling on the small area estimators using the proceure given in Pfeffermann an Sverchkov (007). Verret, Rao, an Hiiroglou (015) use a simpler proceure than the one given in Pfeffermann an Sverchkov (007). heir proceure accounte for unequal selection probabilities for moel-base small area estimators by incorporating them into the moel. heir simulation use a esign-moel (pm) approach. heir results showe that incorporating the unequal selection probabilities significantly improve the performance (average absolute bias an average RMSE) of EBLUP, but ha marginal impact on PEBLUP. 3.1 Population Generation an Sample Selection A population U consisting of 4,640 units was create by generating ata ( x, y ) for three separate subsets of the population (groups) with ifferent i i intercepts an slopes. Each group was split into mutually exclusive an exhaustive omains as follows: Group 1 was split into nine omains U 1,..., U 9 ; Group was split into ten omains U10,..., U 19 ; an Group 3 was split into ten omains U0,..., U 9. he three groups resulte in a total of D=9 omains that were mutually exclusive an exhaustive. he number of units in each omain, N, was

SAISICS IN RANSIION new series an SURVEY MEHODOLOGY 141 allocate in a monotonic manner: omain U 1 ha 0 units; omain U ha 30 units; an omain U 9 ha 300 units.

he secon one, x, represente the available auxiliary ata in the population. he auxiliary variable x in each group was generate from a Gamma ( 5, 10) istribution with mean 50 an variance 500.

9 SAISICS IN RANSIION new series an SURVEY MEHODOLOGY 141 allocate in a monotonic manner: omain U 1 ha 0 units; omain U ha 30 units; an omain U 9 ha 300 units. In our simulation the auxiliary ata x consiste of two auxiliary variables. he first one ha the fixe value of one to represent the intercept in the moel. he secon one, x, represente the available auxiliary ata in the population. he auxiliary variable x in each group was generate from a Gamma ( 5, 10) istribution with mean 50 an variance 500. he variable of interest y was generate using the moel y 0, 1, x v e : =1,,3 () where ii v v N(0, ) an ii e e N(0, c ). We use v e an set c equal to x. he following table summarizes how the population was split into the three groups of omains. able 1. Groups, associate omains an regression parameters Group ( ) Domains in Group 0, 1, 1 U for = 1,..., U for =10,..., U for =0,..., A plot of the generate population is shown in Figure 1. he units in the groups are shown respectively in green, blue an yellow. he three regression lines are shown in re. Without the colours to ientify the groups, one might be incline to think that the population was generate uner a moel with a single auxiliary variable (one intercept an slope) as shown in the inset. Figure 1. Plot of y vs. x for population in the simulation stuy

10 14 M. A. Hiiroglou, V. M. Estevao: A Comparison of small... We ran two separate simulation runs to reflect that two possible moels coul be fitte for the selecte samples. We enote these simulations as runs 1 an. In the first run (simulation run 1), we assume that the moel coul be fitte using x (1, x ) as auxiliary ata; this is not correct as the population was generate on the basis of three ifferent regressions. In the secon run (simulation run ), we acknowlege that there were three separate moels an use a set of auxiliary variables reflecting the manner in which the population values were generate; this fit is correct. his meant using a set of ummy-coe auxiliary variables efine as follows for each unit: x 1,0,0, x,0,0 if U group 1 0,1,0,0, x,0 if U group 0,0,1,0,0, x if U group 3 In the small area estimation moel given by equation (1), the use of this implies the following regression coefficient 1,, 3, 4, 5, 6 (3) x β for the fixe effects. For the synthetic estimator an the calibration estimators, we set c x to reflect the heterogeneity of the moel errors. i i Each simulation run involve the selection of R=100,000 inepenent samples an the computation of various estimates for each sample. Each sample was a simple ranom sample s of size n selecte without replacement from U. We use sample sizes n=3 (5%), n=464 (10%), n=696 (15%) an n=98 (0%), where the sampling fractions are inicate in brackets. hese are within the range of the sampling fractions typically use by business surveys. he sample units in omain U are enote by s with s 1 s. We D observe n units in U where 0 n N an n 1 n. Uner simple ranom sampling without replacement, the n follow a multivariate D N N hypergeometric istribution with probability mass function 1. n n he following table shows the probability of observing n 0, n 1 or n in the three smallest omains when the sample size n is 3. D

11 SAISICS IN RANSIION new series an SURVEY MEHODOLOGY 143 able 4. Probabilities in the 3 smallest omains when n = 3 Probability U 1 with N1 0 U with N 30 U 3 with N3 40 Prob ( n 0) Prob ( n 1) Prob ( n ) Using table 4, for the smallest omain U 1, Y ˆ H an Y ˆ CALU woul be equal to zero about 36% of the time. Note that this probability ecreases rapily as the omain population size N increases. Since we require n 3, we cannot prouce an estimate for Y ˆ CALU in approximately 9.5% of the samples selecte in the smallest omain U 1. his probability ecreases rapily as the omain population size N increases. 3.. Simulation statistics For each selecte sample in each simulation run r = 1,...,R (R=100,000), we ( ) compute estimates of Y for the seven estimators. Denote Y ˆ r ES as the estimate th prouce for the r sample, r 1,,... R, where the subscript ES is a placeholer for any one of the seven estimators. For each omain =1,...,9, we compute the bias as: an the mean square error as Bias ˆ 1 ( Y ) R Y ˆ Y ES R ( r) r1 ES R ˆ r r 1 ES MSE ( Y ˆ ) R 1 Y ( ) Y. ES For each estimator, Y ˆ ES, we also compute the following summary statistics across all omains an simulate samples. hese were the average absolute relative bias, the average coefficient of variation an the average relative efficiency enote as ARB( Y ˆ ), CV ( Y ˆ ) an RE( Y ˆ ) respectively. ES ES ES

12 144 M. A. Hiiroglou, V. M. Estevao: A Comparison of small... hese were compute as follows: ˆ 1 D ARB( Y ) ˆ ˆ ES 1 ARB( Y ES ) where ARB( Y ES ) D ˆ 1 D CV ( Y ) ˆ ˆ ES 1CV ( Y ES ) where CV ( Y ES ) D Bias( Yˆ ) ES MSE( Yˆ ) ( ˆ ˆ MSE Y ) ˆ 1 RE( Y ) where MSE( Y ) MSE( Yˆ ) he statistic H D ES ˆ ES 1 ES MSE( YES ) D Y Y ES RE( Y ˆ ) measures the average efficiency of each estimator ES relative to the Horvitz-hompson estimator. Since Y ˆ H is known to have the least efficiency among these seven estimators, this measure is a number larger than or equal to Simulation Results ables 5, 6 an 7 show the ifferences between the two runs using the summary statistics escribe in the previous section. he results are iscusse after each of these three tables for runs 1 an. able 5. Average Absolute Relative Bias ARB( Y ˆ ) ES (4) raitional Domain Estimators Small Area Estimators Sample Size Run Y ˆ ˆH Y Y CALU ˆ CALU Y ˆ REG Y ˆSYN Y ˆEBLUP Y ˆPEBLUP

13 SAISICS IN RANSIION new series an SURVEY MEHODOLOGY 145 Moel oes not fit (Run 1): he small area estimators Y ˆSYN, Y ˆEBLUP, an Y ˆPEBLUP have the largest ARB s. In particular, Y ˆSYN has the highest ARB. he ARB ecreases as the sample size increases for Y ˆEBLUP an Y ˆPEBLUP, whereas it remains constant (as expecte) for Y ˆSYN. he ARB associate with Y ˆPEBLUP ecreases more rapily than the one associate with Y ˆEBLUP as the sample size increases. he ARB associate with the traitional omain estimators is quite small: it also ecreases as the sample size increases. Moel fits (Run ): he ARB s associate with the small area estimators have significantly ecrease. However, they are still higher than those associate with the traitional omain estimators. Y ˆ REG has the smallest ARB amongst all the estimators. As note in run 1, the ARB ecreases as the sample size increases for all the estimators. able 6. Average Coefficient of Variation CV ( Y ˆ ) ES raitional Domain Estimators Small Area Estimators Sample Size Run Y ˆ ˆH Y CALU Y ˆ CALU Y ˆ REG Y ˆSYN Y ˆEBLUP Y ˆPEBLUP Moel oes not fit (Run 1): Estimators Y ˆH an Y ˆ CALU have the highest CV among all estimators; their CV s are quite comparable, implying that auxiliary

14 146 M. A. Hiiroglou, V. M. Estevao: A Comparison of small... ata use at the population level in Y ˆ CALU has no impact on improving the reliability of the estimator at the omain level. he synthetic estimator Y ˆSYN also has a high CV that remains constant no matter what the sample size is. he calibration estimator Y ˆ CALU has the lowest CV for all sample sizes. he ranking from low to high of the remaining three estimators is Y ˆ PEBLUP, Y ˆEBLUP an Y ˆ REG. Note that the CV of Y ˆ REG ecreases quite rapily as compare to the other estimators. he reliability of all the estimators improves as the sample size increases. Moel fits (Run ): he CV s are smaller than those obtaine in run 1 for all estimators except for Y ˆH an Y ˆ CALU. his is expecte as both estimators o not profit from the auxiliary ata. hese two estimators are still the ones with the highest CV s. As expecte, because the moel fits well, all three small area estimators Y ˆSYN, Y ˆEBLUP an Y ˆPEBLUP have reasonable CV s. he moifie regression estimator Y ˆ REG performs better than the calibration at the omain level Y ˆ CALU : the reverse was true when the moel was incorrect (run 1). able 7. Average Relative Efficiency Sample Size Run raitional Domain Estimators Y ˆ ˆH RE( Y ˆ ) ES Small Area Estimators Y CALU Y ˆ CALU Y ˆ REG Y ˆSYN Y ˆEBLUP Y ˆPEBLUP Note: he higher the number the more efficient the estimator relative to the H estimator. Recall that run 1 represents the results when the moel oes not fit, whereas run represents the results when the moel fits.

15 SAISICS IN RANSIION new series an SURVEY MEHODOLOGY 147 Moel oes not fit (Run 1): he ranking of the estimators (from highest RE to lowest RE ) is as follows: Y ˆ CALU, Y ˆPEBLUP, Y ˆEBLUP, Y ˆ REG, Y ˆSYN, an Y ˆ CALU. he traitional omain estimator Y ˆ CALU is oing the best, but it is closely followe by the two small area estimators Y ˆPEBLUP an Y ˆEBLUP. As the sample size increases, there is a ichotomy in terms of RE. he relative efficiency increases for Y ˆ CALU an Y ˆPEBLUP, whereas it ecreases for Y ˆ REG, Y ˆSYN, an Y ˆEBLUP. here is no change to Y ˆ CALU as the auxiliary information is not useful at the omain level. Moel fits (Run ): he ranking of the estimators (from highest RE to lowest RE ) has change with respect to run 1. It is now Y ˆEBLUP Y ˆ REG, Y ˆPEBLUP, Y ˆSYN,. he small area estimators are clearly more, Y ˆ CALU, an Y ˆ CALU efficient than the traitional estimators. he relative efficiency increase for all estimators - maximum is now 14 versus 8 obtaine in run 1. Once more, as the sample size increases, there is a ichotomy in terms of RE. Another way to summarize the behaviour of the various estimators is graphically. We summarize the average absolute relative bias, ARB( Y ˆ ES ), an the average coefficient of variation, CV ( Y ˆ ), within each omain 1,,, D, where D = 9. ES Figures a an b isplay two typical graphs of the absolute relative bias over the omains for the two simulation runs. hese graphs show the results for the sample size of 464. Similar results were obtaine for the other sample sizes. We can see that the absolute relative bias of Y ˆ SYN, Y ˆ EBLUP an Y ˆ PEBLUP is greatly reuce when we specify the correct auxiliary variables in the unerlying moel. In the first run, the small area estimators show a rop an a rise between the groups of omains. his can be explaine. he overall moel fitte using x (1, x ) prouces a regression which is close to the unerlying moel for the secon group of omains. herefore, the ifferences are small for the secon group of omains. However, this overall moel is quite ifferent from the one use to generate the population in the first an thir groups of omains. i i

16 148 M. A. Hiiroglou, V. M. Estevao: A Comparison of small... Moel oes not fit ARB Legen: Y ˆ H : Y ˆ REG : Y ˆ : Y ˆ CALU : CALU Y ˆ SYN : Y ˆ EBLUP : Y ˆ PEBLUP : Figure a. Plots of the absolute relative bias of the estimators for sample size 464 Moel fits ARB Legen: Y ˆ H : Y ˆ REG : Y ˆ : Y ˆ CALU : CALU Y ˆ SYN : Y ˆ EBLUP : Y ˆ PEBLUP : Figure b. Plots of the absolute relative bias of the estimators for sample size 464 Figures 3a an 3b isplay the coefficient of variation associate with the estimators. he coefficient of variation is reuce for all estimators except the H estimator Y ˆ H (which oes not use any auxiliary information) an Y ˆ CALU (because the auxiliary variables for this estimator are equivalent in the two runs).

17 SAISICS IN RANSIION new series an SURVEY MEHODOLOGY 149 Moel oes not fit CV Legen: Y ˆ H : Y ˆ REG : Y ˆ : Y ˆ CALU : CALU Y ˆ SYN : Y ˆ EBLUP : Y ˆ PEBLUP : Figure 3a. Plots of the Coefficient of Variation of the Estimators for sample size 464 Moel fits CV Legen: Y ˆ H : Y ˆ REG : Y ˆ : Y ˆ CALU : CALU Y ˆ SYN : Y ˆ EBLUP : Y ˆ PEBLUP : Figure b. Plots of the Coefficient of Variation of the Estimators for sample size 464

18 150 M. A. Hiiroglou, V. M. Estevao: A Comparison of small... Figures 4a an 4b show a graphical isplay of the results for the average coefficient of variation CV ( Y ˆ ES ) results given in able 6. Uner run, we see that Y ˆ SYN, Y ˆ EBLUP an Y ˆ PEBLUP have the smallest CV ( Y ˆ ES ). All three lines are inistinguishable as they are very close together. Uner run, we see that Y ˆ SYN, Y ˆ EBLUP an Y ˆ PEBLUP have the smallest CV ( Y ˆ ES ). All three lines are inistinguishable as they are very close together. Moel oes not fit CV Figure 4a. Plots of the average coefficient of variation of the estimators by sample size Moel fits CV Figure 4b. Plots of the average coefficient of variation of the estimators by sample size

19 SAISICS IN RANSIION new series an SURVEY MEHODOLOGY 151 Figures 5a an 5b show a graphical isplay of the average relative efficiency of the estimators given in table 7. Uner run, we note that Y ˆ EBLUP an Y ˆ PEBLUP have the highest Moel oes not fit RE( Y ˆ ) over the various sample sizes. ES RE Figure 5a. Plots of the average relative efficiency of the estimators by sample size Moel fits RE Figure 5b. Plots of the average relative efficiency of the estimators by sample size

20 15 M. A. Hiiroglou, V. M. Estevao: A Comparison of small Conclusions We compare via simulation the behavior of a number of traitional omain an small area estimators. he sampling esign use in the simulation, simple ranom sampling without replacement, is a simplification of the sampling esign commonly use for business surveys (stratifie simple ranom sampling without replacement). he estimators using the auxiliary ata either reflecte the moel use to generate the population (moel fits) or i not (moel oes not fit). he simulation esign i not use unequal probability sampling. he aitional complexity of using unequal probability sampling is that we woul have ha to moify our moel-base small area estimators to account for possible informative sampling. However, since we use simple ranom sampling without replacement, we i not have to account for this problem. he conclusions of our simulation are as follows. Comparing the efficiency between the traitional an small area estimators, the results very much epen on whether the moel hols or not. he calibration estimator Y ˆ CALU which only uses auxiliary ata at the population level is not efficient at the omain level whether the moel hols or not. his is in contrast to Y ˆ CALU that uses auxiliary ata at the omain level. he estimator Y ˆ CALU is the best traitional estimator to use when the moel hols. Its average relative efficiency increases as the overall sample size increases. Its weakness is in the smaller omains, where the expecte sample size is smaller than three units, as it cannot be efine when the auxiliary ata consists of two auxiliary variables; in general, when there are p auxiliary variables, we are not be able to efine Y ˆ CALU when the sample size is smaller than p+1 auxiliary variables. When the moel oes not hol, Y ˆ REG is the best traitional estimator to use. However, it is outperforme by the small area estimatorsy ˆ SYN, Y ˆ EBLUP an Y ˆ PEBLUP. he small area estimator Y ˆ EBLUP is the most efficient one when the moel hols, although it is closely followe by Y ˆ SYN an Y ˆ PEBLUP. When the moel oes not hol, the Y ˆ PEBLUP estimator is the most efficient small area estimator; an explanation for this is that it is esign-consistent.

21 SAISICS IN RANSIION new series an SURVEY MEHODOLOGY 153 Acknowlegement We woul like to thank the referee for his constructive comments that substantially improve this paper. REFERENCES BAESE, G. E., HARER, R. M., FULLER, W. A., (1988). An errorcomponent moel for preiction of county crop areas using survey an satellite ata. Journal of the American Statistical Association, 83 (401), DAA, G. S., (009). Moel-base approach to small area estimation. Hanbook of Statistics, 9, DEVILLE, J. C., SÄRNDAL, C. E., (199). Calibration estimation in survey sampling. Journal of the American Statistical Association, 87(418), ESEVAO, V., HIDIROGLOU, M. A., YOU, Y., (014). Methoology Software Library - Small area Estimation Methoology Specifications for Area an Unit Level base Moels. echnical Report, Statistics Canaa. FAY, R. E., HERRIO, R. A., (1979). Estimation of income for small places: an application of James-Stein proceures to census ata. Journal of the American Statistical Association, 74 (366A), LEHONEN, R., VEIJANEN, A., (009). Design-base methos of estimation for omains an small areas. Hanbook of statistics, 9, PFEFFERMANN, D., (013). New important evelopments in small area estimation. Statistical Science, 8(1), PFEFFERMANN, D., SVERCHKOV, M., (007). Small Area Estimation Uner Informative Probability Sampling of Areas an Within the Selecte Areas. Journal of the American Statistical Association 10 (480), RAO, J. N. K., (003). Small Area Estimation: John Wiley & Sons. SINGH, A. C., MIAN, I. U. H., (1995). Generalize Sample Size Depenent Estimators for Small Areas, Proceeings of the 1995 Annual Research Conference, U.S. Bureau of the Census, Washington, DC, SINGH, M. P., GAMBINO, J., MANEL, H., (1994). Issues an Strategies for Small Area Data. Survey Methoology, 0 (1), 3. WOODRUFF, R. S., (1966). Use of a Regression echnique to Prouce Area Breakowns of the Monthly National Estimates of Retail rae. Journal of the American Statistical Association, 61 (314),

22 154 M. A. Hiiroglou, V. M. Estevao: A Comparison of small... YOU, Y., RAO, J. N. K., (00). A pseuo-empirical best linear unbiase preiction approach to small area estimation using survey weights, Canaian Journal of Statistics, 30, VERRE, F., RAO, J. N. K., VERRE, F., RAO, J. N. K., HIDIROGLOU, M. A., (015). Moel-base small area estimation uner informative sampling. o appear in the December 015 issue of Survey Methoology.

The Role of Models in Model-Assisted and Model- Dependent Estimation for Domains and Small Areas

The Role of Models in Model-Assisted and Model- Dependent Estimation for Domains and Small Areas The Role of Moels in Moel-Assiste an Moel- Depenent Estimation for Domains an Small Areas Risto Lehtonen University of Helsini Mio Myrsylä University of Pennsylvania Carl-Eri Särnal University of Montreal