Survey-weighted Unit-Level Small Area Estimation

Survey-weighte Unit-Level Small Area Estimation Jan Pablo Burgar an Patricia Dörr Abstract For evience-base regional policy making, geographically ifferentiate estimates of socio-economic inicators are the basis. However, national surveys are often conucte uner a complex sampling esign ue to iverse reasons. Often small sample sizes result within regions of interest leaing to too inefficient classical esign-base estimators for policy making. In this case, the methoology of small area estimation (SAE) is applicable. Classical SAE relies on the assumption of a multi-level regression moel unerlying the population ata an presumes the sample esign to be non-informative. These assumptions are har to verify in practice. Uner an informative sample esign, estimate regression parameters are biase an the moel-consistency of SAE gets lost. We correct for the sample informativeness in the parameter estimates, an construct esign- an moel-consistent estimates for regional inicators. Besies the estimation proceure we also propose a MSE estimator. In a simulation stuy, we illustrate the necessity of survey weights uner the violation of typical SAE assumptions. Furthermore, we show that the propose metho is also applicable to generalize linear mixe moel settings, allowing also for non-continuous epenent variables. Key wors: Small area estimation, generalize linear mixe moels, survey-weighting Jan Pablo Burgar Research Institute for Official an Survey Statistics (RIFOSS), Trier University, Universitätsring 15, D-54295 Trier, e-mail: burgarj@uni-trier.e Patricia Dörr Research Institute for Official an Survey Statistics (RIFOSS), Trier University, Universitätsring 15, D-54295 Trier e-mail: oerr@uni-trier.e 1

2 Jan Pablo Burgar an Patricia Dörr 1 Introuction National Statistical Institutes often conuct surveys using a complex survey esign, either ue to costs or ue to optimality consierations at the national level. This may lea to small sample sizes in certain geographic regions for which an estimate can be of interest, though. Small sample sizes lea to high variances of the classical esign-base estimators an lea to the traional set-up of the small area estimation (SAE) framework. In SAE borrowing strength across small omains an thus an increase efficiency is attaine using a regression moel. The inclusion of a ranom effects term - whose realization is area-specific - the regression moel is calle mixe. SAEs are usually composite of such ranom effects preictions an the realize sample s estimate. However, when the moel - that is estimate on the realize sample - oes not correspon to the population moel for some reason, these proceure returns biase estimates. Complex survey esigns or moel misspecification may contribute to such non-corresponence between the sample an population moel. In general, survey weights contain information about the sampling esign an thus, their inclusion in the regression moel component can reuce possible moel bias. We consier the case where the mixe moel is estimate on the sampling unit, i.e. unit-level SAE in the sense of [2]. Usually, estimation is one using the sample log-likelihoo, which requires integration over unit-likelihoos in a given area such that unit-specific weighting is not straight forwar (cf, for example [13]). Existing proposals require units sharing one ranom effect to have the same weight, because the likelihoo is expresse as a neste integral an elements within an integral cannot be weighte ifferently. This implies that the ranom effects structure reflects the sampling clustering an thus are neste [12], [13]. This is quite restrictive an furthermore, access to sampling stage specific inclusion probabilities is selom for final ata users. Therefore, a more general estimation proceure that allows for crosse an neste ranom effects an that requires only final survey weights is neee. The Expectation-Maximization (EM) methoology ([5]) that is applicable to mixe effects moels in general ([7]) provies a framework that is applicable to these nees. However, as one coul also think of epenent variables stemming from other exponential family istributions such as binary or count ata, we employ a Monte-Carlo version (Monte-Carlo EM algorithm, MCEM) that replaces the E-step by a Monte-Carlo approximation ([10], [3]). The specific survey-weighting application is outline in [4]. Section 2 introuces the algorithm an turns on problems of the MC-integration. Section 3 hanles consistency consierations an Mean Square Error (MSE) estimation. Afterwars, a simulation stuy emonstrates the possible gains of the survey-weighte SAE estimator. The final section iscusses possible further research an conclues.

Survey-weighte Unit-Level Small Area Estimation 3 2 Propose Estimators 2.1 Likelihoo Set-up We use the Monte-Carlo EM-algorithm aapte to survey-weighte Generalize Linear Mixe Moels (GLMMs) as propose in [4]. Here, we give a brief review of the set-up from which the SAE point estimator result. Consier as ata generating process (DGP) a GLMM escribe through η i = x T i β + zt i γ (1) µ i = g(η i ) (2) Y i F(µ i,ϕ) (3) G N(0,Σ), (4) where γ is a realization of the mulitvariate normal ranom variable G (with normal ensity function φ( σ)), X = (x 1,...,x N ) T an Z = (z 1,...,z N ) T being matrices of explanatory variables in R N p an R N q respectively, β a superpopulation parameter vector of fixe effects an F a istribution from the exponential family. Consequently, g enotes the inverse link function between expectation µ i an linear preictor η i. We consier here the canonical link functions uner which log f (y i,η i ) = y iη i b(η i ) a(ϕ) + c(y i,ϕ) (5) is the logarithmic ensity function. The covariance matrix Σ only epens on a parameter vector σ of length q. Define the vector of moel parameters by ψ = (β T,σ T ) T. Let U, U = N be an inex set an for i U, let y i be a realization of Y i. Then, the population log-likelihoo is L L (y,ψ,γ) = log f (y i γ,β) + logφ(γ σ), (6) i U an if the ranom effect vector γ was known, a natural survey-weighte version of (6) woul be the Horvitz-Thompson estimator for totals ([6]). However, this is not the case. But a γ simulate from the correct istribution may serve as a plugin an then (6) can be estimate through a HT-estimator. Repeating the simulation an averaging yiels then the (moel) expectation of the HT-estimator (enote by E(L L S )), that is maximize in the M-step. Computational an conceptual ifficulties that must be taken into account with that proceure are iscusse in [4]. As the EM-algorithm applie on the expectation of the estiman of (6) converges to a salepoint of the likelihoo an the latter is the weighte sum of strictly convex unit-likelihoo: The first summan is a generalize linear moel component, whose maximum likelihoo (ML) estimator is unique for the canonical link (cf. [16]) an the secon summan in (6) is a normal log-ensity, whose ML is unique, too. The propose proceure thus converges to the maximum likelihoo (ML) estimator of

4 Jan Pablo Burgar an Patricia Dörr the population likelihoo if the survey-weights reflect the selection process into the sample. In contrast, an unweighte sample log-likelihoo rather converges to the ML of the sample s ata generating process, compose of the population DGP an the sample ranomization. Thus, the survey weighting may protect against some violations of the SAE assumptions (cf. [14]): A non-ignorable sample esign. 2.2 SAE Estimator Having the estimates of the moel components, ˆψ, we can efine our estimator, a weighte unit-level estimator (wul) of the finite population mean of variable Y. For the pairwise isjoint subpopulations U, U = N an U = D=1 U, we propose three alternative estimators for the finite population mean in omain, ȳ = N 1 i U y i. The notation ȳ is chosen in orer to ifferentiate between the expectation uner the superpopulation moel, µ, an the finite population realization, which is the focus here. The first alternative is ˆµ wul = N 1 i U ˆµ i, (7) which is similar to [14, eq 5.3.7] in the linear case an [14, eq 9.4.20] for binary ata. This version oes not inclue any finite population correction, which however, might be negligible in a small area setting where the sampling fractions are rather small. Another version that incorporates the sample observations y i is ( ) ˆµ wul = N 1 ˆµ i + (y i ˆµ i ) i U i S U. (8) In both equations, ˆµ i is the preiction of iniviual i s variable Y i expectation uner the estimate vector ˆψ an the moe ˆγ of the weighte sample likelihoo uner ˆψ ˆµ i = g(x T i ˆβ + z T i ˆγ), (9) where the ranom effects only can be preicte for those areas that have at least one observation in the sample. In the LMM setting, (8) is a generalization of the classical unit-level estimator introuce in [2], which only incorporates a ranom intercept an oes not inclue any aitional survey information. That means, in [2], the survey weights are consiere to be equal across areas an units. Another option for SAE woul be a moel-assiste version ( ) ˆµ wul = N 1 ˆµ i + w i (y i ˆµ i ) i U i S U. (10) Estimator (10) is similar to the Generalize Linear Regression Estimator propose in [15] an [8] in the logistic regression setting. However, note that the version (10)

Survey-weighte Unit-Level Small Area Estimation 5 is base on a GLMM in lieu of GLM an incorporates area-specific information through the ranom effect preiction ˆγ, which makes this moel-assiste estimator especially applicable to SAE settings. 2.3 MSE Estimation As our propose estimators are smooth functions of ψ an γ, asymptotic MSE estimation fits into the framework of [11] who even eal with non-smooth SAE estimators for poverty analysis. Therefore, uner the (common) regularity conitions given in [11] (that we also partially assume in the previous sections in orer to establish esign-consistency of the point estimators), an asymptotic MSE of (10) consists of the esign variance of the moel resiuals in the omain uner consieration. For the moel-base estimators (7) an (8), however, MSE estimation is more ifficult. We suggest a first-orer Taylor approximation of the preictions ˆµ i at the population parameters (β pop,γ ) T - γ being the population likelihoo moe an then calculating the variance of the linearize preictions. This requires variance estimators for the fixe effects estimates an the ranom effect preictions. [9] gives a formula on how to approximate the Hessian of the observe ata matrix in the EMalgorithm an its application for the fixe effects parameter is also propose in [3]. A lower boun of the preiction error of ˆγ, on the other han, is the inverse Hessian of the log-likelihoo evaluate at the ML-estimates of ˆβ. An approximation of the inverse Hessian is reaily available from the specific MCEM-estimation algorithm iscusse in [4]. However, note that this suggestion is only a lower boun for MSE estimation an hol only asymptotically. 3 Simulation Stuy We present a small simulation stuy in orer to emonstrate the necessity of surveyweighting when the non-informativeness assumption is violate. We therefore generate a fixe population U, U = 3000 uner the following superpopulation moel where the population is mae up of 50 pairwise isjoint omains U = 50 =1 U, U = 60, an X 1 N(6,3 2 ) an X 2 Exp(3). The DGP is η i = 30 3x 1,i 8x 2,i + γ, i U (11) γ N(0,2 2 ) (12) µ i = g(η i ), g {i,logit 1 } (13) {N ( µ i,(2.3) 2) Y i. (14) Ber(µ i )

6 Jan Pablo Burgar an Patricia Dörr We raw B = 1500 (equally allocate) stratifie samples S of size n = 200 of a finite population generate in this way. Consequently, S U = S an S = n = 4. This is a relatively easy set-up an if the sampling esign is non-informative, the estimation conitions for the BHF ([2]) are optimal. We contrast a πps esign (which is uner the correct moel specification noninformative) where the inclusion probability π i for a unit in omain equals { } x 2,i π i = max n,1 (15) j U x 2, j an an informative esign where the inclusion probability of unit i in omain is calculate in three steps: e i = y i µ i (16) 0.1 if e i is below the 0.25 quantile q i = 0.2 if e i is between the 0.25 an 0.5 quantile (17) 0.4 if e i is above the 0.5 quantile { } q i π i = max n,1. (18) j U q j As observations with a bigger resiual ten thus to be oversample, the intercept estimator of the unweighte moel estimation is expecte to be overestimate which in return shoul yiel biase preictions. We compare the propose estimator (8) (that has conceptually the closest similarity to the traitional BHF) to the Generalize Regression estimator (GREG), the BHF estimator an another survey-weighte SAE estimator, the You-Rao estimator (YR, cf. [17]). In the case of the binary outcome, we consier the logistic regression estimator (LGREG, cf. [8]), too, an the BHF is estimate like (8), but the unerlying regression is a generalize linear mixe moel estimate with the R-Package lme4 ([1]). Due to construction, the GREG an the YR estimate a linear moel for the binary outcome, too. The quality criteria that we assess are the relativeempirical bias an the relative empirical mean square error (MSE) of an estimator ˆµ over all omains: relbias = 1 50 50 =1 1 1500 1500 b=1 ˆµ,b µ µ (19) relmse = 1 50 50 =1 1 1500 1500 b=1 ( ˆµ,b µ µ ) 2 (20) Results are liste in table (1). The results uner the non-informative esign an the gaussian outcome variable are stanar: Survey-weights are not neee for consistent estimation an inflate the estimators. Consequently, the BHF is the most efficient estimator with respect to relative mean square error. However, we note that the loss of efficiency when applying survey weights is low an thus might be recom-

Survey-weighte Unit-Level Small Area Estimation 7 mene as the moel assumptions are usually not verifiable. Furthermore, we fin that the propose estimator wml can compete with YR. Uner a binary variable of interest, we fin that results are similar to the linear mixe moel case an both GREG an YR perform well though they employ a linear regression moel. Nonetheless, the LGREG is the less biase estimator an has a lower relative MSE than the GREG. Table 1 Simulation Results Sampling esign Outcome Variable Estimator relbias relmse GREG 0.00457 0.10959 Normal wml 0.00974 0.0248 YR 0.0212 0.02504 Non-informative BHF 0.01264 0.01759 GREG 0.00466 0.07273 LGREG 0.00242 0.02196 Binary wml 0.00682 0.00456 YR 0.01076 0.00752 BHF 0.00647 0.00454 GREG 0.00312 0.04289 Normal wml 0.04555 0.02394 YR 0.05136 0.02402 Informative BHF 0.12068 0.03189 GREG 0.00594 0.04658 LGREG 0.00742 0.02256 Binary wml 0.02744 0.00564 YR 0.05692 0.01108 BHF 0.06463 0.01033 Uner the informative sampling esign, results change remarkably, though. As expecte, the unweighte traitional BHF has an unacceptable high relative bias, although the relative MSE may compete with the GREG. In the continuous epenent variable case, wml an YR return comparable results. But when the epenent variable is binary, the suggeste metho outperforms BHF ue to the inclusion of survey weights an the YR ue to the GLMM framework employe. Thus, wml gives any of the four presente cases a goo balance between bias an variance. 4 Conclusion In this paper, we propose the use of a GLMM estimation framework that allows the inclusion of unit-specific survey weights in the estimation process in orer to protect unit-level base SAE estimators against sampling informativeness. A linearization an/ or resiual base MSE estimation of the suggeste estimators is iscusse. Finally, a simulation stuy emonstrates the necessity of survey weights in the moel estimation step in orer to reuce the SAE estimators bias, both in a continuous variable set-up as well as for a binary variable of interest. We fin that the loss in ef-

8 Jan Pablo Burgar an Patricia Dörr ficiency when survey weights nee not be inclue in the moel estimation but are nonetheless, is marginal. In contrast there are important gains when the sampling is informative. To conclue, we woul like to note that the informativeness of the survey esign is har to verify in real worl application an may also epen on the analyze variable. Thus, we highly recommen the inclusion of survey weights in SAE analysis. References 1. Bates, D., Mächler, M., Bolker, B., Walker, S.: Fitting linear mixe-effects moels using lme4. arxiv preprint arxiv:1406.5823 (2014) 2. Battese, G. E., Harter, R. M., Fuller, W. A.: An error-components moel for preiction of county crop areas using survey an satellite ata. J. Am. Stat. Assoc. 83(401), 28 36 (1988) 3. Booth, J. G., Hobert, J. P.: Maximizing generalize linear mixe moel likelihoos with an automate Monte Carlo EM algorithm. J. Royal Stat. Soc., Series B (Methoological), 61(1), 265 285 (1999) 4. Burgar, J.P., Dörr, P.: Survey-weighte Generalize Linear Mixe Moels. Research Papers in Economics 2018-01, University of Trier, Department of Economics (2018) 5. Dempster, A. P., Lair, N. M., Rubin, D. B.: Maximum likelihoo from incomplete ata via the EM algorithm. J. Royal Stat. Soc., Series B (Methoological), 1 38 (1977) 6. Horvitz, D. G., Thompson, D. J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc., 47(260), 663 685 (1952) 7. Lair, N. M., Ware, J. H.: Ranom-effects moels for longituinal ata. Biometrics, 963 974 (1982) 8. Lehtonen, R., Veijanen, A.: Logistic generalize regression estimators. Surv. Methool., 24, 51 56 (1998) 9. Louis, T. A.: Fining the observe information matrix when using the EM algorithm. J. Royal Stat. Soc., Series B (Methoological), 226 233 (1982) 10. McCulloch, C. E.: Maximum likelihoo algorithms for generalize linear mixe moels. J. Am. Stat. Assoc., 92(437), 162 170 (1997) 11. Morales, D., el Mar Ruea, M., Esteban, D.: Moel-Assiste Estimation of Small Area Poverty Measures: An Application within the Valencia Region in Spain. Soc. Inic. Res., 1 28 (2017) 12. Pfeffermann, D., Skinner, C. J., Holmes, D. J., Golstein, H., Rasbash, J.: Weighting for unequal selection probabilities in multilevel moels. J. Royal Stat. Soc., Series B (Methoological), 60(1), 34 40 (1998) 13. Rabe Hesketh, S., Skronal, A.: Multilevel moelling of complex survey ata. J. Royal Stat. Soc., Series A (Statistics in Society), 169(4), 805 827 (2006) 14. Rao, J. N. K.: Small Area Estimation. Wiley, New York (2003) 15. Ronon, L. M., Vanegas, L. H., Ferraz, C.: Finite population estimation uner generalize linear moel assistance. Comput. Stat. Data Anal., 56(3), 680 697 (2012) 16. Weerburn, R. W. M. On the existence an uniqueness of the maximum likelihoo estimates for certain generalize linear moels. Biometrika, 63(1), 27 32 (1976) 17. You, Y., Rao, J. N. K.: A pseuoempirical best linear unbiase preiction approach to small area estimation using survey weights. Can. J. Stat., 30(3), 431 439 (2002)