A note on regression estimation with unknown population size

Michael A. Hidiroglou (Statistics Canada), Jae Kwang Kim (Iowa State University, jkim@iastate.edu) and Christian Olivier Nambeu (Statistics Canada)

Survey Methodology, June 2016, Vol. 42, No. 1, pp. 121-135. Statistics Canada, Catalogue No. 12-001-X

Abstract: The regression estimator is extensively used in practice because it can improve the reliability of the estimated parameters of interest, such as means or totals. It uses control totals of variables known at the population level that are included in the regression set-up. In this paper, we investigate the properties of the regression estimator that uses control totals estimated from the sample as well as those known at the population level. This estimator is compared, both theoretically and via a simulation study, to the regression estimators that strictly use the known totals.

Key Words: Optimal estimator; Survey sampling; Weighting.

1 Introduction

Regression estimation has been increasingly used in large survey organizations as a means to improve the reliability of the estimators of parameters of interest (such as totals or means) when auxiliary variables are available in the population. A comprehensive overview of the regression estimator in survey sampling can be found in Cassel, Särndal and Wretman (1976) and Fuller (2009), among others.

We next illustrate how the regression estimator can be used to estimate the total $Y = \sum_{i \in U} y_i$, where $U = \{1, \dots, N\}$ denotes the target population. A sample $s$ of expected size $n$ is selected from $U$ according to a sampling plan $p(s)$, and $\pi_i$ is the resulting first-order inclusion probability of unit $i$. In the absence of auxiliary variables, we use the Horvitz-Thompson estimator (Horvitz and Thompson 1952) given by $\hat Y_{HT} = \sum_{i \in s} d_i y_i$, where $d_i = \pi_i^{-1}$ is referred to as the survey weight associated with unit $i$. The regression estimator is given by

$$\hat Y_{REG} = \hat Y_{HT} + (X - \hat X_{HT})'\hat B, \qquad (1.1)$$

where $X = \sum_{i \in U} x_i$, $\hat X_{HT} = \sum_{i \in s} d_i x_i$, $x_i = (x_{1i}, \dots, x_{pi})'$, and $\hat B$ is a $p$-dimensional vector of estimated regression coefficients, computed as a function of the observed variables $(y_i, x_i)$ in the sample $s$.
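As a minimal numerical sketch of (1.1), the Python code below computes $\hat Y_{HT}$ and $\hat Y_{REG}$ for a given coefficient vector $\hat B$. The function names and the toy population are hypothetical, and the design-weighted least squares fit used for $\hat B$ is only one possible choice (the choices actually studied are discussed in Section 2).

```python
import numpy as np


def horvitz_thompson(values, d):
    """HT estimator sum_{i in s} d_i * values_i (values may be a vector or an (n, p) matrix)."""
    return np.asarray(d, dtype=float) @ np.asarray(values, dtype=float)


def regression_estimator(y, x, d, x_pop_total, b_hat):
    """Regression estimator (1.1): Y_HT + (X - X_HT)' B_hat."""
    y_ht = horvitz_thompson(y, d)
    x_ht = horvitz_thompson(x, d)                      # HT estimate of the total X
    return y_ht + (np.asarray(x_pop_total, dtype=float) - x_ht) @ np.asarray(b_hat, dtype=float)


def weighted_ls(y, x, w):
    """One possible B_hat (an assumption here): least squares with weights w_i."""
    w = np.asarray(w, dtype=float)
    xw = x * w[:, None]
    return np.linalg.solve(x.T @ xw, xw.T @ y)


# Hypothetical toy population in which y is exactly linear in x = (1, x1)'
x1_pop = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_pop = 2.0 + 3.0 * x1_pop
X_pop = np.column_stack([np.ones_like(x1_pop), x1_pop])
pi = np.full(6, 0.5)                                  # equal inclusion probabilities
sample = np.array([0, 2, 5])                          # indices of the selected units
d = 1.0 / pi[sample]                                  # design weights d_i = 1 / pi_i

b_hat = weighted_ls(y_pop[sample], X_pop[sample], d)
y_ht = horvitz_thompson(y_pop[sample], d)
y_reg = regression_estimator(y_pop[sample], X_pop[sample], d, X_pop.sum(axis=0), b_hat)
print(y_ht, y_reg, y_pop.sum())
```

Because the toy $y$ is an exact linear function of $x_i$, the correction term $(X - \hat X_{HT})'\hat B$ moves the HT value (72) to the true total (75); with real data the correction reduces, rather than removes, the estimation error.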
Note that the components of the vector of population totals $X$ are known for each of the corresponding component variables in the vector $x_i = (x_{1i}, \dots, x_{pi})'$ used to compute $\hat B$. However, there are instances when more auxiliary variables are observed in the sample than in the population. Assume that the sample has $q$ observed variables ($q > p$) and that the $p$ variables in the population are a subset of the $q$ variables observed in the sample. Furthermore, suppose that some of the extra $q - p$ variables in the sample are well correlated with the variable of interest $y$. Can these extra variables be incorporated in the regression estimator so as to make it more efficient? Singh and Raghunath (2011) attempted to answer that question for the case $q = p + 1$. Their extra variable in the sample was the intercept, which they used to estimate the unknown population size $N$ by $\hat N = \sum_{i \in s} d_i$.

In this article, we compare the estimator proposed by Singh and Raghunath (2011) to other regression estimators when $N$ is known or unknown. In Section 2, we describe standard regression estimators for estimating totals when $N$ is known, as well as the regression estimator proposed by Singh and Raghunath (2011) when $N$ is unknown. In Section 3, an alternative estimator is proposed for the case where $N$ is unknown. A simulation study is carried out in Section 4 to illustrate the performance of the various estimators in terms of bias and mean square error. Overall conclusions and recommendations are given in Section 5.

Author affiliations: Michael A. Hidiroglou, Business Survey Methods Division, Statistics Canada, ON, Canada K1A 0T6. E-mail: hidirog@yahoo.ca; Jae Kwang Kim, Department of Statistics, Iowa State University, Ames, IA 50011. E-mail: jkim@iastate.edu; Christian Olivier Nambeu, Business Survey Methods Division, Statistics Canada, ON, Canada K1A 0T6. E-mail: christianolivier.nambeu@canada.ca.

2 Regression estimators

Under general regularity conditions (Isaki and Fuller 1982; Montanari 1987), an approximation to the regression estimator (1.1) is

$$\tilde Y_{REG} = \hat Y_{HT} + (X - \hat X_{HT})'B, \qquad (2.1)$$

where $B$ is the limit in probability of $\hat B$ when both the sample and the population sizes tend to infinity. For large samples, the variance of the regression estimator (1.1) can be studied via (2.1). Note that $\tilde Y_{REG}$ is unbiased under the sampling plan $p(s)$ and can be re-expressed as

$$\tilde Y_{REG} = X'B + \sum_{i \in s} d_i E_i, \qquad (2.2)$$

where $E_i = y_i - x_i'B$. The design variance of $\hat Y_{REG}$ can be approximated by

$$AV_p(\hat Y_{REG}) = \sum_{i \in U}\sum_{j \in U} \Delta_{ij}\frac{E_i}{\pi_i}\frac{E_j}{\pi_j}, \qquad (2.3)$$

where $\Delta_{ij} = \pi_{ij} - \pi_i\pi_j$, $\pi_{ij}$ is the second-order inclusion probability of units $i$ and $j$, and $\pi_{ii} = \pi_i$. Both the model-assisted approach (Särndal, Swensson and Wretman 1992) and the optimal-variance approach (Montanari 1987) can be used to estimate $B$, and both yield approximately unbiased estimators. Under the model-assisted approach, the basic properties (bias and variance terms) remain valid even when the model is not correctly specified. Under the optimal-variance approach, no model assumption is made on the variable of interest.

The model-assisted estimator of Särndal et al. (1992) assumes a working model between the variable of interest $y$ and the auxiliary variables $x$. The working model is denoted by $m$: $y_i = x_i'\beta + \epsilon_i$, where $\beta$ is a vector of $p$ unknown parameters, $E_m(\epsilon_i \mid x_i) = 0$, $V_m(\epsilon_i \mid x_i) = \sigma_i^2$ and $Cov_m(\epsilon_i, \epsilon_j \mid x_i, x_j) = 0$ for $i \neq j$. Under this approach, $B$ in equation (2.1) is the (weighted) least squares estimator of $\beta$ computed on the whole population, given by

$$B_{GREG} = \Big(\sum_{i \in U} c_i x_i x_i'\Big)^{-1}\sum_{i \in U} c_i x_i y_i, \qquad (2.4)$$

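where $c_i = \sigma_i^{-2}$. This yields the following estimator of the total:

$$\hat Y_{GREG} = \hat Y_{HT} + (X - \hat X_{HT})'\hat B_{GREG}, \qquad (2.5)$$

where

$$\hat B_{GREG} = \Big(\sum_{i \in s} c_i d_i x_i x_i'\Big)^{-1}\sum_{i \in s} c_i d_i x_i y_i. \qquad (2.6)$$

The optimal estimator of Montanari (1987), obtained by minimizing the design variance of $\tilde Y_{REG} = \hat Y_{HT} + (X - \hat X_{HT})'B$ with respect to $B$, is

$$\tilde Y_{opt} = \hat Y_{HT} + (X - \hat X_{HT})'B_{opt}, \qquad (2.7)$$

where

$$B_{opt} = \{V_p(\hat X_{HT})\}^{-1}Cov_p(\hat X_{HT}, \hat Y_{HT}) = \Big(\sum_{i \in U}\sum_{j \in U}\Delta_{ij}\frac{x_i}{\pi_i}\frac{x_j'}{\pi_j}\Big)^{-1}\sum_{i \in U}\sum_{j \in U}\Delta_{ij}\frac{x_i}{\pi_i}\frac{y_j}{\pi_j}. \qquad (2.8)$$

The optimal estimator of the total is then

$$\hat Y_{opt} = \hat Y_{HT} + (X - \hat X_{HT})'\hat B_{opt}, \qquad (2.9)$$

where

$$\hat B_{opt} = \Big(\sum_{i \in s}\sum_{j \in s}\frac{\Delta_{ij}}{\pi_{ij}}\frac{x_i}{\pi_i}\frac{x_j'}{\pi_j}\Big)^{-1}\sum_{i \in s}\sum_{j \in s}\frac{\Delta_{ij}}{\pi_{ij}}\frac{x_i}{\pi_i}\frac{y_j}{\pi_j}. \qquad (2.10)$$

Note that computing these regression vectors requires the matrix in their first factor to be invertible. We can ensure this by reducing the number of auxiliary variables that are input into the regression, provided that not much efficiency of the resulting regression estimator is lost. If, on the other hand, there is a significant loss in efficiency, these singular matrices can be inverted using generalised inverses.

As mentioned in the introduction, the population totals may not be known for every component of the auxiliary vector $x_i$. The regression normally uses only the auxiliary variables for which a corresponding population total is known. Decomposing $x_i$ as $x_i = (1, \tilde x_i')'$ with $\tilde x_i = (x_{2i}, \dots, x_{pi})'$, Singh and Raghunath (2011) proposed a GREG-like estimator that assumes that the regression is based on an intercept and the variables $\tilde x_i$, even though only the population total of the $\tilde x_i$ is known. For the case where $N$ is not known and the population total $\tilde X = \sum_{i \in U}\tilde x_i$ is known, their estimator is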

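$$\hat Y_{GREG1} = \hat Y_{HT} + (\hat X_{GREG} - \hat X_{HT})'\hat B_{GREG}, \qquad (2.11)$$

where $\hat X_{GREG} = (\hat N, \tilde X')'$ is obtained from $X = (N, \tilde X')'$ by replacing $N$ with $\hat N = \sum_{i \in s} d_i$, and the vector of estimated regression coefficients $\hat B_{GREG} = (\hat B_1, \hat B_2')'$ is given by (2.6). The approximate design variance of $\hat Y_{GREG1}$ takes the same form as equation (2.3), with $E_i$ replaced by $E_i^* = y_i - \tilde x_i'B_2$, where $B_{GREG} = (B_1, B_2')'$ is given by (2.4); when $c_i = 1$,

$$B_2 = \Big\{\sum_{i \in U}(\tilde x_i - \bar{\tilde X}_N)(\tilde x_i - \bar{\tilde X}_N)'\Big\}^{-1}\sum_{i \in U}(\tilde x_i - \bar{\tilde X}_N)y_i, \qquad \bar{\tilde X}_N = N^{-1}\sum_{i \in U}\tilde x_i.$$

The properties of (2.11) can be obtained by noting that $\hat N = \sum_{i \in s} d_i$ coincides with the first component of $\hat X_{HT}$, so that the intercept term cancels and

$$\hat Y_{GREG1} = \hat Y_{HT} + (\tilde X - \hat{\tilde X}_{HT})'\hat B_2 = \hat Y_{HT} + (\tilde X - \hat{\tilde X}_{HT})'B_2 + (\tilde X - \hat{\tilde X}_{HT})'(\hat B_2 - B_2),$$

with $\hat{\tilde X}_{HT} = \sum_{i \in s} d_i\tilde x_i$. Since $\hat B_2 - B_2 = O_p(n^{-1/2})$ under some regularity conditions discussed in Fuller (2009, Chapter 2), the last term is of smaller order. Thus, ignoring the smaller order terms, we get

$$\hat Y_{GREG1} - Y \cong \sum_{i \in s} d_iE_i^* - \sum_{i \in U}E_i^*, \qquad (2.12)$$

where $E_i^* = y_i - \tilde x_i'B_2$. Thus, $\hat Y_{GREG1}$ is approximately design-unbiased. The asymptotic variance can be computed using $V(\sum_{i \in s} d_iE_i^*) = E\{(\sum_{i \in s} d_iE_i^* - \sum_{i \in U}E_i^*)^2\}$. Writing $\sum_{i \in s} d_iE_i^* = \sum_{i \in s} d_i(E_i^* - \bar E_N^*) + \bar E_N^*\hat N$, with $\bar E_N^* = N^{-1}\sum_{i \in U}E_i^*$, shows that the asymptotic variance can be quite large unless $\bar E_N^* = 0$.

Remark 2.1. If $y_i = a + \tilde x_i'B_2 + E_i$ with $\sum_{i \in U}E_i = 0$, we have $E_i^* = a + E_i$ and $\bar E_N^* = a$, which implies that

$$V\Big(\sum_{i \in s} d_iE_i^*\Big) = a^2V(\hat N) + 2a\,Cov\Big(\hat N, \sum_{i \in s} d_iE_i\Big) + V\Big(\sum_{i \in s} d_iE_i\Big).$$

This means that if $V(\hat N) > 0$, we can artificially increase the variance of $\hat Y_{GREG1}$ by choosing large values of $a$.

Note that the optimal regression estimator (2.9) computed with $\check x_i = (\tilde x_i', \pi_i)'$, say $\check Y_{opt}$, is also approximately design-unbiased; its population total $\check X = (\tilde X', \sum_{i \in U}\pi_i)'$ is known because $\sum_{i \in U}\pi_i$ is the expected sample size. Indeed,

$$\check Y_{opt} = \hat Y_{HT} + (\check X - \hat{\check X}_{HT})'\hat{\check B}_{opt} = \hat Y_{HT} + (\check X - \hat{\check X}_{HT})'\check B_{opt} + (\check X - \hat{\check X}_{HT})'(\hat{\check B}_{opt} - \check B_{opt}),$$

where $\check B_{opt}$ is obtained by replacing $x_i$ by $\check x_i$ in equation (2.8), and $\hat{\check B}_{opt}$ is obtained likewise from (2.10). Since $\hat{\check B}_{opt} = \check B_{opt} + O_p(n^{-1/2})$ under some regularity conditions discussed in Fuller (2009, Chapter 2), ignoring the smaller order terms, we get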

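$$\check Y_{opt} \cong \hat Y_{HT} + (\check X - \hat{\check X}_{HT})'\check B_{opt}.$$

The asymptotic variance of $\check Y_{opt}$ is no larger than the one associated with $\hat Y_{GREG1}$. The reason is that the optimal estimator minimizes the asymptotic variance among the class of estimators of the form

$$\hat Y_{HT} + (\check X - \hat{\check X}_{HT})'B, \qquad (2.13)$$

indexed by $B$, and, to first order, $\hat Y_{GREG1}$ belongs to this class (take $B = (B_2', 0)'$).

3 Alternative regression estimator

We now consider an alternative estimator that does not use the population size $N$. Instead, it uses the inclusion probabilities, provided that they are known for each unit in the population. Given that $\sum_{i \in U}\pi_i = n$, we can use $z_i = (\pi_i, \tilde x_i')'$ as auxiliary data in the model $y_i = z_i'\beta + e_i$, $e_i \sim (0, \pi_i\sigma^2)$. This means that the variance structure of the error is incorporated in the regression vector through $c_i = d_i$. The resulting estimator is

$$\hat Y_{ALT} = \hat Y_{HT} + (Z - \hat Z_{HT})'\hat B, \qquad (3.1)$$

with $Z = \sum_{i \in U} z_i$, $\hat Z_{HT} = \sum_{i \in s} d_i z_i$ and

$$\hat B = \Big(\sum_{i \in s} c_i d_i z_i z_i'\Big)^{-1}\sum_{i \in s} c_i d_i z_i y_i. \qquad (3.2)$$

This estimator corresponds exactly to the one given by Isaki and Fuller (1982).

Remark 3.1. By construction, $\sum_{i \in s} d_i^2(y_i - z_i'\hat B)z_i = 0$. Since $\pi_i$ is a component of $z_i$, this leads to $\sum_{i \in s} d_i(y_i - z_i'\hat B) = 0$, and we have $\hat Y_{ALT} = Z'\hat B$. Thus, $\hat Y_{ALT}$ is the best linear unbiased predictor of $Y = \sum_{i=1}^N y_i$ under the model $y_i = z_i'\beta + e_i$, $e_i \sim (0, \pi_i\sigma^2)$.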
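The sketch below computes the alternative estimator (3.1)-(3.2) with $z_i = (\pi_i, \tilde x_i')'$ and $c_i = d_i$ (so the regression weights are $c_id_i = d_i^2$), and checks the identity of Remark 3.1 numerically. The simulated population, the single Poisson draw and the function name are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def alt_estimator(y_s, xt_s, pi_s, xt_pop_total, pi_pop_total):
    """Alternative estimator (3.1)-(3.2) with z_i = (pi_i, x~_i')' and c_i = d_i."""
    d = 1.0 / pi_s
    z = np.column_stack([pi_s, xt_s])                      # z_i = (pi_i, x~_i')'
    Z = np.concatenate([[pi_pop_total], xt_pop_total])     # Z = (sum_U pi_i, X~')'
    w = d * d                                              # c_i d_i with c_i = d_i
    zw = z * w[:, None]
    b_hat = np.linalg.solve(z.T @ zw, zw.T @ y_s)          # equation (3.2)
    y_alt = d @ y_s + (Z - d @ z) @ b_hat                  # equation (3.1)
    return y_alt, b_hat, Z


rng = np.random.default_rng(2016)
N, n = 2000, 100
x = rng.chisquare(df=1, size=N) + 0.5                      # hypothetical auxiliary x~_i
y = 3.0 + x + rng.normal(size=N)                          # hypothetical study variable
size = x + rng.exponential(scale=1.0, size=N)             # hypothetical size measure
pi = n * size / size.sum()                                # pi_i proportional to size
assert pi.max() < 1.0                                     # valid inclusion probabilities

in_sample = rng.random(N) < pi                            # one Poisson sample
y_alt, b_hat, Z = alt_estimator(y[in_sample], x[in_sample][:, None], pi[in_sample],
                                np.array([x.sum()]), pi.sum())
print(y_alt, Z @ b_hat, y.sum())
```

The first two printed values agree, as Remark 3.1 predicts, because the HT-weighted residuals sum to zero; the third is the population total being estimated.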

Note that $\hat B$ can be expressed as $\hat B_{GREG}$ by setting $c_i = d_i$ and $x_i = z_i$. Thus, the proposed regression estimator can be viewed as a special case of the GREG estimator. Using an argument similar to the one leading to (2.12), we obtain

$$\hat Y_{ALT} - Y \cong \sum_{i \in s} d_iE_i - \sum_{i \in U}E_i, \qquad (3.3)$$

where $E_i = y_i - z_i'B$ and $B = (\sum_{i \in U} c_i z_i z_i')^{-1}\sum_{i \in U} c_i z_i y_i$. The proposed estimator is approximately unbiased, and its asymptotic variance

$$\sum_{i \in U}\sum_{j \in U}\Delta_{ij}\frac{y_i - z_i'B}{\pi_i}\frac{y_j - z_j'B}{\pi_j}$$

is often smaller than the asymptotic variance of the Singh and Raghunath (2011) estimator.

The optimal version of $\hat Y_{ALT}$ uses $z_i = (\pi_i, \tilde x_i')'$ as auxiliary data. It is given by

$$\hat Y_K = \hat Y_{HT} + (Z - \hat Z_{HT})'\hat B_K, \qquad (3.4)$$

where $\hat B_K$ is obtained by substituting $x_i$ by $z_i$ in equation (2.10).

Remark 3.2. For fixed-size sampling designs, we have $V_p(\sum_{i \in s} d_i\pi_i) = 0$, since $\sum_{i \in s} d_i\pi_i = n$ for every sample. In this case, the optimal regression coefficient vector $B_K = \{V_p(\hat Z_{HT})\}^{-1}Cov_p(\hat Z_{HT}, \hat Y_{HT})$ cannot be computed because the variance-covariance matrix $V_p(\hat Z_{HT})$ is not invertible. Thus, the optimal estimator with $z_i = (\pi_i, \tilde x_i')'$ reduces to the optimal estimator (2.9) using only $x_i = \tilde x_i$.

Remark 3.3. For random-size sampling designs, $V_p(\sum_{i \in s} d_i\pi_i) \neq 0$. In this case, all of the components of $z_i = (\pi_i, \tilde x_i')'$ can be used in the design-optimal regression estimator (2.9).

A difficulty with using the optimal estimator $\hat Y_K$ is that it requires the joint inclusion probabilities $\pi_{ij}$, which may be difficult to compute for certain sampling designs. An estimator that does not require them is obtained by assuming that $\pi_{ij} = \pi_i\pi_j$ for $i \neq j$. We refer to this estimator as the pseudo-optimal estimator $\hat Y_P$. It is given by

$$\hat Y_P = \hat Y_{HT} + (Z - \hat Z_{HT})'\hat B_P, \qquad \hat B_P = \Big(\sum_{i \in s} c_i d_i z_i z_i'\Big)^{-1}\sum_{i \in s} c_i d_i z_i y_i, \qquad (3.5)$$

where $c_i = d_i - 1$. In general, the pseudo-optimal estimator $\hat Y_P$ should yield estimates that are quite close to those produced by $\hat Y_K$ when the sampling fraction is small. Note that $\hat Y_P$ is exactly equal to the optimal estimator $\hat Y_K$ in the case of Poisson sampling: under this design units are included in the sample independently of one another, so that $\pi_{ij} = \pi_i\pi_j$ for $i \neq j$. The approximate design variances of $\hat Y_K$ and $\hat Y_P$ have the same form as equation (2.3), with $E_i$ respectively given by $y_i - z_i'B_K$ and $y_i - z_i'B_P$.

4 Simulations

We carried out two simulation studies. The first one used a dataset provided in the textbook of Rosner (2006), and the second one was based on an artificial population generated according to a simple linear regression model. The first simulation assessed the performance of all of the estimators under different sampling schemes, while the second focused on the impact of changing the intercept value in the model. The parameter of interest in both simulations is the total of the variable of interest $y$: $Y = \sum_{i \in U} y_i$. All estimators were computed with the available auxiliary data. Table 4.1 summarizes the auxiliary data and the variance structure of the errors (when applicable) associated with the estimators used in the two studies.

Table 4.1 Estimators used in the simulations
Population size N known:
  $\hat Y_{GREG}$ as defined by (2.5), with $x_i = (1, x_{1i})'$ and $c_i = 1$
  $\hat Y_{opt}$ as defined by (2.9), with $x_i = (1, x_{1i})'$
  $\hat Y_{opt,\pi}$ as defined by (2.9), with $x_i = (1, x_{1i}, \pi_i)'$
  $\hat Y_P$ as defined by (3.5), with $z_i = (1, x_{1i}, \pi_i)'$ and $c_i = d_i - 1$
Population size N unknown:
  $\hat Y_{GREG1}$ as a special case of (2.11), with $\tilde x_i = x_{1i}$
  $\hat Y_{opt1}$ as defined by (2.9), with $x_i = x_{1i}$
  $\hat Y_{ALT}$ as defined by (3.1), with $z_i = (\pi_i, x_{1i})'$ and $c_i = d_i$
  $\hat Y_K$ as defined by (3.4), with $z_i = (\pi_i, x_{1i})'$
  $\hat Y_P$ as defined by (3.5), with $z_i = (\pi_i, x_{1i})'$ and $c_i = d_i - 1$

The performance of all estimators was evaluated using the relative bias, the Monte Carlo relative efficiency and the approximate relative efficiency, defined as follows.

1. Relative bias:

$$RB(\hat Y_{EST}) = \frac{100}{R}\sum_{r=1}^{R}\frac{\hat Y_{EST}^{(r)} - Y}{Y}, \qquad (4.1)$$

where $\hat Y_{EST}$ represents one of the estimators presented in Table 4.1 and $\hat Y_{EST}^{(r)}$ is its value computed on the $r$-th Monte Carlo sample.

2. Monte Carlo relative efficiency:

$$RE(\hat Y_{EST}) = \frac{MSE_{MC}(\hat Y_{EST})}{MSE_{MC}(\hat Y_{GREG})}, \qquad MSE_{MC}(\hat Y_{EST}) = \frac{1}{R}\sum_{r=1}^{R}\big(\hat Y_{EST}^{(r)} - Y\big)^2. \qquad (4.2)$$

The RE measures the efficiency of the estimator $\hat Y_{EST}$ relative to $\hat Y_{GREG}$; values below 1 favour $\hat Y_{EST}$.

3. Approximate relative efficiency:

$$AR(\hat Y_{EST}) = \frac{AV_p(\hat Y_{EST})}{AV_p(\hat Y_{GREG})}, \qquad AV_p(\hat Y_{EST}) = \sum_{i \in U}\sum_{j \in U}\Delta_{ij}\frac{E_i}{\pi_i}\frac{E_j}{\pi_j}, \qquad (4.3)$$

where $AV_p(\hat Y_{EST})$ is the approximate variance of $\hat Y_{EST}$, with $E_i = y_i - x_i'B_{EST}$. The approximate relative efficiency AR measures the relative gain in efficiency of $\hat Y_{EST}$ with respect to $\hat Y_{GREG}$, using the population residuals obtained by Taylor linearisation. RE and AR are expected to give comparable results; however, as we will see, this may not be the case.

4.1 Simulation 1

The population was the dataset (FEV.DAT) available on the CD that accompanies the textbook by Rosner (2006). The data file contains 654 records from a study on childhood respiratory disease carried out in Boston. The variables in the file are: age, height, sex (male, female), smoking (indicates whether or not the individual smokes) and forced expiratory volume (FEV). Singh and Raghunath (2011) used the same data set. The parameter of interest is the total height $y$ of the population. The variable age ($x_1$) was used as the auxiliary variable in the regression. The variable FEV ($x_2$) was chosen as the size variable used to compute the selection probabilities of the sampling schemes considered in this simulation. The two variables sex and smoking were discarded from the simulation. Table 4.2 summarizes the central tendency measures of the three variables in the population. For each variable, the mean and the median were similar, which suggests that the three variables have roughly symmetrical distributions.

Table 4.2 Descriptive statistics of y, x1 and x2

Variable   Min    Q1     Median   Mean    Q3     Max
y          46     57     61.5     61.14   65.5   74
x1         3      8      10       9.931   12     19
x2         0.79   1.98   2.55     2.64    3.12   5.79

Figure 4.1 displays the relationship between the variable of interest $y$ and the auxiliary variable $x_1$. The relationship between height ($y$) and age ($x_1$) appears to be linear but does not go through the origin. The Pearson correlation coefficient between $y$ and $x_1$ was 0.79.

Figure 4.1 Relationship between the variable of interest Height and the auxiliary variable Age (scatter plot of Height against Age).

The objective of this simulation study was to evaluate the performance of the estimators presented in Table 4.1 under different sampling designs. We considered the Midzuno, the Sampford and the Poisson sampling designs. The variable $x_2$ was used as the size measure to compute the inclusion probabilities for all three sampling schemes. These sampling designs are as follows:

1. Midzuno sampling (see Midzuno 1952): The first unit is selected with probability $p_i$, and the remaining $n - 1$ units are selected by simple random sampling without replacement from the remaining $N - 1$ units in the population. The selection probability $p_i$ of unit $i$ is given by $p_i = x_{2i}/\sum_{j \in U}x_{2j}$. The first-order inclusion probability of unit $i$ is given by $\pi_i = \frac{N - n}{N - 1}p_i + \frac{n - 1}{N - 1}$.

2. Sampford sampling (see Sampford 1967): The first unit is selected with probability $p_i = x_{2i}/\sum_{j \in U}x_{2j}$, and the remaining $n - 1$ units are selected with replacement with probabilities proportional to $\lambda_i = p_i/(1 - np_i)$. If any unit is selected more than once, the procedure is repeated until all elements of the sample are different. The first-order inclusion probability is given by $\pi_i = np_i$.

3. Poisson sampling: Each unit is selected independently, resulting in a random sample size. The selection probability of unit $i$ is $p_i = x_{2i}/\sum_{j \in U}x_{2j}$, and the inclusion probability associated with unit $i$ is $\pi_i = np_i$. A good description of this procedure can be found in Särndal et al. (1992).

The total $Y = \sum_{i \in U} y_i$ was the parameter of interest. Under each of these sampling schemes, we selected $R = 2000$ Monte Carlo samples of size $n = 50$ (the expected size, in the case of Poisson sampling). The estimators in Table 4.1 were then computed for each sample, and their performance was assessed using the relative bias, the Monte Carlo relative efficiency and the approximate relative efficiency described by equations (4.1), (4.2) and (4.3), respectively.

4.2 Simulation 1 results

Simulation results are presented in Table 4.3. All estimators studied are approximately unbiased, with relative biases smaller than 1% in absolute value. We discuss separately the approximate relative efficiency (AR) and the Monte Carlo relative efficiency (RE) of the estimators when the population size N is known and when it is unknown.

Case 1: Population size N is known

We compare the AR and the RE of the following estimators in Table 4.3: $\hat Y_{GREG}$, $\hat Y_{opt}$, $\hat Y_{opt,\pi}$ and $\hat Y_P$, for each of the three sampling designs. We can do so for all of these estimators except $\hat Y_{opt,\pi}$ under the Midzuno and Sampford sampling schemes: in that case the regression coefficient cannot be computed, for a reason similar to the one described in Remark 3.2. On the basis of both AR and RE, the pseudo-optimal estimator $\hat Y_P$ is the most reliable estimator regardless of the sampling scheme. It is close to the optimal estimator only in terms of AR. Under the Midzuno sampling design, the RE and the AR of the optimal estimator $\hat Y_{opt}$ were not as close to each other as expected (5.84 versus 0.55). The poor behaviour of the RE of the optimal estimator has also been observed by Montanari (1998). Figure 4.2 explains what is happening: most estimates obtained with the optimal estimator over the 2000 Monte Carlo samples are close to the mean, but in some samples the estimates are quite far from it. This is in contrast to $\hat Y_P$, whose values are tightly centred around the mean; note that its associated RE and AR are quite close to one another.
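The three sampling designs described in Section 4.1 can be sketched as follows, assuming a vector of positive size measures (a hypothetical stand-in for $x_2$ is generated below). The Sampford routine follows the rejective description given in item 2 and requires $np_i < 1$ for every unit; all helper names are illustrative only.

```python
import numpy as np


def midzuno_sample(p, n, rng):
    """Midzuno: first unit drawn with probability p_i, then SRSWOR of n-1 from the rest."""
    N = len(p)
    first = rng.choice(N, p=p)
    rest = np.delete(np.arange(N), first)
    others = rng.choice(rest, size=n - 1, replace=False)
    return np.concatenate([[first], others])


def midzuno_pi(p, n):
    """First-order inclusion probabilities under Midzuno sampling."""
    N = len(p)
    return (N - n) / (N - 1) * p + (n - 1) / (N - 1)


def sampford_sample(p, n, rng, max_tries=100_000):
    """Sampford: first draw with p_i, then n-1 draws with replacement with probabilities
    proportional to lambda_i = p_i / (1 - n p_i); samples with duplicates are rejected."""
    if np.any(n * p >= 1):
        raise ValueError("Sampford sampling requires n * p_i < 1 for all units")
    lam = p / (1 - n * p)
    lam = lam / lam.sum()
    N = len(p)
    for _ in range(max_tries):
        first = rng.choice(N, p=p)
        others = rng.choice(N, size=n - 1, replace=True, p=lam)
        units = np.concatenate([[first], others])
        if len(np.unique(units)) == n:        # keep only duplicate-free samples
            return units
    raise RuntimeError("no duplicate-free Sampford sample found")


def poisson_sample(pi, rng):
    """Poisson: unit i is included independently with probability pi_i."""
    return np.flatnonzero(rng.random(len(pi)) < pi)


rng = np.random.default_rng(1)
size = rng.uniform(0.8, 5.8, size=654)        # hypothetical size measure
p = size / size.sum()
n = 50
s_mid = midzuno_sample(p, n, rng)
s_samp = sampford_sample(p, n, rng)
s_poi = poisson_sample(n * p, rng)            # pi_i = n p_i, random sample size
print(len(s_mid), len(s_samp), len(s_poi))
```

Only the Poisson sample has a random size; the Midzuno and Sampford samples always contain $n$ distinct units, which is what makes $V_p(\sum_{i \in s} d_i\pi_i) = 0$ in Remark 3.2.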

Figure 4.2 Scatter plots of Monte Carlo estimates under the Midzuno sampling design (panels: estimates from the optimal estimator and from $\hat Y_P$, plotted against the replicate number).

The optimal estimator $\hat Y_{opt,\pi}$ is equivalent to the pseudo-optimal estimator $\hat Y_P$ under the Poisson sampling scheme. Recall that the optimal estimator $\hat Y_{opt}$ used $x_i = (1, x_{1i})'$ as auxiliary data, whereas $\hat Y_{opt,\pi}$ used $x_i = (1, x_{1i}, \pi_i)'$. The addition of $\pi_i$ has significantly improved the efficiency of the optimal estimator under the Poisson sampling scheme (RE of 0.57 instead of 0.96).

Singh and Raghunath (2011) used $\hat Y_{GREG1}$ when N was known but did not include N as a control count. Nonetheless, they observed that $\hat Y_{GREG1}$ was quite comparable to $\hat Y_{GREG}$ in terms of AR and RB under the Midzuno sampling design. The reason is that this sampling scheme is quite close to simple random sampling without replacement. However, by these two measures, $\hat Y_{GREG1}$ is by far the worst estimator under the other two sampling schemes.

Case 2: Population size N is unknown

Five estimators are reported in Table 4.3 for this case. However, as $\hat Y_{ALT}$ is quite close to $\hat Y_K$ and $\hat Y_P$, we comment only on the results obtained for $\hat Y_{GREG1}$, $\hat Y_{opt1}$ and $\hat Y_{ALT}$. These three estimators were very similar in terms of RE and AR under the Midzuno sampling design. Under the Sampford sampling scheme, $\hat Y_{opt1}$ and $\hat Y_P$ were comparable and slightly better than $\hat Y_{GREG1}$. Under the Poisson sampling scheme, $\hat Y_{opt1}$ and $\hat Y_{ALT}$ outperformed $\hat Y_{GREG1}$. We can also see that $\hat Y_{GREG1}$ was very inefficient, with an RE at least 10 times larger than those associated with $\hat Y_{ALT}$ or $\hat Y_P$. Note that $\hat Y_{ALT}$ was better than $\hat Y_{opt1}$: this is reasonable, as $\hat Y_{ALT}$ uses two auxiliary variables whereas $\hat Y_{opt1}$ uses the single auxiliary variable $x_1$.
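The equivalence of the optimal and pseudo-optimal estimators under Poisson sampling, stated in Section 3 and used above, can be checked numerically. The sketch evaluates the sample coefficient (2.10) with the full matrix of $\Delta_{ij}/\pi_{ij}$ implied by Poisson sampling ($\pi_{ij} = \pi_i\pi_j$ for $i \neq j$, $\pi_{ii} = \pi_i$) and compares it with $\hat B_P$ from (3.5) with $c_i = d_i - 1$. The data are hypothetical and the auxiliary vector mimics $(1, x_{1i}, \pi_i)'$ from Table 4.1.

```python
import numpy as np


def b_opt(x_s, y_s, pi_s, pi_joint_s):
    """Sample optimal coefficient (2.10).

    pi_joint_s: (m, m) joint inclusion probabilities of the sampled units,
    with pi_ii = pi_i on the diagonal.
    """
    delta = pi_joint_s - np.outer(pi_s, pi_s)    # Delta_ij = pi_ij - pi_i pi_j
    kernel = delta / pi_joint_s                  # Delta_ij / pi_ij
    xd = x_s / pi_s[:, None]                     # rows x_i / pi_i
    yd = y_s / pi_s                              # y_j / pi_j
    return np.linalg.solve(xd.T @ kernel @ xd, xd.T @ kernel @ yd)


def b_pseudo_opt(z_s, y_s, pi_s):
    """Pseudo-optimal coefficient (3.5) with c_i = d_i - 1."""
    d = 1.0 / pi_s
    w = (d - 1.0) * d                            # c_i d_i = (1 - pi_i) / pi_i^2
    zw = z_s * w[:, None]
    return np.linalg.solve(z_s.T @ zw, zw.T @ y_s)


rng = np.random.default_rng(7)
m = 40
pi_s = rng.uniform(0.05, 0.3, size=m)            # hypothetical pi_i of sampled units
x1 = rng.normal(10.0, 2.0, size=m)
z_s = np.column_stack([np.ones(m), x1, pi_s])    # (1, x_1i, pi_i)'
y_s = 5.0 + 2.0 * x1 + 30.0 * pi_s + rng.normal(size=m)

pi_joint = np.outer(pi_s, pi_s)                  # Poisson: pi_ij = pi_i pi_j for i != j
np.fill_diagonal(pi_joint, pi_s)                 # pi_ii = pi_i
print(b_opt(z_s, y_s, pi_s, pi_joint))
print(b_pseudo_opt(z_s, y_s, pi_s))
```

Under Poisson sampling the off-diagonal $\Delta_{ij}$ vanish and $\Delta_{ii}/\pi_{ii} = 1 - \pi_i$, so both functions solve the same weighted normal equations; for fixed-size designs such as Midzuno or Sampford the two generally differ.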

Table 4.3 Comparison of estimators in terms of relative bias and relative efficiencies
(Columns are labelled by the subscripts of the estimators in Table 4.1; "-" means not computed.)

                     Population size known          | Population size unknown
Design    Measure    GREG   opt    opt,π   P        | GREG1    opt1    ALT     K       P
Midzuno   RB (%)     0.08   0.04   -       0.07     | 0.07     0.07    0.07    -       0.07
          RE         1.00   5.84   -       0.54     | 0.94     0.93    0.93    -       0.93
          AR         1.00   0.55   -       0.55     | 0.94     0.93    0.93    -       0.93
Sampford  RB (%)     0.11   0.11   -       0.07     | -0.01    0.07    0.0     -       0.0
          RE         1.00   0.59   -       0.58     | 14.7     13.69   13.55   -       13.56
          AR         1.00   0.55   -       0.56     | 15.77    14.39   14.39   -       14.40
Poisson   RB (%)     0.11   0.11   0.08    0.08     | 0.09     0.14    0.16    0.16    0.16
          RE         1.00   0.96   0.57    0.57     | 160.47   15.49   13.85   13.85   13.85
          AR         1.00   0.96   0.55    0.56     | 180.36   16.73   14.40   14.39   15.73

Note: We do not provide results for $\hat Y_{opt,\pi}$ and $\hat Y_K$ under the Midzuno and Sampford designs because the variance-covariance matrix is not invertible.

4.3 Simulation 2

The performance of the estimators was assessed for different values of the intercept in the model. We restricted ourselves to the Poisson sampling design to illustrate Remark 2.1 in Section 2, namely that the efficiency of $\hat Y_{GREG1}$ deteriorates as the intercept gets bigger. The population was generated according to the model

$$y_i = a + x_i + e_i. \qquad (4.4)$$

The $e_i$ values were generated from a normal distribution with mean 0 and variance $\sigma^2 = 1$. The $x_i$ values were generated from a chi-square distribution with one degree of freedom. Three populations of size $N = 5000$ were generated using (4.4) with different values of the intercept $a$; the $x_i$ values were re-generated for each population. The three populations were labelled A, B and C, with intercepts set to 3, 5 and 10, respectively. From each of these populations, we drew $R = 2000$ Monte Carlo samples with expected sample size $n = 50$ using the Poisson sampling design. The first-order inclusion probability of each unit was set equal to $\pi_i = nz_i/\sum_{j \in U}z_j$, where the size measure $z_i$ was generated according to the model $z_i = 0.5y_i + u_i$, and $u_i$ was a random error drawn from an exponential distribution with mean $k$ equal to 0.5 or 1.
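A scaled-down version of Simulation 2 can be sketched as follows to see Remark 2.1 at work. It generates populations from (4.4), draws Poisson samples with $\pi_i \propto z_i$, and compares the Monte Carlo MSE of the Singh and Raghunath estimator (2.11) with that of the GREG estimator (2.5), both with $x_i = (1, x_i)'$ and $c_i = 1$. The number of replicates (300 instead of 2000), the seed and the helper names are assumptions made to keep the run short; non-positive $z_i$ values, which the description above does not address, are simply counted and reported.

```python
import numpy as np


def generate_population(a, k, N, n, rng):
    """Population from model (4.4) with inclusion probabilities pi_i proportional to z_i."""
    x = rng.chisquare(df=1, size=N)
    y = a + x + rng.normal(0.0, 1.0, size=N)
    z = 0.5 * y + rng.exponential(scale=k, size=N)    # exponential error with mean k
    pi = n * z / z.sum()
    return x, y, pi, int(np.sum(z <= 0))             # also count non-positive sizes


def greg_and_greg1(y_s, x_s, d, N, x_total):
    """GREG (2.5) with x_i = (1, x_i)', c_i = 1, and the version (2.11) that replaces
    N by N_hat = sum of d_i."""
    X_s = np.column_stack([np.ones_like(x_s), x_s])
    Xw = X_s * d[:, None]                             # c_i d_i x_i with c_i = 1
    b = np.linalg.solve(X_s.T @ Xw, Xw.T @ y_s)       # equation (2.6)
    y_ht, x_ht = d @ y_s, d @ X_s
    greg = y_ht + (np.array([N, x_total]) - x_ht) @ b
    greg1 = y_ht + (np.array([d.sum(), x_total]) - x_ht) @ b
    return greg, greg1


def mse_ratio(a, k=1.0, N=5000, n=50, R=300, seed=0):
    """Monte Carlo MSE(GREG1) / MSE(GREG) under Poisson sampling."""
    rng = np.random.default_rng(seed)
    x, y, pi, n_bad = generate_population(a, k, N, n, rng)
    Y, x_total = y.sum(), x.sum()
    err = np.zeros((R, 2))
    for r in range(R):
        s = rng.random(N) < pi                        # one Poisson sample
        est = greg_and_greg1(y[s], x[s], 1.0 / pi[s], N, x_total)
        err[r] = np.array(est) - Y
    mse = (err ** 2).mean(axis=0)
    return mse[1] / mse[0], n_bad


for a in (3, 5, 10):
    ratio, n_bad = mse_ratio(a)
    print(f"intercept a={a}: MSE(GREG1)/MSE(GREG) = {ratio:.1f} (non-positive z: {n_bad})")
```

The difference between the two estimators is exactly $(\hat N - N)\hat B_1$, so the extra error of (2.11) scales with the fitted intercept, which is the mechanism described in Remark 2.1.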
4.4 Simulation 2 results

Numerical results are given in Table 4.4 for $k = 1$ and in Table 4.5 for $k = 0.5$. All estimators are approximately unbiased, with relative biases smaller than 1%.
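The summary measures (4.1)-(4.3) reported in Tables 4.3 to 4.5 can be computed as sketched below; the function names and the numbers in the usage lines are hypothetical. `approx_variance` evaluates the double sum in (2.3) and (4.3) from a matrix of joint inclusion probabilities; under Poisson sampling it reduces to $\sum_{i \in U}(1 - \pi_i)E_i^2/\pi_i$.

```python
import numpy as np


def relative_bias(estimates, Y):
    """RB (4.1), in percent: 100/R * sum_r (Y_hat_r - Y) / Y."""
    estimates = np.asarray(estimates, dtype=float)
    return 100.0 * np.mean((estimates - Y) / Y)


def mc_mse(estimates, Y):
    """Monte Carlo MSE: 1/R * sum_r (Y_hat_r - Y)^2."""
    estimates = np.asarray(estimates, dtype=float)
    return np.mean((estimates - Y) ** 2)


def relative_efficiency(estimates, greg_estimates, Y):
    """RE (4.2): MSE_MC(EST) / MSE_MC(GREG); values below 1 favour EST."""
    return mc_mse(estimates, Y) / mc_mse(greg_estimates, Y)


def approx_variance(E, pi, pi_joint):
    """AV_p in (2.3)/(4.3): sum_i sum_j Delta_ij (E_i/pi_i)(E_j/pi_j),
    with Delta_ij = pi_ij - pi_i pi_j and pi_ii = pi_i on the diagonal."""
    delta = pi_joint - np.outer(pi, pi)
    e = np.asarray(E, dtype=float) / pi
    return e @ delta @ e


def approximate_relative_efficiency(E_est, E_greg, pi, pi_joint):
    """AR (4.3): AV_p(EST) / AV_p(GREG), computed from population residuals."""
    return approx_variance(E_est, pi, pi_joint) / approx_variance(E_greg, pi, pi_joint)


# Usage with hypothetical numbers
print(relative_bias([98.0, 102.0, 101.0], 100.0))
print(relative_efficiency([92.0, 108.0], [96.0, 104.0], 100.0))
pi = np.array([0.2, 0.5, 0.8])
E = np.array([1.0, -2.0, 0.5])
pi_joint = np.outer(pi, pi)
np.fill_diagonal(pi_joint, pi)                  # Poisson sampling: pi_ij = pi_i pi_j
print(approx_variance(E, pi, pi_joint), np.sum((1 - pi) * E**2 / pi))
```

RE is computed from the simulated estimates, whereas AR uses only population quantities; this is why the two can disagree when a few Monte Carlo samples produce extreme estimates, as seen for the optimal estimator under the Midzuno design.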

Case 1: Population size N is known

As expected, both optimal estimators $\hat Y_{opt}$ and $\hat Y_{opt,\pi}$ are more efficient than $\hat Y_{GREG}$. The optimal estimator based on $(1, x_i)'$ is only slightly better than $\hat Y_{GREG}$. The inclusion of the additional variable $\pi_i$, resulting in $\hat Y_{opt,\pi}$, yields significant gains in terms of RE and AR; these gains decrease as the intercept gets larger. Once more, $\hat Y_{GREG1}$ is quite inefficient and, as noted in Remark 2.1, this inefficiency increases as the intercept gets larger. The previous observations are valid regardless of $k$. The efficiency of both optimal estimators $\hat Y_{opt}$ and $\hat Y_{opt,\pi}$ decreases as $k$ gets smaller.

Case 2: Population size N is unknown

The most efficient estimator is $\hat Y_{ALT}$. It outperforms $\hat Y_{opt1}$ because it uses more auxiliary variables. Estimator $\hat Y_{GREG1}$ is by far the most inefficient one. As the intercept in the population model increases, the relative efficiency of $\hat Y_{ALT}$ (in terms of both RE and AR) is fairly stable. On the other hand, the relative efficiencies associated with $\hat Y_{GREG1}$ and $\hat Y_{opt1}$ deteriorate rapidly as the intercept increases. The effect of $k$ on the efficiencies of the estimators is the same as described for the case where the population size is known.

Table 4.4 Relative bias and relative efficiencies of the estimators for k = 1 under the Poisson sampling design
(Columns are labelled by the subscripts of the estimators in Table 4.1.)

                       Population size known          | Population size unknown
Intercept  Measure     GREG   opt    opt,π   P        | GREG1   opt1    ALT     K       P
3          RB (%)      0.3    0.38   0.56    0.56     | 0.18    0.77    0.      0.      0.
           RE          1.00   0.95   0.67    0.67     | 7.7     5.4     0.94    0.94    0.94
           AR          1.00   0.94   0.60    0.98     | 7.08    5.01    0.85    0.85    0.91
5          RB (%)      0.04   0.07   0.18    0.18     | -0.01   0.67    -0.07   -0.07   -0.07
           RE          1.00   0.99   0.76    0.76     | 3.91    16.63   1.50    1.50    1.50
           AR          1.00   0.98   0.70    0.73     | 3.48    16.0    1.45    1.45    1.5
10         RB (%)      -0.01  -0.0   0.06    0.06     | -0.57   0.79    -0.0    -0.0    -0.0
           RE          1.00   1.00   0.80    0.80     | 88.30   67.47   .0      .0      .0
           AR          1.00   0.99   0.73    0.74     | 97.9    66.13   .15     .15     .0

Table 4.5 Relative bias and relative efficiencies of the estimators for k = 0.5 under the Poisson sampling design
(Columns are labelled by the subscripts of the estimators in Table 4.1.)

                       Population size known          | Population size unknown
Intercept  Measure     GREG   opt    opt,π   P        | GREG1   opt1    ALT     K       P
3          RB (%)      0.13   0.5    0.4     0.4      | -0.18   0.54    -0.0    -0.0    -0.0
           RE          1.00   0.99   0.89    0.89     | 8.4     5.93    1.78    1.78    1.78
           AR          1.00   0.96   0.83    0.95     | 8.30    5.83    1.79    1.79    .10
5          RB (%)      0.03   0.09   0.      0.       | 0.7     1.49    0.18    0.18    0.18
           RE          1.00   1.00   0.91    0.91     | 4.35    17.39   3.6     3.6     3.6
           AR          1.00   0.98   0.88    0.94     | 3.83    16.41   3.15    3.15    3.54
10         RB (%)      0.06   0.07   0.1     0.1      | 0.33    1.4     0.13    0.13    0.13
           RE          1.00   1.00   0.96    0.96     | 98.69   73.93   6.6     6.6     6.6
           AR          1.00   0.99   0.91    0.9      | 98.65   66.0    5.89    5.89    6.4

5 Conclusions

The regression estimator can be quite efficient if the auxiliary data that it uses are well correlated with the variable of interest. Furthermore, it requires that the population totals corresponding to the auxiliary variables be available. In this article, we investigated the behaviour of the regression estimator proposed by Singh and Raghunath (2011), which uses the estimated population count as a control total together with the known population totals of the auxiliary variables. We compared it to the generalized regression estimator $\hat Y_{GREG}$, its optimal analogue, and an alternative estimator $\hat Y_{ALT}$ that uses the first-order inclusion probabilities and the auxiliary data for which the population totals are known. As the optimal regression estimator requires the computation of second-order inclusion probabilities, we also included a pseudo-optimal estimator $\hat Y_P$ that does not require them. We investigated the properties of these estimators in terms of bias and efficiency via simulations that included various sampling designs and, for a generated artificial population, different values of the intercept in the model. We compared the results when the population size was known and unknown.

When the population size is known, the most efficient estimator is the optimal estimator. However, since this estimator can be unstable, the pseudo-optimal estimator $\hat Y_P$ is a good alternative to it. This is in line with Rao (1994), who favoured $\hat Y_P$ over the generalized regression estimator $\hat Y_{GREG}$. The proposal of Singh and Raghunath (2011) to use $\hat Y_{GREG1}$ is not viable, as this estimator can be quite inefficient. When the population size is not known, the alternative regression estimator $\hat Y_{ALT}$ is the best one to use.

Acknowledgements

The authors kindly acknowledge suggestions for improved readability provided by the Associate Editor and the referees.

References

Cassel, C.-M., Särndal, C.-E. and Wretman, J.H. (1976). Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika, 63, 615-620.

Fuller, W.A. (2009). Sampling Statistics. New York: John Wiley & Sons, Inc.

Horvitz, D.G. and Thompson, D.J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663-685.

Isaki, C.T. and Fuller, W.A. (1982). Survey design under the regression superpopulation model. Journal of the American Statistical Association, 77, 89-96.

Midzuno, H. (1952). On the sampling system with probability proportional to sum of size. Annals of the Institute of Statistical Mathematics, 3, 99-107.

Montanari, G.E. (1987). Post-sampling efficient QR-prediction in large-scale surveys. International Statistical Review, 55, 191-202.

Montanari, G.E. (1998). On regression estimation of finite population means. Survey Methodology, 24, 1, 69-77.

Rao, J.N.K. (1994). Estimating totals and distribution functions using auxiliary data information at the estimation stage. Journal of Official Statistics, 10, 2, 153-165.

Rosner, B. (2006). Fundamentals of Biostatistics. Sixth edition. Duxbury Press.

Sampford, M.R. (1967). On sampling without replacement with unequal probabilities of selection. Biometrika, 54, 499-513.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.

Singh, S. and Raghunath, A. (2011). On calibration of design weights. METRON - International Journal of Statistics, LXIX, 185-205.