A note on regression estimation with unknown population size

Statistics Publications, Statistics, 6-2016

Michael A. Hidiroglou, Statistics Canada
Jae Kwang Kim, Iowa State University, jkim@iastate.edu
Christian Olivier Nambeu, Statistics Canada

This article is made available through the Iowa State University Digital Repository, where it has been accepted for inclusion in Statistics Publications.

Survey Methodology, June 2016, Vol. 42, No. 1, pp. 121-135. Statistics Canada, Catalogue No. 12-001-X

Michael A. Hidiroglou, Jae Kwang Kim and Christian Olivier Nambeu¹

Abstract

The regression estimator is extensively used in practice because it can improve the reliability of the estimated parameters of interest, such as means or totals. It uses control totals of variables known at the population level that are included in the regression set up. In this paper, we investigate the properties of the regression estimator that uses control totals estimated from the sample, as well as those known at the population level. This estimator is compared to the regression estimators that strictly use the known totals, both theoretically and via a simulation study.

Key Words: Optimal estimator; Survey sampling; Weighting.

1 Introduction

Regression estimation has been increasingly used in large survey organizations as a means to improve the reliability of the estimators of parameters of interest (such as totals or means) when auxiliary variables are available in the population. A comprehensive overview of the regression estimator in survey sampling can be found in Cassel, Särndal and Wretman (1976) and Fuller (2009), among others.

We next illustrate how the regression estimator can be used to estimate the total $Y = \sum_{i \in U} y_i$, where $U = \{1, \ldots, N\}$ denotes the target population. A sample $s$ of expected size $n$ is selected from $U$ according to a sampling plan $p(s)$, where $\pi_i$ is the resulting first-order inclusion probability of unit $i$. In the absence of auxiliary variables, we use the Horvitz-Thompson estimator $\hat{Y}_{HT} = \sum_{i \in s} d_i y_i$ (Horvitz and Thompson 1952), where $d_i = 1/\pi_i$ is referred to as the survey weight associated with unit $i$. The regression estimator is given by

$$\hat{Y}_{REG} = \hat{Y}_{HT} + (X - \hat{X}_{HT})^{\top} \hat{B}, \tag{1.1}$$

where $X = \sum_{i \in U} x_i$, $\hat{X}_{HT} = \sum_{i \in s} d_i x_i$, $x_i = (1, x_{2i}, \ldots, x_{pi})^{\top}$, and $\hat{B}$ is a $p$-dimensional vector of estimated regression coefficients, which is computed as a function of the observed variables $(y_i, x_i)$ in the sample $s$.
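The two estimators just defined can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the toy population, and the choice of a design-weighted least squares fit for $\hat{B}$ are all assumptions made for the example (the text only says that $\hat{B}$ is a function of the sampled data).

```python
import numpy as np

def horvitz_thompson(values, pi):
    """HT total: sum over the sample of d_i * value_i, with d_i = 1 / pi_i."""
    d = 1.0 / np.asarray(pi, dtype=float)
    return d @ np.asarray(values, dtype=float)

def regression_estimator(y, x, pi, x_totals):
    """Y_HT + (X - X_HT)' B_hat, as in (1.1).

    B_hat is taken here to be the design-weighted least squares
    coefficient vector (an assumption made for this sketch).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    d = 1.0 / np.asarray(pi, dtype=float)
    b_hat = np.linalg.solve(x.T @ (d[:, None] * x), x.T @ (d * y))
    x_ht = horvitz_thompson(x, pi)
    return horvitz_thompson(y, pi) + (np.asarray(x_totals, dtype=float) - x_ht) @ b_hat

# Hypothetical toy population in which y is exactly linear in x.
rng = np.random.default_rng(0)
N, n = 1000, 100
x1 = rng.uniform(1.0, 10.0, N)
y_pop = 3.0 + 2.0 * x1
x_pop = np.column_stack([np.ones(N), x1])

# Simple random sample without replacement, so pi_i = n / N.
sample = rng.choice(N, size=n, replace=False)
pi = np.full(n, n / N)

est_ht = horvitz_thompson(y_pop[sample], pi)
est_reg = regression_estimator(y_pop[sample], x_pop[sample], pi, x_pop.sum(axis=0))
print(est_ht, est_reg, y_pop.sum())
```

Because the toy $y$ is an exact linear function of $x$, the regression estimator reproduces the true total up to rounding error, while the Horvitz-Thompson estimate still varies from sample to sample.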
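Note that the components of the vector of population totals $X$ are known for each of the corresponding component variables in the vector $x_i = (1, x_{2i}, \ldots, x_{pi})^{\top}$ used to compute $\hat{B}$. However, there are instances when we have more observed auxiliary variables in the sample than in the population. Assume that the sample has $q$ observed variables, $q > p$, and that the $p$ variables in the population are a subset of the $q$ variables observed in the sample. Furthermore, suppose that some of the extra $q - p$ variables in the sample are well correlated with the variable of interest $y$. Can these extra variables be incorporated in the regression estimator so as to make it more efficient? Singh and Raghunath (2011) attempted to respond to that question for the case $q = p + 1$. Their extra variable in the sample was the intercept. They used it to estimate the unknown population size $N$ by $\hat{N} = \sum_{i \in s} d_i$.

In this article, we compare the estimator proposed by Singh and Raghunath (2011) to other regression estimators when $N$ is known or unknown. In Section 2, we describe standard regression estimators for estimating totals when $N$ is known, as well as the regression estimator proposed by Singh and Raghunath (2011) when $N$ is unknown. In Section 3, an alternative estimator is proposed for the case where $N$ is unknown. A simulation study is carried out in Section 4 to illustrate the performance of the various estimators in terms of bias and mean square error. Overall conclusions and recommendations are given in Section 5.

1. Michael A. Hidiroglou, Business Survey Methods Division, Statistics Canada, ON, Canada, K1A 0T6. E-mail: hidirog@yahoo.ca; Jae Kwang Kim, Department of Statistics, Iowa State University, Ames, IA 50011. E-mail: jkim@iastate.edu; Christian Olivier Nambeu, Business Survey Methods Division, Statistics Canada, ON, Canada, K1A 0T6. E-mail: christianolivier.nambeu@canada.ca.

2 Regression estimators

Throughout, $\hat{Y}_{HT} = \sum_{i \in s} d_i y_i$ and $\hat{X}_{HT} = \sum_{i \in s} d_i x_i$ denote Horvitz-Thompson estimators, with $d_i = 1/\pi_i$. Under general regularity conditions (Isaki and Fuller 1982; Montanari 1987), an approximation to the regression estimator (1.1) is

$$\tilde{Y}_{REG} = \hat{Y}_{HT} + (X - \hat{X}_{HT})^{\top} B, \tag{2.1}$$

where $B$ is the limit in probability of $\hat{B}$ when both the sample and the population sizes tend to infinity. For large samples, the variance of the regression estimator (1.1) can be studied via (2.1). Note that $\tilde{Y}_{REG}$ is unbiased under the sampling plan $p(s)$ and can be re-expressed as

$$\tilde{Y}_{REG} = X^{\top} B + \sum_{i \in s} d_i E_i, \tag{2.2}$$

where $E_i = y_i - x_i^{\top} B$. The design variance of $\hat{Y}_{REG}$ can be approximated by

$$AV_p(\hat{Y}_{REG}) = \sum_{i \in U} \sum_{j \in U} \Delta_{ij} \frac{E_i}{\pi_i} \frac{E_j}{\pi_j}, \tag{2.3}$$

where $\Delta_{ij} = \pi_{ij} - \pi_i \pi_j$ and $\pi_{ij}$ is the second-order inclusion probability for units $i$ and $j$.

Both the model-assisted (Särndal, Swensson and Wretman 1992) and the optimal-variance (Montanari 1987) approaches can be used to estimate $B$. They both yield approximately unbiased estimators. In the case of the model-assisted approach, the basic properties (bias and variance terms) are valid even when the model is not correctly specified. Under the optimal-variance approach, no assumption is made on the variable of interest.

The model-assisted estimator of Särndal et al. (1992) assumes a working model between the variable of interest $y$ and the auxiliary variables $x$. The working model is denoted by $m: y_i = x_i^{\top} \beta + \epsilon_i$, where $\beta$ is a vector of $p$ unknown parameters, $E_m(\epsilon_i \mid x_i) = 0$, $V_m(\epsilon_i \mid x_i) = \sigma_i^2$ and $Cov_m(\epsilon_i, \epsilon_j \mid x_i, x_j) = 0$ for $i \neq j$. Under this approach, $B$ in equation (2.1) is the ordinary least squares estimator of $\beta$ in the population, and it is given by

$$B_{GREG} = \left( \sum_{i \in U} c_i x_i x_i^{\top} \right)^{-1} \sum_{i \in U} c_i x_i y_i, \tag{2.4}$$

where $c_i = \sigma_i^{-2}$. This yields the following estimator for the total:

$$\hat{Y}_{GREG} = \hat{Y}_{HT} + (X - \hat{X}_{HT})^{\top} \hat{B}_{GREG}, \tag{2.5}$$

where

$$\hat{B}_{GREG} = \left( \sum_{i \in s} c_i d_i x_i x_i^{\top} \right)^{-1} \sum_{i \in s} c_i d_i x_i y_i. \tag{2.6}$$

The optimal estimator of Montanari (1987), obtained by minimizing the design variance of $\hat{Y}_{HT} + (X - \hat{X}_{HT})^{\top} B$ with respect to $B$, is

$$\tilde{Y}_{OPT} = \hat{Y}_{HT} + (X - \hat{X}_{HT})^{\top} B_{OPT}, \tag{2.7}$$

where

$$B_{OPT} = \left[ V(\hat{X}_{HT}) \right]^{-1} Cov(\hat{X}_{HT}, \hat{Y}_{HT}) = \left( \sum_{i \in U} \sum_{j \in U} \Delta_{ij} \frac{x_i}{\pi_i} \frac{x_j^{\top}}{\pi_j} \right)^{-1} \sum_{i \in U} \sum_{j \in U} \Delta_{ij} \frac{x_i}{\pi_i} \frac{y_j}{\pi_j}. \tag{2.8}$$

The optimal estimator for the total is estimated by

$$\hat{Y}_{OPT} = \hat{Y}_{HT} + (X - \hat{X}_{HT})^{\top} \hat{B}_{OPT}, \tag{2.9}$$

where

$$\hat{B}_{OPT} = \left( \sum_{i \in s} \sum_{j \in s} \frac{\Delta_{ij}}{\pi_{ij}} \frac{x_i}{\pi_i} \frac{x_j^{\top}}{\pi_j} \right)^{-1} \sum_{i \in s} \sum_{j \in s} \frac{\Delta_{ij}}{\pi_{ij}} \frac{x_i}{\pi_i} \frac{y_j}{\pi_j}. \tag{2.10}$$

Note that the computation of the regression vectors requires that the first matrix that defines them be invertible. We can ensure this by reducing the number of auxiliary variables that are input into the regression, provided that not much efficiency is lost in the resulting regression estimator. If, on the other hand, there is a significant loss in efficiency, then we can invert these singular matrices using generalised inverses.

As mentioned in the introduction, not all population totals may be known for each component of the auxiliary vector $x$. The regression normally uses the auxiliary variables for which a corresponding population total is known. Decomposing $x_i$ as $x_i = (1, x_i^{*\top})^{\top}$ with $x_i^{*} = (x_{2i}, \ldots, x_{pi})^{\top}$, Singh and Raghunath (2011) proposed a GREG-like estimator that assumes that the regression is based on an intercept and the variable $x^{*}$, even though only the population total of $x^{*}$ is known. For the case where $N$ is not known and the population total of $x^{*}$ is known, their estimator, denoted here by $\hat{Y}_{SR}$, is

$$\hat{Y}_{SR} = \hat{Y}_{HT} + (X^{*} - \hat{X}^{*}_{HT})^{\top} \hat{B}^{*}_{GREG}, \tag{2.11}$$

where $X^{*} = \sum_{i \in U} x_i^{*}$ and $\hat{X}^{*}_{HT} = \sum_{i \in s} d_i x_i^{*}$. The vector of estimated regression coefficients $\hat{B}^{*}_{GREG}$ is obtained from the partition $\hat{B}_{GREG} = (\hat{B}_{1GREG}, \hat{B}^{*\top}_{GREG})^{\top}$ of the vector given by (2.6). The approximate design variance of $\hat{Y}_{SR}$ takes the same form as equation (2.3) with $E_i = y_i - x_i^{*\top} B^{*}_{GREG}$, where

$$B^{*}_{GREG} = \left\{ \sum_{i \in U} c_i (x_i^{*} - \bar{X}^{*}_N)(x_i^{*} - \bar{X}^{*}_N)^{\top} \right\}^{-1} \sum_{i \in U} c_i (x_i^{*} - \bar{X}^{*}_N) y_i$$

and $\bar{X}^{*}_N = \sum_{i \in U} x_i^{*} / N$.

The properties of (2.11) can be obtained by noting that

$$\hat{Y}_{SR} = \hat{Y}_{HT} + (X^{*} - \hat{X}^{*}_{HT})^{\top} B^{*}_{GREG} + (X^{*} - \hat{X}^{*}_{HT})^{\top} (\hat{B}^{*}_{GREG} - B^{*}_{GREG}).$$

Since $\hat{B}^{*}_{GREG} - B^{*}_{GREG} = O_p(n^{-1/2})$ under some regularity conditions discussed in Fuller (2009, Chapter 2), the last term is of smaller order. Thus, ignoring the smaller order terms, we get the following approximation:

$$\hat{Y}_{SR} - Y \cong \sum_{i \in s} d_i E_i - \sum_{i \in U} E_i, \tag{2.12}$$

where $E_i = y_i - x_i^{*\top} B^{*}_{GREG}$. Thus, $\hat{Y}_{SR}$ is approximately design-unbiased. The asymptotic variance can be computed using

$$V\left( \sum_{i \in s} d_i E_i - \sum_{i \in U} E_i \right) = E\left\{ \left( \sum_{i \in s} d_i E_i - \sum_{i \in U} E_i \right)^{2} \right\}.$$

As we can see, the asymptotic variance can be quite large unless $\sum_{i \in U} E_i = 0$.

Remark 2.1 If $y_i = a + x_i^{*\top} b$ for all $i \in U$, then $B^{*}_{GREG} = b$ and $E_i = a$, so that $\sum_{i \in s} d_i E_i - \sum_{i \in U} E_i = (\hat{N} - N) a$, and this implies that $V(\hat{Y}_{SR}) \cong a^{2} V(\hat{N})$. This means that if $V(\hat{N}) > 0$, we can artificially increase the variance of $\hat{Y}_{SR}$ by choosing large values of $a$.

Note that the optimal regression estimator using $x^{*}$, denoted by $\hat{Y}^{*}_{OPT}$, is also approximately design unbiased because

$$\hat{Y}^{*}_{OPT} = \hat{Y}_{HT} + (X^{*} - \hat{X}^{*}_{HT})^{\top} B^{*}_{OPT} + (X^{*} - \hat{X}^{*}_{HT})^{\top} (\hat{B}^{*}_{OPT} - B^{*}_{OPT}),$$

where $B^{*}_{OPT}$ is obtained by replacing $x$ by $x^{*}$ in equation (2.8). Since $\hat{B}^{*}_{OPT} = B^{*}_{OPT} + O_p(n^{-1/2})$ under some regularity conditions discussed in Fuller (2009, Chapter 2), ignoring the smaller order terms, we get

$$\hat{Y}^{*}_{OPT} \cong \hat{Y}_{HT} + (X^{*} - \hat{X}^{*}_{HT})^{\top} B^{*}_{OPT}.$$

The asymptotic variance of $\hat{Y}^{*}_{OPT}$ is smaller than the one associated with $\hat{Y}_{SR}$. The reason for this is that the optimal estimator minimizes the asymptotic variance among the class of estimators of the form

$$\hat{Y}_{B} = \hat{Y}_{HT} + (X^{*} - \hat{X}^{*}_{HT})^{\top} B \tag{2.13}$$

indexed by $B$.

3 Alternative regression estimator

We now consider an alternative estimator that does not use the population size $N$ information. Rather, it uses the first-order inclusion probabilities, provided that they are known for each unit in the population. Given that $\sum_{i \in U} \pi_i = n$, we can use $z_i = (\pi_i, x_i^{*\top})^{\top}$ as auxiliary data in the model $y_i = z_i^{\top} \beta + e_i$, where the $e_i$ are independent with mean $0$ and variance $\pi_i \sigma^{2}$. With this variance structure, the factor $c_i$ that enters the regression vector is $c_i = d_i$. The resulting estimator, denoted here by $\hat{Y}_{ALT}$, is given by

$$\hat{Y}_{ALT} = \hat{Y}_{HT} + (Z - \hat{Z}_{HT})^{\top} \hat{B}_{ALT}, \tag{3.1}$$

with $Z = \sum_{i \in U} z_i$, $\hat{Z}_{HT} = \sum_{i \in s} d_i z_i$ and

$$\hat{B}_{ALT} = \left( \sum_{i \in s} c_i d_i z_i z_i^{\top} \right)^{-1} \sum_{i \in s} c_i d_i z_i y_i. \tag{3.2}$$

This estimator corresponds exactly to the one given by Isaki and Fuller (1982).

Remark 3.1 By construction, $\sum_{i \in s} d_i^{2} (y_i - z_i^{\top} \hat{B}_{ALT}) z_i = 0$. Since $\pi_i$ is a component of $z_i$ and $d_i^{2} \pi_i = d_i$, this leads to $\sum_{i \in s} d_i (y_i - z_i^{\top} \hat{B}_{ALT}) = 0$, and we have $\hat{Y}_{ALT} = Z^{\top} \hat{B}_{ALT}$. Thus, $\hat{Y}_{ALT}$ is the best linear unbiased predictor of $Y = \sum_{i=1}^{N} y_i$ under the model $y_i = z_i^{\top} \beta + e_i$, where the $e_i$ have mean $0$ and variance $\pi_i \sigma^{2}$.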
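The identity in Remark 3.1 is easy to check numerically. The following Python sketch computes the estimator (3.1)-(3.2) with $c_i = d_i$ and compares it with $Z^{\top}\hat{B}$. The function name, the toy population, and the size measure are hypothetical and chosen only for illustration; Poisson sampling is used because it is the simplest design to simulate.

```python
import numpy as np

def alternative_estimator(y, x, pi, z_totals):
    """Estimator (3.1) with z_i = (pi_i, x_i) and c_i = d_i.

    Returns the estimate and the coefficient vector from (3.2).
    Note that the population size N is never used.
    """
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    d = 1.0 / pi
    z = np.column_stack([pi, x])          # auxiliary vector z_i
    w = d * d                             # c_i * d_i with c_i = d_i
    b_hat = np.linalg.solve(z.T @ (w[:, None] * z), z.T @ (w * y))
    y_ht = d @ y                          # Horvitz-Thompson total of y
    z_ht = d @ z                          # Horvitz-Thompson totals of z
    estimate = y_ht + (np.asarray(z_totals, dtype=float) - z_ht) @ b_hat
    return estimate, b_hat

# Hypothetical toy population; the size measure is unrelated to x so that
# the two components of z are not collinear.
rng = np.random.default_rng(1)
N, n = 500, 50
x_pop = rng.uniform(1.0, 10.0, N)
size = rng.uniform(1.0, 3.0, N)
y_pop = 5.0 + 2.0 * x_pop + rng.normal(0.0, 1.0, N)
pi_pop = n * size / size.sum()            # inclusion probabilities sum to n

# Poisson sampling: independent Bernoulli(pi_i) draws.
in_sample = rng.random(N) < pi_pop

# Known control totals: sum of pi_i (equal to n) and the total of x.
z_totals = np.array([pi_pop.sum(), x_pop.sum()])
estimate, b_hat = alternative_estimator(
    y_pop[in_sample], x_pop[in_sample], pi_pop[in_sample], z_totals
)
print(estimate, z_totals @ b_hat)
```

The two printed values agree up to rounding error, which is the projection form $\hat{Y} = Z^{\top}\hat{B}$ stated in the remark.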

Note that the coefficient vector (3.2) can be expressed as $\hat{B}_{GREG}$ by setting $c_i = d_i$ and $x_i = z_i$. Thus, the proposed regression estimator $\hat{Y}_{ALT}$ of (3.1) can be viewed as a special case of the GREG estimator. Using an argument similar to (2.12), we obtain

$$\hat{Y}_{ALT} - Y \cong \sum_{i \in s} d_i E_i - \sum_{i \in U} E_i, \tag{3.3}$$

where $E_i = y_i - z_i^{\top} B$ and

$$B = \left( \sum_{i \in U} c_i z_i z_i^{\top} \right)^{-1} \sum_{i \in U} c_i z_i y_i.$$

The proposed estimator is approximately unbiased, and its asymptotic variance

$$V\left\{ \sum_{i \in s} d_i (y_i - z_i^{\top} B) \right\} = \sum_{i \in U} \sum_{j \in U} \Delta_{ij} \frac{E_i}{\pi_i} \frac{E_j}{\pi_j}$$

is often smaller than the asymptotic variance of the estimator of Singh and Raghunath (2011).

The optimal version of $\hat{Y}_{ALT}$ uses $z_i = (\pi_i, x_i^{*\top})^{\top}$ as auxiliary data. It is given by

$$\hat{Y}_{K} = \hat{Y}_{HT} + (Z - \hat{Z}_{HT})^{\top} \hat{B}_{K}, \tag{3.4}$$

where $\hat{B}_{K}$ is obtained by substituting $x$ by $z$ in equation (2.10).

Remark 3.2 For fixed-size sampling designs, we have $V_p\left( \sum_{i \in s} d_i \pi_i \right) = 0$. In this case, the optimal regression coefficient vector $B_K = \left[ V_p(\hat{Z}_{HT}) \right]^{-1} Cov_p(\hat{Z}_{HT}, \hat{Y}_{HT})$ cannot be computed because the variance-covariance matrix $V_p(\hat{Z}_{HT})$ is not invertible. Thus, the optimal estimator with $z_i = (\pi_i, x_i^{*\top})^{\top}$ reduces to the optimal estimator (2.9) using only $x_i^{*}$.

Remark 3.3 For random-size sampling designs, $V_p\left( \sum_{i \in s} d_i \pi_i \right) \neq 0$. In this case, all of the components of $z_i = (\pi_i, x_i^{*\top})^{\top}$ can be used in the design-optimal regression estimator (2.9).

A difficulty with using the optimal estimator $\hat{Y}_{K}$ is that it requires the computation of the joint inclusion probabilities $\pi_{ij}$: these may be difficult to compute for certain sampling designs. An estimator that does not require the computation of the joint inclusion probabilities is obtained by assuming that $\pi_{ij} = \pi_i \pi_j$ for $i \neq j$. We refer to this estimator as the pseudo-optimal estimator $\hat{Y}_{P}$. It is given by

$$\hat{Y}_{P} = \hat{Y}_{HT} + (Z - \hat{Z}_{HT})^{\top} \hat{B}_{P}, \tag{3.5}$$

where

$$\hat{B}_{P} = \left( \sum_{i \in s} c_i d_i z_i z_i^{\top} \right)^{-1} \sum_{i \in s} c_i d_i z_i y_i$$

and $c_i = d_i - 1$. In general, the pseudo-optimal estimator $\hat{Y}_{P}$ should yield estimates that are quite close to those produced by $\hat{Y}_{K}$ when the sampling fraction is small. Note that $\hat{Y}_{P}$ is exactly equal to the optimal estimator $\hat{Y}_{K}$ in the case of Poisson sampling, because under this design the units are included in the sample independently of one another. The approximate design variances of $\hat{Y}_{ALT}$, $\hat{Y}_{K}$ and $\hat{Y}_{P}$ have the same form as the one given in equation (2.3), with the $E_i$'s respectively given by $y_i - z_i^{\top} B$, $y_i - z_i^{\top} B_{K}$ and $y_i - z_i^{\top} B_{P}$.

4 Simulations

We carried out two simulation studies. The first one used a dataset provided in the textbook of Rosner (2006), and the second one was based on an artificial population created according to a simple linear regression model. The first simulation assessed the performance of all of the estimators with respect to different sampling schemes, while the second simulation study focused on the impact of changing the intercept value in the model.

The parameter of interest for these two simulations is the total of the variable of interest $y$: $Y = \sum_{i \in U} y_i$. All estimators (the GREG estimator, the estimator of Singh and Raghunath, the optimal estimators, the alternative estimator, and the estimators $\hat{Y}_{K}$ and $\hat{Y}_{P}$) were used with the available auxiliary data. Table 4.1 summarizes the auxiliary data and the variance structure of the errors (when applicable) associated with the estimators used in the two studies.

Table 4.1 Estimators used in simulation

N known:
  $\hat{Y}_{GREG}$, as defined by (2.5) with $x_i = (1, x_{1i})^{\top}$ and $c_i = c$
  $\hat{Y}_{OPT}$, as defined by (2.9) with $x_i = (1, x_{1i})^{\top}$
  $\hat{Y}_{OPT,\pi}$, as defined by (2.9) with $x_i = (1, x_{1i}, \pi_i)^{\top}$
  $\hat{Y}_{P}$, as defined by (3.5) with $z_i = (1, \pi_i, x_{1i})^{\top}$ and $c_i = d_i - 1$

N unknown:
  $\hat{Y}_{SR1}$, as defined as a special case of (2.11) with $x_i^{*} = x_{1i}$
  $\hat{Y}_{OPT1}$, as defined by (2.9) with $x_i = x_{1i}$
  $\hat{Y}_{ALT}$, as defined by (3.1) with $z_i = (\pi_i, x_{1i})^{\top}$ and $c_i = d_i$
  $\hat{Y}_{K}$, as defined by (3.4) with $z_i = (\pi_i, x_{1i})^{\top}$
  $\hat{Y}_{P}$, as defined by (3.5) with $z_i = (\pi_i, x_{1i})^{\top}$ and $c_i = d_i - 1$

The performance of all estimators was evaluated based on the relative bias, the Monte Carlo relative efficiency and the approximate relative efficiency. Expressions for these quantities are shown below.

1. Relative bias:

$$RB(\hat{Y}_{EST}) = \frac{100}{R} \sum_{r=1}^{R} \frac{\hat{Y}_{EST}^{(r)} - Y}{Y}, \tag{4.1}$$

where $\hat{Y}_{EST}^{(r)}$ represents one of the estimators presented in Table 4.1, as computed in the $r$-th Monte Carlo sample.

2. Monte Carlo relative efficiency:

$$RE(\hat{Y}_{EST}) = \frac{MSE_{MC}(\hat{Y}_{EST})}{MSE_{MC}(\hat{Y}_{GREG})}, \tag{4.2}$$

where $MSE_{MC}(\hat{Y}_{EST}) = R^{-1} \sum_{r=1}^{R} (\hat{Y}_{EST}^{(r)} - Y)^{2}$. The RE measures the relative efficiency of the estimator $\hat{Y}_{EST}$ with respect to $\hat{Y}_{GREG}$.

3. Approximate relative efficiency:

$$AR(\hat{Y}_{EST}) = \frac{AV_p(\hat{Y}_{EST})}{AV_p(\hat{Y}_{GREG})}, \tag{4.3}$$

where $AV_p(\hat{Y}_{EST}) = \sum_{i \in U} \sum_{j \in U} \Delta_{ij} \frac{E_i}{\pi_i} \frac{E_j}{\pi_j}$ is the approximate variance of $\hat{Y}_{EST}$, with $E_i = y_i - x_i^{\top} B_{EST}$.

The approximate relative efficiency AR measures the relative gain in efficiency of $\hat{Y}_{EST}$ with respect to $\hat{Y}_{GREG}$ using the population residuals obtained by Taylor linearisation. It is expected that RE and AR give comparable results. However, as we will see, this may not be the case.

4.1 Simulation 1

The population was the dataset (FEV.DAT) available on the CD that accompanies the textbook by Rosner (2006). The data file contains 654 records from a study on Childhood Respiratory Disease carried out in Boston. The variables in the file were: age, height, sex (male, female), smoking (indicates whether the individual smokes or not) and forced expiratory volume (FEV). Singh and Raghunath (2011) used the same data set. The parameter of interest is the total height $Y$ of the population. The variable age, $x_1$, was used as the auxiliary variable in the regression. The variable FEV, $x_2$, was chosen as the size variable to compute the probabilities of selection for the sampling schemes that are considered in this simulation. The two variables sex and smoking were discarded from the simulation.

Table 4.2 summarizes the central tendency measures of the three variables in the population. For each variable, the mean and median were similar. This indicates that the three variables have a symmetrical distribution.

Table 4.2 Descriptive statistics of $y$, $x_1$ and $x_2$

         Min     Q1      Median   Mean     Q3      Max
y        46      57      61.5     61.14    65.5    74
x_1      3       8       10       9.931    12      19
x_2      0.79    1.98    2.55     2.64     3.12    5.79

Figure 4.1 displays the relationship between the variable of interest $y$ and the auxiliary variable $x_1$. The relationship between height $y$ and age $x_1$ appears to be linear but does not go through the origin. The Pearson correlation coefficient between $y$ and $x_1$ was 0.79.

Figure 4.1 Relationship between the variable of interest Height (vertical axis) and the auxiliary variable Age (horizontal axis).

The objective of this simulation study was to evaluate the performance of the estimators presented in Table 4.1 using different sampling designs. We considered the Midzuno, the Sampford and the Poisson sampling designs. The variable $x_2$ was used as a size measure for the three sampling schemes to compute the inclusion probabilities. These sampling designs are as follows:

1. Midzuno sampling (see Midzuno 1952): The first unit is sampled with probability $p_i$, and the remaining $n - 1$ units are selected by simple random sampling without replacement from the $N - 1$ remaining units in the population. The probability of selection $p_i$ for unit $i$ is given by $p_i = x_{2i} / \sum_{i \in U} x_{2i}$. The first-order inclusion probability for unit $i$ is given by $\pi_i = (N - 1)^{-1} \{ (N - n) p_i + (n - 1) \}$.

2. Sampford sampling (see Sampford 1967): The algorithm for selecting the sample is carried out as follows. The first unit is selected with probability $p_i = x_{2i} / \sum_{i \in U} x_{2i}$, and the remaining $n - 1$ units are selected with replacement with probability proportional to $\lambda_i = p_i (1 - n p_i)^{-1}$. If any of the units are selected more than once, the procedure is repeated until all elements of the sample are different. The first-order inclusion probability is given by $\pi_i = n p_i$.

3. Poisson sampling: Each unit is selected independently, resulting in a random sample size. The selection probability attached to unit $i$ is $p_i = x_{2i} / \sum_{i \in U} x_{2i}$, and the inclusion probability associated with unit $i$ is $\pi_i = n p_i$. A good description of this procedure can be found in Särndal et al. (1992).

The total $Y = \sum_{i \in U} y_i$ was the parameter of interest. Based on each of these sampling schemes, we selected $R = 2{,}000$ Monte Carlo samples of size $n = 50$. The estimators in Table 4.1 were then computed for each sample. The performance of the estimators was then assessed using the relative bias, the Monte Carlo relative efficiency and the approximate relative efficiency, as described by equations (4.1), (4.2) and (4.3) respectively.

4.2 Simulation 1 results

Simulation results are presented in Table 4.3. All estimators studied are approximately unbiased, and their relative bias is smaller than 1%. We discuss separately the approximate relative efficiency (AR) and the relative efficiency (RE) of the estimators when the population size $N$ is known and unknown.

Case 1: Population size N is known

We compare the AR and the RE for the following estimators in Table 4.3: the GREG estimator $\hat{Y}_{GREG}$, the optimal estimator with auxiliary vector $(1, x_{1i})^{\top}$, the optimal estimator with auxiliary vector $(1, x_{1i}, \pi_i)^{\top}$, and the pseudo-optimal estimator $\hat{Y}_{P}$, for each of the three sampling designs. We can do so for all of these estimators except the optimal estimator that includes $\pi_i$, which is not available for the Midzuno and the Sampford sampling schemes. In this case, we cannot compute $\hat{B}$, for a reason similar to the one described in Remark 3.2.

On the basis of both AR and RE, the pseudo-optimal estimator is the most reliable estimator regardless of the sampling scheme. It is close to the optimal estimator only in terms of AR. The RE and the AR of the optimal estimator were not as close to each other as expected under the Midzuno sampling design. The poor behaviour of the RE of the optimal estimator has also been observed by Montanari (1998). Figure 4.2 explains what is happening. We observe that most estimates obtained for the optimal estimator over the 2,000 Monte Carlo samples are close to the mean. However, in some samples the estimates are quite far from it. This is in contrast to $\hat{Y}_{P}$, where the values are tightly centered around the mean: note that the associated RE and AR are quite close to one another.
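The Monte Carlo measures (4.1) and (4.2) used in this comparison can be sketched as follows. The function names and the four toy replicate values are hypothetical. The example only illustrates how a single replicate far from the true total inflates the Monte Carlo MSE, and hence the RE, of an estimator whose other replicates are tightly centered.

```python
import numpy as np

def relative_bias(estimates, true_total):
    """Relative bias in percent, as in (4.1)."""
    est = np.asarray(estimates, dtype=float)
    return 100.0 * np.mean((est - true_total) / true_total)

def monte_carlo_mse(estimates, true_total):
    """Average squared error over the Monte Carlo replicates."""
    est = np.asarray(estimates, dtype=float)
    return np.mean((est - true_total) ** 2)

def relative_efficiency(estimates, reference_estimates, true_total):
    """Ratio of Monte Carlo MSEs, as in (4.2); the reference plays the role of GREG."""
    return monte_carlo_mse(estimates, true_total) / monte_carlo_mse(
        reference_estimates, true_total
    )

true_total = 100.0
reference = [98.0, 102.0, 100.0, 100.0]      # moderate spread in every replicate
with_outlier = [100.0, 100.0, 100.0, 110.0]  # exact except for one extreme replicate

print(relative_bias(with_outlier, true_total))                   # 2.5
print(relative_efficiency(with_outlier, reference, true_total))  # 12.5
```

In this toy case the estimator with one extreme replicate has an MSE of 25 against 2 for the reference, so its RE is 12.5 even though three of its four replicates are exact.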

Figure 4.2 Scatter plots of Monte Carlo estimators under the Midzuno Sampling Design (two panels, each plotting the estimates against the replicates).

The optimal estimator with auxiliary vector $(1, x_{1i}, \pi_i)^{\top}$, denoted $\hat{Y}_{OPT,\pi}$, is equivalent to the pseudo-optimal estimator $\hat{Y}_{P}$ in the case of the Poisson sampling scheme. Recall that the optimal estimator $\hat{Y}_{OPT}$ used $x_i = (1, x_{1i})^{\top}$ as auxiliary data, whereas $\hat{Y}_{OPT,\pi}$ used $x_i = (1, x_{1i}, \pi_i)^{\top}$. The addition of $\pi_i$ has significantly improved the efficiency of the optimal estimator for the Poisson sampling scheme.

Singh and Raghunath (2011) used their estimator, denoted $\hat{Y}_{SR1}$ in Table 4.3, when $N$ was known but did not include it as a control count. Nonetheless, they observed that $\hat{Y}_{SR1}$ was quite comparable to $\hat{Y}_{GREG}$ in terms of AR and RB for the Midzuno sampling design. The reason for this is that this sampling scheme is quite close to simple random sampling without replacement. However, using these two measures, $\hat{Y}_{SR1}$ is by far the worst estimator for the other two sampling schemes.

Case 2: Population size N is unknown

Five estimators are reported in Table 4.3 for this case. However, as the alternative estimator $\hat{Y}_{ALT}$ of (3.1) is quite close to $\hat{Y}_{K}$ and $\hat{Y}_{P}$, we comment on the results obtained for $\hat{Y}_{SR1}$, the optimal estimator $\hat{Y}_{OPT1}$ based on $x_{1i}$ alone, and $\hat{Y}_{ALT}$. These three estimators were very similar in terms of relative efficiency and approximate relative efficiency for the Midzuno sampling design. For the Sampford sampling scheme, $\hat{Y}_{OPT1}$ and $\hat{Y}_{P}$ were comparable, and slightly better than $\hat{Y}_{SR1}$. Under the Poisson sampling scheme, $\hat{Y}_{OPT1}$ and $\hat{Y}_{ALT}$ outperformed $\hat{Y}_{SR1}$. We can also see that $\hat{Y}_{SR1}$ was very inefficient, with an RE at least 10 times larger than those associated with $\hat{Y}_{ALT}$ or $\hat{Y}_{P}$. Note that $\hat{Y}_{ALT}$ was better than $\hat{Y}_{OPT1}$: this is reasonable, as $\hat{Y}_{ALT}$ uses two auxiliary variables whereas $\hat{Y}_{OPT1}$ uses the single auxiliary variable $x_1$.

Table 4.3 Comparison of estimators in terms of relative bias and relative efficiencies

                        Population size known             Population size unknown
Design    Measure       GREG    OPT     OPT,π   P         SR1      OPT1     ALT      K        P
Midzuno   RB (in %)     0.08    0.04    -       0.07      0.07     0.07     0.07     -        0.07
          RE            1.00    5.84    -       0.54      0.94     0.93     0.93     -        0.93
          AR            1.00    0.55    -       0.55      0.94     0.93     0.93     -        0.93
Sampford  RB (in %)     0.11    0.11    -       0.07      -0.01    0.07     0.0      -        0.0
          RE            1.00    0.59    -       0.58      14.7     13.69    13.55    -        13.56
          AR            1.00    0.55    -       0.56      15.77    14.39    14.39    -        14.40
Poisson   RB (in %)     0.11    0.11    0.08    0.08      0.09     0.14     0.16     0.16     0.16
          RE            1.00    0.96    0.57    0.57      160.47   15.49    13.85    13.85    13.85
          AR            1.00    0.96    0.55    0.56      180.36   16.73    14.40    14.39    15.73

Note: We do not provide results for the optimal estimator that includes $\pi_i$ (column OPT,π) and for $\hat{Y}_{K}$ (column K) under the Midzuno and Sampford designs because the variance-covariance matrix is not invertible.

4.3 Simulation 2

The performance of the estimators was assessed for different values of the intercept in the model. We restricted ourselves to the Poisson sampling design to illustrate Remark 2.1 in Section 2: that is, the efficiency of the estimator of Singh and Raghunath (2011) deteriorates as the intercept gets bigger. The population was generated according to the following model:

$$y_i = a + x_i + e_i. \tag{4.4}$$

The $e_i$ values were generated from a normal distribution with mean 0 and variance $\sigma^{2} = 1$. The $x_i$ values were generated according to a chi-square distribution with one degree of freedom. Three populations of size $N = 5{,}000$ were generated using (4.4) with different values of the intercept $a$. Note that the $x$ values were re-generated for each population. The three populations were labelled A, B and C, depending on the intercept used. The intercept values were set to 3, 5 and 10 for populations A, B and C respectively.

From each of these populations, we drew $R = 2{,}000$ Monte Carlo samples with expected sample size $n = 50$ using the Poisson sampling design. The first-order inclusion probability was set equal to $\pi_i = n z_i / \sum_{i \in U} z_i$ for each unit $i$. The $z_i$ values were generated according to the model $z_i = 0.5 y_i + u_i$, where $u_i$ was a random error generated according to an exponential distribution with mean $k$ equal to 0.5 or 1.
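A minimal Python sketch of this set-up is given below, assuming that model (4.4) reads $y = a + x + e$ with unit slope. It generates a population, builds the Poisson inclusion probabilities, and evaluates the approximate variance (2.3) of a regression estimator that leaves the intercept out of the residual, which is the situation described in Remark 2.1. Under Poisson sampling $\Delta_{ij} = 0$ for $i \neq j$, so (2.3) reduces to $\sum_{i \in U} (1 - \pi_i) E_i^2 / \pi_i$. The function names, the seed, the guard on the size measure, and the use of the ordinary population slope are assumptions of the sketch, not part of the original study. Only two of the three intercepts are used, to keep the guard on non-positive size measures from being triggered.

```python
import numpy as np

def make_population(a, k, size, rng):
    """Population from y = a + x + e, with size measure z = 0.5 * y + u."""
    x = rng.chisquare(df=1, size=size)
    e = rng.normal(0.0, 1.0, size=size)
    u = rng.exponential(scale=k, size=size)   # exponential with mean k
    y = a + x + e
    z = 0.5 * y + u
    return x, y, z

def inclusion_probabilities(z, n):
    """pi_i = n * z_i / sum(z); requires positive sizes and pi_i < 1."""
    z = np.asarray(z, dtype=float)
    if np.any(z <= 0):
        raise ValueError("size measure must be positive")
    pi = n * z / z.sum()
    if np.any(pi >= 1):
        raise ValueError("inclusion probabilities must be below 1")
    return pi

def poisson_approx_variance(residuals, pi):
    """Equation (2.3) under Poisson sampling: sum of (1 - pi_i) / pi_i * E_i^2."""
    residuals = np.asarray(residuals, dtype=float)
    return np.sum((1.0 - pi) / pi * residuals ** 2)

rng = np.random.default_rng(2016)
approx_variance = {}
for a in (5.0, 10.0):
    x, y, z = make_population(a, k=1.0, size=5000, rng=rng)
    pi = inclusion_probabilities(z, n=50)
    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # population slope of y on x
    # Residual without an intercept term, so it stays close to a + e_i.
    approx_variance[a] = poisson_approx_variance(y - slope * x, pi)

print(approx_variance)
```

The approximate variance grows with the intercept, in line with the $a^{2} V(\hat{N})$ behaviour noted in Remark 2.1.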
4.4 Simulation 2 results

Numerical results are given in Table 4.4 for $k = 1$ and in Table 4.5 for $k = 0.5$. All estimators are approximately unbiased, with relative biases smaller than 1%.

Case 1: Population size N is known

As expected, both optimal estimators, the one based on $(1, x_i)^{\top}$ and the one that also includes $\pi_i$, are more efficient than $\hat{Y}_{GREG}$. The optimal estimator based on $(1, x_i)^{\top}$ is slightly better than $\hat{Y}_{GREG}$. The inclusion of the additional variable $\pi_i$ yields significant gains in terms of RE and AR: these gains decrease as the intercept gets larger. Once more, the estimator of Singh and Raghunath (2011) is quite inefficient and, as noted in Remark 2.1, this inefficiency increases as the intercept gets larger. The previous observations are valid regardless of $k$. The efficiency of both optimal estimators decreases as $k$ gets smaller.

Case 2: Population size N is unknown

The most efficient estimator is the alternative estimator of (3.1) (column ALT). It outperforms the optimal estimator based on $x_i$ alone (column OPT1), as it uses more auxiliary variables. The estimator of Singh and Raghunath (2011) (column SR1) is by far the most inefficient one. As the intercept in the population model increases, the relative efficiency (both in terms of RE and AR) is fairly stable for the alternative estimator. On the other hand, the relative efficiencies associated with the SR1 and OPT1 estimators deteriorate rapidly as the intercept in the population model increases. The effect of $k$ on the efficiencies of the estimators is as described when the population size is known.

Table 4.4 Relative bias and relative efficiencies of the estimators for k = 1 under the Poisson sampling design

                          Population size known             Population size unknown
Intercept  Measure        GREG    OPT     OPT,π   P         SR1      OPT1     ALT      K        P
3          RB (in %)      0.3     0.38    0.56    0.56      0.18     0.77     0.       0.       0.
           RE             1.00    0.95    0.67    0.67      7.7      5.4      0.94     0.94     0.94
           AR             1.00    0.94    0.60    0.98      7.08     5.01     0.85     0.85     0.91
5          RB (in %)      0.04    0.07    0.18    0.18      -0.01    0.67     -0.07    -0.07    -0.07
           RE             1.00    0.99    0.76    0.76      3.91     16.63    1.50     1.50     1.50
           AR             1.00    0.98    0.70    0.73      3.48     16.0     1.45     1.45     1.5
10         RB (in %)      -0.01   -0.0    0.06    0.06      -0.57    0.79     -0.0     -0.0     -0.0
           RE             1.00    1.00    0.80    0.80      88.30    67.47    .0       .0       .0
           AR             1.00    0.99    0.73    0.74      97.9     66.13    .15      .15      .0

Table 4.5 Relative bias and relative efficiencies of the estimators for k = 0.5 under the Poisson sampling design

                          Population size known             Population size unknown
Intercept  Measure        GREG    OPT     OPT,π   P         SR1      OPT1     ALT      K        P
3          RB (in %)      0.13    0.5     0.4     0.4       -0.18    0.54     -0.0     -0.0     -0.0
           RE             1.00    0.99    0.89    0.89      8.4      5.93     1.78     1.78     1.78
           AR             1.00    0.96    0.83    0.95      8.30     5.83     1.79     1.79     .10
5          RB (in %)      0.03    0.09    0.      0.        0.7      1.49     0.18     0.18     0.18
           RE             1.00    1.00    0.91    0.91      4.35     17.39    3.6      3.6      3.6
           AR             1.00    0.98    0.88    0.94      3.83     16.41    3.15     3.15     3.54
10         RB (in %)      0.06    0.07    0.1     0.1       0.33     1.4      0.13     0.13     0.13
           RE             1.00    1.00    0.96    0.96      98.69    73.93    6.6      6.6      6.6
           AR             1.00    0.99    0.91    0.9       98.65    66.0     5.89     5.89     6.4

5 Conclusions

The regression estimator can be quite efficient if the auxiliary data that it uses are well correlated with the variable of interest. Furthermore, it requires that population totals corresponding to the auxiliary variables be available. In this article, we investigated the behaviour of the regression estimator proposed by Singh and Raghunath (2011). This estimator uses the estimated population count as a control total, together with the known population totals for the auxiliary variables. We compared it to the Generalized Regression estimator (GREG), to its optimal analogue, and to an alternative estimator that uses the first-order inclusion probabilities and auxiliary data for which the population totals are known. As the optimal regression estimator requires the computation of second-order inclusion probabilities, we also included a pseudo-optimal estimator that does not require them.

We investigated the properties of these estimators in terms of bias and efficiency via a simulation that included various sampling designs and different values of the intercept in the model for a generated artificial population. We compared the results when the population size was known and unknown. When the population size is known, the most efficient estimator is the optimal estimator. However, since this estimator can be unstable, the pseudo-optimal estimator is a good alternative to it. This is in line with Rao (1994), who favoured the optimal estimator over the Generalized Regression estimator. The Singh and Raghunath (2011) proposal is not viable, as their estimator can be quite inefficient. When the population size is not known, the alternative regression estimator of Section 3 is the best one to use.

Acknowledgements

The authors kindly acknowledge suggestions for improved readability provided by the Associate Editor and the referees.

References

Cassel, C.M., Särndal, C.-E. and Wretman, J.H. (1976). Some results on generalized difference estimators and generalized regression estimation for finite populations. Biometrika, 63, 615-620.

Fuller, W.A. (2009). Sampling Statistics. New York: John Wiley & Sons, Inc.

Horvitz, D.G. and Thompson, D.J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663-685.

Isaki, C.T. and Fuller, W.A. (1982). Survey design under the regression superpopulation model. Journal of the American Statistical Association, 77, 89-96.

Midzuno, H. (1952). On the sampling system with probability proportional to sum of size. Annals of the Institute of Statistical Mathematics, 3, 99-107.

Montanari, G.E. (1987). Post-sampling efficient QR-prediction in large-scale surveys. International Statistical Review, 55, 191-202.

Montanari, G.E. (1998). On regression estimation of finite population means. Survey Methodology, 24, 1, 69-77.

Rao, J.N.K. (1994). Estimating totals and distribution functions using auxiliary data information at the estimation stage. Journal of Official Statistics, 10(2), 153-165.

Rosner, B. (2006). Fundamentals of Biostatistics. Sixth edition, Duxbury Press.

Sampford, M.R. (1967). On sampling without replacement with unequal probabilities of selection. Biometrika, 54, 499-513.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.

Singh, S. and Raghunath, A. (2011). On calibration of design weights. METRON - International Journal of Statistics, vol. LXIX, 185-205.