BIAS-ROBUSTNESS AND EFFICIENCY OF MODEL-BASED INFERENCE IN SURVEY SAMPLING

Size: px
Start display at page:

Download "BIAS-ROBUSTNESS AND EFFICIENCY OF MODEL-BASED INFERENCE IN SURVEY SAMPLING"

Transcription

1 Statistica Sinica 22 (2012), doi: BIAS-ROBUSTNESS AND EFFICIENCY OF MODEL-BASED INFERENCE IN SURVEY SAMPLING Desislava Nedyalova and Yves Tillé University of Neuchâtel Abstract: In model-based inference, the selection of balanced samples has been considered to give protection against misspecification of the model. A recent development in finite population sampling is that balanced samples can be randomly selected. There are several possible strategies that use balanced samples. We give a definition of balanced sample that embodies overbalanced, mean-balanced, and π-balanced samples, and we derive strategies in order to equalize a d-weighted estimator with the best linear unbiased estimator. We show the value of selecting a balanced sample with inclusion probabilities proportional to the standard deviations of the errors with the Horvitz-Thompson estimator. This is a strategy that is design-robust and efficient. We show its superiority compared to other strategies that use balanced samples in the model-based framewor. In particular, we show that this strategy is preferable to the use of overbalanced samples in the polynomial model. The problem of bias-robustness is also discussed, and we show how overspecifying the model can protect against misspecification. Key words and phrases: Balanced sampling, finite population sampling, polynomial model, ratio model, robust estimation. 1. Introduction The principal difference between the model-based and the classical designbased approach for estimating finite population totals lies in the source of randomness they use (Särndal (1978)). In design-based sampling, the inference is based on the stochastic structure induced by the sampling design. In the modelbased, or prediction approach, the inference depends on the validity of the model used to describe the data. In this case, the randomness is due to the population model and not to the sampling design. The model-based approach was developed, among others, by Royall (1976, 1992), Royall and Cumberland (1981), and Chambers (1996). When the data are assumed to follow a linear model, Royall (1976) proposed the use of the best linear unbiased predictor. The model-based approach has been criticized due to the fact that it may lead to severe bias if the model assumptions are violated. In contrast to model-based inference, design-based inference is design-robust by

2 778 DESISLAVA NEDYALKOVA AND YVES TILLÉ definition. Brewer and Särndal (1983) point out that, since the inference is not based on a model, there is no need to worry about model misspecification. Much of the wor in model-based research has been devoted to the construction of robust strategies. More specifically, in order to protect the inference against a misspecified model, Royall and Herson (1973a,b) and Scott, Brewer, and Ho (1978) point out the importance of balanced samples, where balance is achieved by equalizing the sample moments of the independent variables with those in the population. They came to the conclusion that the sample must be balanced, but not necessarily random. Another way to accomplish design-robustness in the model-based approach is to choose an appropriate sampling design. Since Deville and Tillé (2004) s paper, it is now possible to randomly select balanced samples using a procedure called the cube method. Nedyalova and Tillé (2008) have shown that under a random balanced sampling design, with inclusion probabilities proportional to the standard deviations of the errors of the model, and under certain conditions defined as fully explainable heteroscedasticity, the best linear unbiased estimator is the Horvitz-Thompson estimator. This is an optimal strategy that reconciles the two approaches. Scott, Brewer, and Ho (1978) and Royall and Herson (1973a) recommend the use of balanced samples in order to protect against a misspecification of the model, while Nedyalova and Tillé (2008) recommend using balanced sampling for minimizing the anticipated variance under a linear model. An important particular case is the polynomial model. Scott, Brewer, and Ho (1978) suggest using overbalanced samples with an ad-hoc weighting system. We investigate the different strategies leading to design-robust estimations under the model-based framewor, and consider the polynomial model, overbalancing and robustness under misspecification in light of Nedyalova and Tillé (2008). The notation and definitions are given in Section 2. The model-based framewor is briefly introduced in Section 3. In Section 4, we consider a large class of balanced designs, called d-balanced designs, that embody balanced, π- balanced, mean-balanced, and overbalanced samples. We also consider a class of weighted estimators, called d-weighted estimators, and discuss several strategies for which the Best Linear Unbiased (BLU) estimator is the d-estimator. These results generalize some results of Nedyalova and Tillé (2008) for the Horvitz- Thompson estimator. In Section 5, we give a definition of design-robustness and show that an appropriate strategy to protect against misspecification of the model consists of overspecifying the model and then balancing on the independent variables of the model. In Section 6, we revisit the polynomial model to show that the overbalancing strategy of Scott, Brewer, and Ho (1978) is suboptimal, and to give an alternative strategy that minimizes the anticipated variance.

3 BIAS-ROBUSTNESS AND EFFICIENCY IN MODEL-BASED INFERENCE 779 In Section 7, we draw some general conclusions on selecting a sample when one reasonably believes in a linear model but sees protection against misspecification. 2. Notation and Definitions Consider a population U of size N. Each unit of the population is identified by a label = 1,..., N. Suppose that a register is available, and that the values of p auxiliary variables are nown for each unit of the population. Let y be the value taen by the variable of interest y on the th unit of the population. The values y are unnown. We are interested in estimating the population total Y = y. The total Y is estimated by a sample s of size n, where s is a subset of U. A sample is only a subset of the population and is not necessarily randomly selected. A sampling design is a tool to randomly select a sample. It is defined by assigning to each sample s a probability p(s) of being selected. Let S denote the random sample such that Pr(S = s) = p(s). The inclusion probability π is then the probability that unit is selected in the sample. We denote by E p ( ) and var p ( ), respectively, the expectation and variance under the sampling design p( ), and by S the set of units of the population which are not in S. Definition 1. An estimator Ŷ is design-unbiased if E p(ŷ ) = Y. Definition 2. A sample s is d-balanced on a set of variables x = (x 1 x p ) if and only if x, d x = s where d 1,..., d,..., d N is a set of weights that do not depend on the sample s. The weights d 1,..., d,..., d N are positive values. An important example is given by the d = 1/π used in the Horvitz-Thompson estimator. When the sample is randomly selected, an inclusion probability can be assigned to each statistical unit. If π > 0, for all U, the Horvitz-Thompson estimator of Y, Ŷ π = y, π is design-unbiased, i.e. E p (Ŷπ) = Y. For a balanced sample, the inclusion probabilities are π = 1/d, and the procedure that randomly selects a balanced sample is called a balanced sampling design. According to Deville and Tillé (2004), a sampling design p( ) is balanced on the auxiliary variables x 1,..., x p if and only if x = x. (2.1) π

4 780 DESISLAVA NEDYALKOVA AND YVES TILLÉ Authors such as Cumberland and Royall (1981) and Kott (1986) would call this a π-balanced sampling, as opposed to a mean-balanced sampling defined by the equation 1 x = 1 x. n N We use the expression balanced sampling to denote a sampling design that satisfies (2.1) for one or more auxiliary variables, a mean-balanced sampling being a particular case of this balanced sampling when the sample is selected with inclusion probabilities n/n. If the population size is small, a balanced sampling design can be implemented by a linear program. For larger population sizes, the cube method may be used (see Deville and Tillé (2004) or Tillé (2006)). Whatever the algorithm used for selecting a balanced sample, an exact balanced sample cannot generally be found because it does not exist. It is however, always possible to select a sample that is almost balanced. We assume that the balancing error can be neglected. The selection of a balanced sample requires the use of a register that contains the values of the auxiliary variables for each unit of the population. 3. Model-based Strategy and BLU Estimator We assume that the population follows a linear model M, y = x β + ε, (3.1) where β is the vector of regression coefficients and ε is a vector of random variables ε such that E M (ε ) = 0, var M (ε ) = σ 2 ν 2, cov M(ε, ε l ) = 0 if l, Suppose ν, U, are nown. For simplicity, we scale them so that ν = N. Model (3.1) includes the possibility of heteroscedasticity. The heteroscedasticity of ν or ν 2 may or may not be proportional to some auxiliary variables included in x. Nedyalova and Tillé (2008) have shown the importance of using auxiliary variables whose linear combination is equal to ν or ν 2. Indeed, in this case, the optimal strategies for the model-based and design-based framewors are the same. An important and common hypothesis is that the random sample S and the errors ε of the model are independent. The symbols E M ( ), var M ( ), cov M ( ) denote, respectively, the expected value, the variance, and the covariance under M. Definition 3. An estimator Ŷ is model-unbiased if E M(Ŷ Y ) = 0.

5 BIAS-ROBUSTNESS AND EFFICIENCY IN MODEL-BASED INFERENCE 781 Definition 4. The model mean squared error of an estimator Ŷ is MSE M(Ŷ ) = E M (Ŷ Y ) 2. When Ŷ is model-unbiased, the model mean squared error is also called the error variance, for instance in Royall and Cumberland (1981). Definition 5 (Isai and Fuller (1982)). The anticipated mean squared error of an estimator Ŷ is MSE pm(ŷ ) = E pe M (Ŷ Y )2. When Ŷ is design-unbiased, the anticipated mean squared error is also called the anticipated variance. Royall (1976) showed that, in the framewor of model-based inference, the Best Linear Unbiased (BLU) estimator is Ŷ BLU = = y + S x β BLU = x β BLU + (y x β BLU ) x β BLU + e, (3.2) e = y x β BLU, (3.3) where β BLU = A 1 x y ν 2, A = x x ν 2. The error variance of the best linear unbiased estimator is E M (ŶBLU Y ) 2 = σ 2 x A 1 x l + ν 2. S l S S We refer to the usual definition given, for instance, in Háje (1959, 1981), Ramarishnan (1975), and Joshi (1979). Definition 6. A strategy is a pair (p( ), Ŷ ) consisting of a sampling design and an estimator. Strategy 1. Use the best linear unbiased estimator and choose a sample of size n that minimizes E M (ŶBLU Y ) 2. This is an optimal, purely unbiased strategy that is not design-robust because, in some cases, it can lead to the choice of a very extreme sample, as in the

6 782 DESISLAVA NEDYALKOVA AND YVES TILLÉ following example. Suppose the model has no intercept and only one regressor (see for instance, Royall and Herson (1973a)): y = x β + ε, (3.4) with var M (ε ) = σ 2 ν 2 and ν2 x. Then the BLU estimator is Ŷ BLU = x y, x which, in this case, is the ordinary ratio estimator ŶR. As ν 2 x and ν = N, we have that ν 2 = N 2 x ) 2. x ( The error variance of ŶR is given by the expression ( E M (ŶR Y ) 2 = σ 2 N 2 ) 2 S x ) x. x ( x Thus, in this case, the optimal purely model-based strategy consists of choosing the units with the n largest values of variable x (Royall (1970)), a very extreme sample. The strategy can be dangerous if the model is wrong. It is thus reasonable to opt for a strategy that guarantees correct estimation when the model is misspecified, and that leads to design-unbiased inference. For these reasons, Strategy 1 is rarely used (see Hansen, Madow, and Tepping (1983)). 4. Balanced Sample under a Linear Model In this section we generalize the concept of balanced samples. Our results are thus more general than those in Nedyalova and Tillé (2008). Consider the d-weighted estimator Ŷ d = d y = d x β BLU + d e, (4.1) where e is defined in (3.3). Under d-balanced sampling, Ŷd is model-unbiased and its error variance is E M (Ŷd Y ) 2 = σ 2 (d 1) 2 ν 2 + ν 2. (4.2) S By comparing (4.1) with (3.2), we generalize Result 7 of Nedyalova and Tillé (2008).

7 BIAS-ROBUSTNESS AND EFFICIENCY IN MODEL-BASED INFERENCE 783 Result 1. A sufficient condition that ŶBLU = Ŷd is that the sampling design is d-balanced on x, e (d 1) = 0. A particular case of Result 1 is given below. Corollary 1. A sufficient condition that ŶBLU = Ŷd is that the sampling design is d-balanced on x, there exists a vector λ such that λ x = ν 2 (d 1), for all U. Proof. If then λ x ν 2 (d 1) = 1, λ x e (d 1) = ν 2 (d 1) e (d 1) = Consider a strategy that meets the conditions of Result 1. Strategy 2. λ x ν 2 e = 0. Use a d-balanced sampling design on x, with d chosen so that d = (ν 2 + λ x )/ν 2, for all U, use the d-weighted estimator. A particular case of Strategy 2 was recommended by Scott, Brewer, and Ho (1978) in the case of a polynomial model. Since the conditions of Result 1 are met, we have that ŶBLU = Ŷd. The strategy judiciously chooses the d s to equalize the d-weighted estimator and the BLU estimator. For a sample size n, this strategy can be implemented by using the inclusion probabilities ν 2 π = ν 2 +. λ x The value of λ can be chosen freely subject to π = After some algebra, it is possible to show that E M (Ŷd Y ) 2 = σ 2 S ν 2 ν 2 + λ x = n. d ν 2 = σ2 ( S ν 2 + S λ x ).

8 784 DESISLAVA NEDYALKOVA AND YVES TILLÉ Moreover, E p E M (Ŷd Y ) 2 = σ 2 λ x. Nevertheless, we see below that this is not necessarily the best strategy. Eventually, consider a strategy proposed by Nedyalova and Tillé (2008) that also meets the conditions of Result 1. Strategy 3. Use a d-balanced sampling design on x, with d ν 1 where x is complemented so that there exist two vectors α and γ such that α x = ν 2 and γ x = ν, for all U, a fully explainable heteroscedasticity, use the d-weighted estimator. Strategy 3 was recommended by Nedyalova and Tillé (2008) because it minimizes the anticipated variance among the strategies that are design-unbiased (see also Fuller (2009, p.187)). It is thus better than Strategy in the class of design-unbiased strategies. With Strategy 3, we have ŶBLU = Ŷd. The condition of fully explainable heteroscedasticity can be obtained by adding ν 2 and d ν 2 to the list of balancing variables. Thus, ν 2, d ν 2, d ν 2 = d 2 ν2 = and the error variance of the d-weighted estimator given in (4.2) simplifies to E M (Ŷd Y ) 2 = σ 2 (d 1)ν 2. For a sample size n, we must tae d = N/(ν n), and π = 1/d. The error variance of the d-weighted estimator, given in (4.2), simplifies to E M (Ŷd Y ) 2 = σ ( ) 2 Nν (N n ν2 = σ 2 2 n ) ν Bias-robustness in Submodels A large part of the model-based inference is dedicated to the bias-robustness of the BLU estimator in the case of misspecification of the model. In this section, we propose a formalization of the question of bias-robustness and an analysis of the consequences of model misspecification. We assume that a model M is used to conceive the strategy, but that the true underlying model is M.

9 BIAS-ROBUSTNESS AND EFFICIENCY IN MODEL-BASED INFERENCE 785 Definition 7. A strategy is bias-robust for a model M if E M (Ŷ Y ) = 0. Consider a model and an alternative model M : y = x β + ε M : y = z γ + η, where E M (η ) = 0. There is no assumption on the covariance matrix of the vector of η. Definition 8. Model M is a submodel of M if there exists a matrix A such that Ax = z, not necessarily of full ran. The basic fact needed to show that a strategy is robust is the following. Result 2. The strategy that consists of using a d-balanced sampling design on x (with any vector of d ) and the d-weighted estimator is bias-robust for any submodel of M. Proof. Under model M, and using a d-balanced sample, Ŷ d = d y = d (z γ + η ) = d (A x γ + η ) = A x γ + d η. Thus, E M (Ŷd Y ) = E M ( d η η ) = 0. Result 2 encourages the statistician to overspecify the model, i.e., to introduce additional variables into model M in order to ensure that M is really a submodel of M. Indeed, if model M* is true, the variance under the model of the d-weighted estimator is the same irrespective of whether the sample is balanced on x or on z. An over-specification of the model does not increase the variance under the model of the estimator because the independent variables of the model are only used for balancing the sample. Moreover, the d-weighted estimator does not depend on an estimated coefficient that relies on the number of auxiliary variables. Suppose now that the model is misspecified, i.e., that model M is not a submodel of M. If the sampling design is d-balanced on the independent variables of model M, then ( ) E M (Ŷd Y ) = γ. d z z

10 786 DESISLAVA NEDYALKOVA AND YVES TILLÉ Let f = z γ x ϕ be the residuals of a linear regression of (z γ) on x and ϕ the regression coefficient vector. Then, if the sampling design is d-balanced on x, [ E M (Ŷd Y ) = d (x ϕ + f ) ] (x ϕ + f ) = d f f, (5.1) for any value ϕ, in particular when ϕ = ( x x var(η ) ) 1 x z γ var(η ). The model variance thus only depends on the residuals of the regression, i.e., on the part of the model that remains misspecified. If we do not possess information about the z, we select a random sample with inclusion probabilities π = 1/d, because in selecting the sample randomly, the expected value under the sampling design of (5.1) is zero. Moreover, we can expect that, under reasonable asymptotic assumptions, the quantity E M (Ŷd Y ) N = 1 N ( f ) f, π remains bounded in probability with respect to the sampling design when multiplied by n. The random selection of a sample and the use of a design-unbiased estimator give an ultimate bias protection in the case where it is not possible to overspecify the model. This guarantees a negligible bias under M when n is large. 6. Application to the Polynomial Model 6.1. Presentation of the model The polynomial model was studied, among others, by Royall and Herson (1973a), Scott, Brewer, and Ho (1978), and Valliant, Dorfman, and Royall (2000). The model is J y = δ j β j x j + ε, (6.1) j=0 where x is the only independent variable, β j is the jth regression coefficient, δ j is equal to 1 or 0 as the term β j x appears or not in the regression, E M(ε ) = 0, var M (ε ) = σ 2 ν 2, and cov M(ε, ε l ) = 0, when l. We assume ν = N.

11 BIAS-ROBUSTNESS AND EFFICIENCY IN MODEL-BASED INFERENCE 787 Now, from Result 2, for any set of vectors of d, the d-weighted estimator is bias-robust under any submodel of (6.1) provided d x j = This implies several results on the polynomial model A first suboptimal strategy Let S (J) be a particular sample for which S x j = x S x j, for j = 0,..., J. (6.2) x j+1 /ν 2 x 2 /ν2, for j = 0,..., J. (6.3) With a sample satisfying (6.3), Scott, Brewer, and Ho (1978) showed that the estimator Ŷ 0 = y + x y x /ν 2, S x2 /ν2 is BLU for any polynomial Model (6.1), and any value of ν. This simple condition on the sample implies that the estimator is bias-robust for a large class of polynomial models. It can easily be shown that a sufficient condition for a sample to satisfy (6.3) is ( x j 1 + λx ) ν 2 = x j, for j = 0,..., J, (6.4) where λ is a scalar that does not depend on j. Equality (6.4) can be satisfied by using Strategy 2. In this particular case, we have π = 1/d and d = 1 + λx /ν 2. Strategy 2. (for polynomial model) Select a balanced sample such that with the unequal inclusion probabilities x j = x j π, j = 0,..., J, π = λx /ν 2, U.

12 788 DESISLAVA NEDYALKOVA AND YVES TILLÉ The constant λ is chosen as a function of the desired sample size, where π = λx /ν 2 = n. (6.5) The solution of (6.5) in λ is unique. Strategy 2 is recommended by Scott, Brewer, and Ho (1978). The inclusion probabilities are, however, chosen so that the BLU estimator is equal to the Horvitz-Thompson estimator and is thus not optimized. Two particular cases of (6.3) are as follows. (a) When ν 2 x, under the condition ν = N, In this case, (6.3) reduces to S xj S x ν 2 = = ( xj x N 2 x ) 2. x Thus, the sample should satisfy the condition 1 n x j = 1 N n x j = 1 N S, for j = 0,..., J. x j, for j = 0,..., J. Royall and Herson (1973a) call such samples balanced. In this case, Ŷ0 reduces to the ordinary ratio estimator Ŷ R = x y x. (b) When ν 2 x2, it is easily shown that ν2 = N 2 x 2 / ( x ) 2. In this case, (6.3) reduces to x j 1 n = S xj S x, for j = 0,..., J. The sample S (J) is called overbalanced (Scott, Brewer, and Ho (1978)), and Ŷ 0 reduces to Ŷ OB = ( ) 1 y y + x. n x S

13 BIAS-ROBUSTNESS AND EFFICIENCY IN MODEL-BASED INFERENCE An alternative strategy for the polynomial model Strategy 3. (for polynomial model) Use inclusion probabilities that are proportional to ν, subject to π = n, 0 π 1. Select a balanced sample according to the balancing equations Use the Horvitz-Thompson estimator. x j = x j π, j = 0,..., J, (6.6) ν 2 = ν 2 π, ν = ν. (6.7) π Note that, since π nν /N, (6.7) becomes N n 1 = N and only means that the sample size must be fixed. Nedyalova and Tillé (2008) proved that this strategy minimizes the anticipated variances in the class of design-unbiased strategies. Strategy 3 is thus better than Strategy 2 in the sense that its anticipated variance is always smaller. With Strategy 3, the inclusion probabilities are chosen to minimize the variance, while with Strategy 2 the inclusion probabilities are chosen to meet the technical condition given in Result 1. Two particular cases are as follows. (a) When ν 2 x with ν = N, then ν 2 = ( N 2 x ) 2. x As π ν with π = n, it follows that π = (n/n)ν. In this case, if the sample has a fixed sample size and is balanced on x, it is automatically balanced on ν and ν 2. (b) When ν 2 x2, with ν = N, then ν 2 = N 2 x 2 ( x ) 2. Here too, we have π = (n/n)ν. In this case, if the sample has a fixed sample size and is balanced on x 2, it is automatically balanced on ν and ν 2.

14 790 DESISLAVA NEDYALKOVA AND YVES TILLÉ 6.4. A particular case: the ratio model Consider again the model without intercept and only one regressor with ν 2 x, as at (3.4), a particular case of the polynomial model. We have seen that the BLU estimator under this model is the ordinary ratio estimator Ŷ R = x y x, with error variance ( E M (ŶR Y ) 2 = σ 2 N 2 ) 2 S x ) x. x ( x Here Strategy 2 is endorsed by Royall and Herson (1973a). Strategy 2. (for the ratio model) Select a mean-balanced sample of size n and use the ratio estimator. With a mean-balanced sample on x, satisfying the condition 1 n x = 1 N n x, S the ratio estimator reduces to the sample mean Ŷ R = x y x = 1 y. n Moreover, E M (ŶR Y ) 2 = σ 2 N 2 (N n) n ( x x ) 2 = E p E M (ŶR Y ) 2. Strategy 3. (for the ratio model) Select a balanced sample on x with inclusion probabilities π ν x and use the Horvitz-Thompson estimator. In order to show that Strategy 3 is better than Strategy 2, we compare the anticipated and model-variances, E p E M (Ŷ Y )2 and E M (Ŷ Y )2, of the ratio and Horvitz-Thompson estimators under (3.4). Nedyalova and Tillé (2008) have shown that, under Strategy 3, ( E M (Ŷπ Y ) 2 = E p E M (Ŷπ Y ) 2 = σ 2 N 2 n ) ν 2.

15 BIAS-ROBUSTNESS AND EFFICIENCY IN MODEL-BASED INFERENCE 791 After replacing ν with N x / x, we obtain [ E p E M (Ŷπ Y ) 2 = σ 2 N 2 n N 2 x ] ( ) 2. x With D = E p E M (ŶR Y ) 2 E p E M (Ŷπ Y ) 2 = E M (ŶR Y ) 2 E M (Ŷπ Y ) 2, simplification yields D = N 2 n [ N x ( x ) 2 1 Thus, Strategy 3 is better than Strategy 2 under (3.4). 7. Discussion ] 0. Under a linear model, the use of a purely model-based strategy (Strategy 1) can be dangerous. Balanced samples offer good protection against model misspecification. A d-balanced sampling design with the d-weighted estimator is a bias-robust strategy that assures protection against misspecification of the model. There however exist several ways of selecting balanced samples. The d-weighted estimator can be equivalent to the BLU estimator if some technical conditions are met. These conditions can be met by either choosing the ad hoc inclusion probabilities (Strategy 2) or by adding ν and ν 2 to the list of balancing variables and choosing the optimal inclusion probability (Strategy 3). For the polynomial model, Royall and Herson (1973a) and Scott, Brewer, and Ho (1978) used ad hoc inclusion probabilities. They showed that, in this case, an unweighted ratio estimator is BLU for a large class of polynomial models. Nevertheless, this strategy consists of choosing the inclusion probabilities in such a way that a technical property is satisfied. Strategy 3 is more appropriate, because the technical condition is met by adding two balancing variables and the inclusion probabilities can be chosen to minimize the anticipated variance. Strategy 2 is thus not admissible in the sense that it is always possible to obtain a smaller anticipated variance by selecting the units with inclusion probabilities proportional to the standard deviations of the errors of the model and using the Horvitz-Thompson estimator. Strategy 3 has the advantage of providing a bias-robust estimator, even if the model is misspecified. Indeed, in both cases, the estimator does not depend on the auxiliary variables and is always design-unbiased. In case of misspecification of the model or of measurement errors the estimators of totals thus remain unbiased but a part of the efficiency can

16 792 DESISLAVA NEDYALKOVA AND YVES TILLÉ be lost. The importance of this loss depends on the way in which the correlation between the independent variables of the model and the variables of interest is hindered by the measurement errors. Eventually, the best protection against a misspecification of the model consists of extending the list of balancing variables. Indeed, the addition of balancing variables that could be correlated to the variables of interest does not increase the model variance, but protects against bias under the model. If the model cannot really be specified, the ultimate protection against misspecification is always the random selection of the sample. We hope that these results give a general guideline on the way of planning a sample survey under a realistic linear model. In practice, designing a survey is often a more complex exercise for the following reasons: It is difficult to specify a model without nowing the variable(s) of interest. The sampling design is necessarily conceived before nowing the dependent variable. Thus the validation of the model is difficult. A survey is generally done for multiple purposes. Arbitration must thus be done between the variables of interest, the areas of interest, and the parameters to estimate, and thus between the models. Nonrepsonse implies that the set of respondents is never exactly balanced. Our results however give a general and ideal framewor on how to plan and estimate a total under a linear model. The result of Neyman (1934) concerning optimal stratification suffers the same problems. Optimality depends on the variable of interest, on the parameters to estimate, on the areas of interest. The variances within the strata are never really nown, thus they must be estimated using another survey, or must be derived from a hypothesis of the existence of a size effect. Indeed, optimal stratification is a particular case of our result. The generalization of our results to cluster sampling, two-stage, and two-phase sampling remains a challenging topic of research. Acnowledgement The authors than two anonymous reviewers and the Editor for their constructive comments that helped us improve the quality of this paper. This research is supported by the Swiss National Science Foundation (grant no. FN /1). References Brewer, K. and Särndal, C.-E. (1983). Six approaches to enumerative survey sampling. In Incomplete Data in Sample Surveys (Edited by W. G. Madow, I. Olin, and D. B. Rubin), Academic Press, N.Y.

17 BIAS-ROBUSTNESS AND EFFICIENCY IN MODEL-BASED INFERENCE 793 Chambers, R. L. (1996). Robust case-weighting for multipurpose establishment surveys. J. Off. Statist. 12, Cumberland, W. and Royall, R. (1981). Prediction models in unequal probability sampling. J. Roy. Statist. Soc. Ser. B 43, Deville, J.-C. and Tillé, Y. (2004). Efficient balanced sampling: The cube method. Biometria 91, Fuller, W. A. (2009). Sampling Statistics. John Wiley, New Jersey. Háje, J. (1959). Optimum strategy and other problems in probability sampling. Cosopis. Pest. Mat. 84, Háje, J. (1981). Sampling from a Finite Population. Marcel Deer, New Yor. Hansen, M. H., Madow, W. G. and Tepping, B. J. (1983). An Evaluation of model-dependent and probability-sampling inferences in sample surveys. J. Amer. Statist. Assoc. 78, Isai, C. and Fuller, W. A. (1982). Survey design under a regression population model. J. Amer. Statist. Assoc. 77, Joshi, V. M. (1979). The best strategy for estimating the mean of a finite population. Ann. Statist. 7, Kott, P. S. (1986). When a mean-of-ratios is the best linear unbiased estimator under a model. Amer. Statist. 40, Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection, J. Roy. Statist. Soc. 97, Nedyalova, D. and Tillé, Y. (2008). Optimal sampling and estimation strategies under the linear model. Biometria 95, Ramarishnan, M. (1975). Choice of an optimum sampling strategy. Ann. Statist. 3, Royall, R. (1970). On finite population sampling theory under certain linear regression models. Biometria 57, Royall, R. (1976). The linear least squares prediction approach to two-stage sampling. J. Amer. Statist. Assoc. 71, Royall, R. (1992). Robustness and optimal design under prediction models for finite populations. Surv. Methodol. 18, Royall, R. and Cumberland, W. (1981). The finite population linear regression estimator and estimators of its variance. An empirical study. J. Amer. Stat. Assoc. 76, Royall, R. and Herson, J. (1973a). Robust estimation in finite populations I. J. Amer. Statist. Assoc. 68, Royall, R. and Herson, J. (1973b). Robust estimation in finite populations II: Stratification on a size variable. J. Amer. Statist. Assoc. 68, Särndal, C.-E. (1978). Design-based and model-based inference in survey sampling. Scand. J. Statist. 5, Scott, A., Brewer, K. and Ho, E. (1978). Finite population sampling and robust estimation. J. Amer. Statist. Assoc. 73, Tillé, Y. (2006). Sampling Algorithms. Springer-Verlag, New Yor. Valliant, R., Dorfman, A. and Royall, R. (2000). Finite Population Sampling and Inference: A Prediction Approach. Wiley, New Yor.

18 794 DESISLAVA NEDYALKOVA AND YVES TILLÉ Institute of statistics, University of Neuchâtel, Pierre à Mazel 7, 2000 Neuchâtel, Switzerland. Institute of statistics, University of Neuchâtel, Pierre à Mazel 7, 2000 Neuchâtel, Switzerland. (Received October 2010; accepted may 2011)

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Statistica Sinica 24 (2014), 1001-1015 doi:http://dx.doi.org/10.5705/ss.2013.038 INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Seunghwan Park and Jae Kwang Kim Seoul National Univeristy

More information

A comparison of stratified simple random sampling and sampling with probability proportional to size

A comparison of stratified simple random sampling and sampling with probability proportional to size A comparison of stratified simple random sampling and sampling with probability proportional to size Edgar Bueno Dan Hedlin Per Gösta Andersson 1 Introduction When planning the sampling strategy (i.e.

More information

Comments on Design-Based Prediction Using Auxilliary Information under Random Permutation Models (by Wenjun Li (5/21/03) Ed Stanek

Comments on Design-Based Prediction Using Auxilliary Information under Random Permutation Models (by Wenjun Li (5/21/03) Ed Stanek Comments on Design-Based Prediction Using Auxilliary Information under Random Permutation Models (by Wenjun Li (5/2/03) Ed Stanek Here are comments on the Draft Manuscript. They are all suggestions that

More information

A MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR

A MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR Statistica Sinica 8(1998), 1165-1173 A MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR Phillip S. Kott National Agricultural Statistics Service Abstract:

More information

Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities

Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities Peter M. Aronow and Cyrus Samii Forthcoming at Survey Methodology Abstract We consider conservative variance

More information

Finite Population Sampling and Inference

Finite Population Sampling and Inference Finite Population Sampling and Inference A Prediction Approach RICHARD VALLIANT ALAN H. DORFMAN RICHARD M. ROYALL A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane

More information

The R package sampling, a software tool for training in official statistics and survey sampling

The R package sampling, a software tool for training in official statistics and survey sampling The R package sampling, a software tool for training in official statistics and survey sampling Yves Tillé 1 and Alina Matei 2 1 Institute of Statistics, University of Neuchâtel, Switzerland yves.tille@unine.ch

More information

A new resampling method for sampling designs without replacement: the doubled half bootstrap

A new resampling method for sampling designs without replacement: the doubled half bootstrap 1 Published in Computational Statistics 29, issue 5, 1345-1363, 2014 which should be used for any reference to this work A new resampling method for sampling designs without replacement: the doubled half

More information

Supplement-Sample Integration for Prediction of Remainder for Enhanced GREG

Supplement-Sample Integration for Prediction of Remainder for Enhanced GREG Supplement-Sample Integration for Prediction of Remainder for Enhanced GREG Abstract Avinash C. Singh Division of Survey and Data Sciences American Institutes for Research, Rockville, MD 20852 asingh@air.org

More information

Probability Sampling Designs: Principles for Choice of Design and Balancing

Probability Sampling Designs: Principles for Choice of Design and Balancing Submitted to Statistical Science Probability Sampling Designs: Principles for Choice of Design and Balancing Yves Tillé, Matthieu Wilhelm University of Neuchâtel arxiv:1612.04965v1 [stat.me] 15 Dec 2016

More information

REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES

REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Statistica Sinica 8(1998), 1153-1164 REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Wayne A. Fuller Iowa State University Abstract: The estimation of the variance of the regression estimator for

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent

More information

NONLINEAR CALIBRATION. 1 Introduction. 2 Calibrated estimator of total. Abstract

NONLINEAR CALIBRATION. 1 Introduction. 2 Calibrated estimator of total.   Abstract NONLINEAR CALIBRATION 1 Alesandras Pliusas 1 Statistics Lithuania, Institute of Mathematics and Informatics, Lithuania e-mail: Pliusas@tl.mii.lt Abstract The definition of a calibrated estimator of the

More information

Optimal Auxiliary Variable Assisted Two-Phase Sampling Designs

Optimal Auxiliary Variable Assisted Two-Phase Sampling Designs MASTER S THESIS Optimal Auxiliary Variable Assisted Two-Phase Sampling Designs HENRIK IMBERG Department of Mathematical Sciences Division of Mathematical Statistics CHALMERS UNIVERSITY OF TECHNOLOGY UNIVERSITY

More information

RESEARCH REPORT. Vanishing auxiliary variables in PPS sampling with applications in microscopy.

RESEARCH REPORT. Vanishing auxiliary variables in PPS sampling with applications in microscopy. CENTRE FOR STOCHASTIC GEOMETRY AND ADVANCED BIOIMAGING 2014 www.csgb.dk RESEARCH REPORT Ina Trolle Andersen, Ute Hahn and Eva B. Vedel Jensen Vanishing auxiliary variables in PPS sampling with applications

More information

arxiv: v2 [math.st] 20 Jun 2014

arxiv: v2 [math.st] 20 Jun 2014 A solution in small area estimation problems Andrius Čiginas and Tomas Rudys Vilnius University Institute of Mathematics and Informatics, LT-08663 Vilnius, Lithuania arxiv:1306.2814v2 [math.st] 20 Jun

More information

A comparison of stratified simple random sampling and sampling with probability proportional to size

A comparison of stratified simple random sampling and sampling with probability proportional to size A comparison of stratified simple random sampling and sampling with probability proportional to size Edgar Bueno Dan Hedlin Per Gösta Andersson Department of Statistics Stockholm University Introduction

More information

Combining data from two independent surveys: model-assisted approach

Combining data from two independent surveys: model-assisted approach Combining data from two independent surveys: model-assisted approach Jae Kwang Kim 1 Iowa State University January 20, 2012 1 Joint work with J.N.K. Rao, Carleton University Reference Kim, J.K. and Rao,

More information

Simple design-efficient calibration estimators for rejective and high-entropy sampling

Simple design-efficient calibration estimators for rejective and high-entropy sampling Biometrika (202), 99,, pp. 6 C 202 Biometrika Trust Printed in Great Britain Advance Access publication on 3 July 202 Simple design-efficient calibration estimators for rejective and high-entropy sampling

More information

Mean estimation with calibration techniques in presence of missing data

Mean estimation with calibration techniques in presence of missing data Computational Statistics & Data Analysis 50 2006 3263 3277 www.elsevier.com/locate/csda Mean estimation with calibration techniues in presence of missing data M. Rueda a,, S. Martínez b, H. Martínez c,

More information

Exact balanced random imputation for sample survey data

Exact balanced random imputation for sample survey data Exact balanced random imputation for sample survey data Guillaume Chauvet, Wilfried Do Paco To cite this version: Guillaume Chauvet, Wilfried Do Paco. Exact balanced random imputation for sample survey

More information

In Praise of the Listwise-Deletion Method (Perhaps with Reweighting)

In Praise of the Listwise-Deletion Method (Perhaps with Reweighting) In Praise of the Listwise-Deletion Method (Perhaps with Reweighting) Phillip S. Kott RTI International NISS Worshop on the Analysis of Complex Survey Data With Missing Item Values October 17, 2014 1 RTI

More information

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors:

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors: Wooldridge, Introductory Econometrics, d ed. Chapter 3: Multiple regression analysis: Estimation In multiple regression analysis, we extend the simple (two-variable) regression model to consider the possibility

More information

Variants of the splitting method for unequal probability sampling

Variants of the splitting method for unequal probability sampling Variants of the splitting method for unequal probability sampling Lennart Bondesson 1 1 Umeå University, Sweden e-mail: Lennart.Bondesson@math.umu.se Abstract This paper is mainly a review of the splitting

More information

A Model-Over-Design Integration for Estimation from Purposive Supplements to Probability Samples

A Model-Over-Design Integration for Estimation from Purposive Supplements to Probability Samples A Model-Over-Design Integration for Estimation from Purposive Supplements to Probability Samples Avinash C. Singh, NORC at the University of Chicago, Chicago, IL 60603 singh-avi@norc.org Abstract For purposive

More information

The Finite Sample Properties of the Least Squares Estimator / Basic Hypothesis Testing

The Finite Sample Properties of the Least Squares Estimator / Basic Hypothesis Testing 1 The Finite Sample Properties of the Least Squares Estimator / Basic Hypothesis Testing Greene Ch 4, Kennedy Ch. R script mod1s3 To assess the quality and appropriateness of econometric estimators, we

More information

Estimation of Some Proportion in a Clustered Population

Estimation of Some Proportion in a Clustered Population Nonlinear Analysis: Modelling and Control, 2009, Vol. 14, No. 4, 473 487 Estimation of Some Proportion in a Clustered Population D. Krapavicaitė Institute of Mathematics and Informatics Aademijos str.

More information

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai Weighting Methods Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Weighting Methods Stat186/Gov2002 Fall 2018 1 / 13 Motivation Matching methods for improving

More information

ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS

ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS Statistica Sinica 17(2007), 1047-1064 ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS Jiahua Chen and J. N. K. Rao University of British Columbia and Carleton University Abstract: Large sample properties

More information

Implications of Ignoring the Uncertainty in Control Totals for Generalized Regression Estimators. Calibration Estimators

Implications of Ignoring the Uncertainty in Control Totals for Generalized Regression Estimators. Calibration Estimators Implications of Ignoring the Uncertainty in Control Totals for Generalized Regression Estimators Jill A. Dever, RTI Richard Valliant, JPSM & ISR is a trade name of Research Triangle Institute. www.rti.org

More information

Non-uniform coverage estimators for distance sampling

Non-uniform coverage estimators for distance sampling Abstract Non-uniform coverage estimators for distance sampling CREEM Technical report 2007-01 Eric Rexstad Centre for Research into Ecological and Environmental Modelling Research Unit for Wildlife Population

More information

Bias Correction in the Balanced-half-sample Method if the Number of Sampled Units in Some Strata Is Odd

Bias Correction in the Balanced-half-sample Method if the Number of Sampled Units in Some Strata Is Odd Journal of Of cial Statistics, Vol. 14, No. 2, 1998, pp. 181±188 Bias Correction in the Balanced-half-sample Method if the Number of Sampled Units in Some Strata Is Odd Ger T. Slootbee 1 The balanced-half-sample

More information

of being selected and varying such probability across strata under optimal allocation leads to increased accuracy.

of being selected and varying such probability across strata under optimal allocation leads to increased accuracy. 5 Sampling with Unequal Probabilities Simple random sampling and systematic sampling are schemes where every unit in the population has the same chance of being selected We will now consider unequal probability

More information

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census

More information

Empirical likelihood inference for regression parameters when modelling hierarchical complex survey data

Empirical likelihood inference for regression parameters when modelling hierarchical complex survey data Empirical likelihood inference for regression parameters when modelling hierarchical complex survey data Melike Oguz-Alper Yves G. Berger Abstract The data used in social, behavioural, health or biological

More information

Accounting for Complex Sample Designs via Mixture Models

Accounting for Complex Sample Designs via Mixture Models Accounting for Complex Sample Designs via Finite Normal Mixture Models 1 1 University of Michigan School of Public Health August 2009 Talk Outline 1 2 Accommodating Sampling Weights in Mixture Models 3

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap inference for the finite population total under complex sampling designs Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.

More information

A JACKKNIFE VARIANCE ESTIMATOR FOR SELF-WEIGHTED TWO-STAGE SAMPLES

A JACKKNIFE VARIANCE ESTIMATOR FOR SELF-WEIGHTED TWO-STAGE SAMPLES Statistica Sinica 23 (2013), 595-613 doi:http://dx.doi.org/10.5705/ss.2011.263 A JACKKNFE VARANCE ESTMATOR FOR SELF-WEGHTED TWO-STAGE SAMPLES Emilio L. Escobar and Yves G. Berger TAM and University of

More information

Asymptotic Normality under Two-Phase Sampling Designs

Asymptotic Normality under Two-Phase Sampling Designs Asymptotic Normality under Two-Phase Sampling Designs Jiahua Chen and J. N. K. Rao University of Waterloo and University of Carleton Abstract Large sample properties of statistical inferences in the context

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY J.D. Opsomer, W.A. Fuller and X. Li Iowa State University, Ames, IA 50011, USA 1. Introduction Replication methods are often used in

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Sampling: A Brief Review. Workshop on Respondent-driven Sampling Analyst Software

Sampling: A Brief Review. Workshop on Respondent-driven Sampling Analyst Software Sampling: A Brief Review Workshop on Respondent-driven Sampling Analyst Software 201 1 Purpose To review some of the influences on estimates in design-based inference in classic survey sampling methods

More information

The Use of Survey Weights in Regression Modelling

The Use of Survey Weights in Regression Modelling The Use of Survey Weights in Regression Modelling Chris Skinner London School of Economics and Political Science (with Jae-Kwang Kim, Iowa State University) Colorado State University, June 2013 1 Weighting

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

Propensity Score Weighting with Multilevel Data

Propensity Score Weighting with Multilevel Data Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative

More information

Evaluation of Variance Approximations and Estimators in Maximum Entropy Sampling with Unequal Probability and Fixed Sample Size

Evaluation of Variance Approximations and Estimators in Maximum Entropy Sampling with Unequal Probability and Fixed Sample Size Journal of Official Statistics, Vol. 21, No. 4, 2005, pp. 543 570 Evaluation of Variance Approximations and Estimators in Maximum Entropy Sampling with Unequal robability and Fixed Sample Size Alina Matei

More information

STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS

STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS David A Binder and Georgia R Roberts Methodology Branch, Statistics Canada, Ottawa, ON, Canada K1A 0T6 KEY WORDS: Design-based properties, Informative sampling,

More information

Unequal Probability Designs

Unequal Probability Designs Unequal Probability Designs Department of Statistics University of British Columbia This is prepares for Stat 344, 2014 Section 7.11 and 7.12 Probability Sampling Designs: A quick review A probability

More information

Model Assisted Survey Sampling

Model Assisted Survey Sampling Carl-Erik Sarndal Jan Wretman Bengt Swensson Model Assisted Survey Sampling Springer Preface v PARTI Principles of Estimation for Finite Populations and Important Sampling Designs CHAPTER 1 Survey Sampling

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

Estimation of change in a rotation panel design

Estimation of change in a rotation panel design Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS028) p.4520 Estimation of change in a rotation panel design Andersson, Claes Statistics Sweden S-701 89 Örebro, Sweden

More information

Econometric Methods. Prediction / Violation of A-Assumptions. Burcu Erdogan. Universität Trier WS 2011/2012

Econometric Methods. Prediction / Violation of A-Assumptions. Burcu Erdogan. Universität Trier WS 2011/2012 Econometric Methods Prediction / Violation of A-Assumptions Burcu Erdogan Universität Trier WS 2011/2012 (Universität Trier) Econometric Methods 30.11.2011 1 / 42 Moving on to... 1 Prediction 2 Violation

More information

This module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics

This module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics This moule is part of the Memobust Hanbook on Methoology of Moern Business Statistics 26 March 2014 Metho: Balance Sampling for Multi-Way Stratification Contents General section... 3 1. Summary... 3 2.

More information

Contextual Effects in Modeling for Small Domains

Contextual Effects in Modeling for Small Domains University of Wollongong Research Online Applied Statistics Education and Research Collaboration (ASEARC) - Conference Papers Faculty of Engineering and Information Sciences 2011 Contextual Effects in

More information

Development of methodology for the estimate of variance of annual net changes for LFS-based indicators

Development of methodology for the estimate of variance of annual net changes for LFS-based indicators Development of methodology for the estimate of variance of annual net changes for LFS-based indicators Deliverable 1 - Short document with derivation of the methodology (FINAL) Contract number: Subject:

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models, two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent variable,

More information

Multivariate Regression: Part I

Multivariate Regression: Part I Topic 1 Multivariate Regression: Part I ARE/ECN 240 A Graduate Econometrics Professor: Òscar Jordà Outline of this topic Statement of the objective: we want to explain the behavior of one variable as a

More information

A comparison of pivotal sampling and unequal. probability sampling with replacement

A comparison of pivotal sampling and unequal. probability sampling with replacement arxiv:1609.02688v2 [math.st] 13 Sep 2016 A comparison of pivotal sampling and unequal probability sampling with replacement Guillaume Chauvet 1 and Anne Ruiz-Gazen 2 1 ENSAI/IRMAR, Campus de Ker Lann,

More information

Multiple Regression Analysis: Heteroskedasticity

Multiple Regression Analysis: Heteroskedasticity Multiple Regression Analysis: Heteroskedasticity y = β 0 + β 1 x 1 + β x +... β k x k + u Read chapter 8. EE45 -Chaiyuth Punyasavatsut 1 topics 8.1 Heteroskedasticity and OLS 8. Robust estimation 8.3 Testing

More information

Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling

Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling Ji-Yeon Kim Iowa State University F. Jay Breidt Colorado State University Jean D. Opsomer Colorado State University

More information

Introduction to Econometrics. Heteroskedasticity

Introduction to Econometrics. Heteroskedasticity Introduction to Econometrics Introduction Heteroskedasticity When the variance of the errors changes across segments of the population, where the segments are determined by different values for the explanatory

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Asymptotics Asymptotics Multiple Linear Regression: Assumptions Assumption MLR. (Linearity in parameters) Assumption MLR. (Random Sampling from the population) We have a random

More information

Estimating Bounded Population Total Using Linear Regression in the Presence of Supporting Information

Estimating Bounded Population Total Using Linear Regression in the Presence of Supporting Information International Journal of Mathematics and Computational Science Vol. 4, No. 3, 2018, pp. 112-117 http://www.aiscience.org/journal/ijmcs ISSN: 2381-7011 (Print); ISSN: 2381-702X (Online) Estimating Bounded

More information

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists

More information

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i, A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type

More information

Efficient Robbins-Monro Procedure for Binary Data

Efficient Robbins-Monro Procedure for Binary Data Efficient Robbins-Monro Procedure for Binary Data V. Roshan Joseph School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, GA 30332-0205, USA roshan@isye.gatech.edu SUMMARY

More information

APPLICATION OF RIDGE REGRESSION TO MULTICOLLINEAR DATA

APPLICATION OF RIDGE REGRESSION TO MULTICOLLINEAR DATA Journal of Research (Science), Bahauddin Zakariya University, Multan, Pakistan. Vol.15, No.1, June 2004, pp. 97-106 ISSN 1021-1012 APPLICATION OF RIDGE REGRESSION TO MULTICOLLINEAR DATA G. R. Pasha 1 and

More information

Taking into account sampling design in DAD. Population SAMPLING DESIGN AND DAD

Taking into account sampling design in DAD. Population SAMPLING DESIGN AND DAD Taking into account sampling design in DAD SAMPLING DESIGN AND DAD With version 4.2 and higher of DAD, the Sampling Design (SD) of the database can be specified in order to calculate the correct asymptotic

More information

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression. 10/3/011 Functional Connectivity Correlation and Regression Variance VAR = Standard deviation Standard deviation SD = Unbiased SD = 1 10/3/011 Standard error Confidence interval SE = CI = = t value for

More information

Sampling Theory. Improvement in Variance Estimation in Simple Random Sampling

Sampling Theory. Improvement in Variance Estimation in Simple Random Sampling Communications in Statistics Theory and Methods, 36: 075 081, 007 Copyright Taylor & Francis Group, LLC ISS: 0361-096 print/153-415x online DOI: 10.1080/0361090601144046 Sampling Theory Improvement in

More information

Discussion of Sensitivity and Informativeness under Local Misspecification

Discussion of Sensitivity and Informativeness under Local Misspecification Discussion of Sensitivity and Informativeness under Local Misspecification Jinyong Hahn April 4, 2019 Jinyong Hahn () Discussion of Sensitivity and Informativeness under Local Misspecification April 4,

More information

Statistica Sinica Preprint No: SS R2

Statistica Sinica Preprint No: SS R2 Statistica Sinica Preprint No: SS-13-244R2 Title Examining some aspects of balanced sampling in surveys Manuscript ID SS-13-244R2 URL http://www.stat.sinica.edu.tw/statistica/ DOI 10.5705/ss.2013.244 Complete

More information

Design of Screening Experiments with Partial Replication

Design of Screening Experiments with Partial Replication Design of Screening Experiments with Partial Replication David J. Edwards Department of Statistical Sciences & Operations Research Virginia Commonwealth University Robert D. Leonard Department of Information

More information

A characterization of consistency of model weights given partial information in normal linear models

A characterization of consistency of model weights given partial information in normal linear models Statistics & Probability Letters ( ) A characterization of consistency of model weights given partial information in normal linear models Hubert Wong a;, Bertrand Clare b;1 a Department of Health Care

More information

BAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS

BAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS BAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS Andrew A. Neath 1 and Joseph E. Cavanaugh 1 Department of Mathematics and Statistics, Southern Illinois University, Edwardsville, Illinois 606, USA

More information

1/34 3/ Omission of a relevant variable(s) Y i = α 1 + α 2 X 1i + α 3 X 2i + u 2i

1/34 3/ Omission of a relevant variable(s) Y i = α 1 + α 2 X 1i + α 3 X 2i + u 2i 1/34 Outline Basic Econometrics in Transportation Model Specification How does one go about finding the correct model? What are the consequences of specification errors? How does one detect specification

More information

The New sampling Procedure for Unequal Probability Sampling of Sample Size 2.

The New sampling Procedure for Unequal Probability Sampling of Sample Size 2. . The New sampling Procedure for Unequal Probability Sampling of Sample Size. Introduction :- It is a well known fact that in simple random sampling, the probability selecting the unit at any given draw

More information

COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS. Abstract

COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS. Abstract Far East J. Theo. Stat. 0() (006), 179-196 COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS Department of Statistics University of Manitoba Winnipeg, Manitoba, Canada R3T

More information

Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA

Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box 90251 Durham, NC 27708, USA Summary: Pre-experimental Frequentist error probabilities do not summarize

More information

Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore

Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore AN EMPIRICAL LIKELIHOOD BASED ESTIMATOR FOR RESPONDENT DRIVEN SAMPLED DATA Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore Mark Handcock, Department

More information

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?

More information

Adaptive two-stage sequential double sampling

Adaptive two-stage sequential double sampling Adaptive two-stage sequential double sampling Bardia Panahbehagh Afshin Parvardeh Babak Mohammadi March 4, 208 arxiv:803.04484v [math.st] 2 Mar 208 Abstract In many surveys inexpensive auxiliary variables

More information

Topic 4: Model Specifications

Topic 4: Model Specifications Topic 4: Model Specifications Advanced Econometrics (I) Dong Chen School of Economics, Peking University 1 Functional Forms 1.1 Redefining Variables Change the unit of measurement of the variables will

More information

Why Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory

Why Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory Why Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory Andreas Buja joint with the PoSI Group: Richard Berk, Lawrence Brown, Linda Zhao, Kai Zhang Ed George, Mikhail Traskin, Emil Pitkin,

More information

Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A.

Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A. Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A. Keywords: Survey sampling, finite populations, simple random sampling, systematic

More information

Contributions to the Theory of Unequal Probability Sampling. Anders Lundquist

Contributions to the Theory of Unequal Probability Sampling. Anders Lundquist Contributions to the Theory of Unequal Probability Sampling Anders Lundquist Doctoral Dissertation Department of Mathematics and Mathematical Statistics Umeå University SE-90187 Umeå Sweden Copyright Anders

More information

HISTORICAL PERSPECTIVE OF SURVEY SAMPLING

HISTORICAL PERSPECTIVE OF SURVEY SAMPLING HISTORICAL PERSPECTIVE OF SURVEY SAMPLING A.K. Srivastava Former Joint Director, I.A.S.R.I., New Delhi -110012 1. Introduction The purpose of this article is to provide an overview of developments in sampling

More information

Lecture 9 SLR in Matrix Form

Lecture 9 SLR in Matrix Form Lecture 9 SLR in Matrix Form STAT 51 Spring 011 Background Reading KNNL: Chapter 5 9-1 Topic Overview Matrix Equations for SLR Don t focus so much on the matrix arithmetic as on the form of the equations.

More information

Minimax design criterion for fractional factorial designs

Minimax design criterion for fractional factorial designs Ann Inst Stat Math 205 67:673 685 DOI 0.007/s0463-04-0470-0 Minimax design criterion for fractional factorial designs Yue Yin Julie Zhou Received: 2 November 203 / Revised: 5 March 204 / Published online:

More information

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Statistica Sinica 24 2014, 395-414 doi:ttp://dx.doi.org/10.5705/ss.2012.064 EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Jun Sao 1,2 and Seng Wang 3 1 East Cina Normal University,

More information

Biostat 2065 Analysis of Incomplete Data

Biostat 2065 Analysis of Incomplete Data Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh September 13 & 15, 2005 1. Complete-case analysis (I) Complete-case analysis refers to analysis based on

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University Statistica Sinica 27 (2017), 000-000 doi:https://doi.org/10.5705/ss.202016.0155 DISCUSSION: DISSECTING MULTIPLE IMPUTATION FROM A MULTI-PHASE INFERENCE PERSPECTIVE: WHAT HAPPENS WHEN GOD S, IMPUTER S AND

More information

460 HOLGER DETTE AND WILLIAM J STUDDEN order to examine how a given design behaves in the model g` with respect to the D-optimality criterion one uses

460 HOLGER DETTE AND WILLIAM J STUDDEN order to examine how a given design behaves in the model g` with respect to the D-optimality criterion one uses Statistica Sinica 5(1995), 459-473 OPTIMAL DESIGNS FOR POLYNOMIAL REGRESSION WHEN THE DEGREE IS NOT KNOWN Holger Dette and William J Studden Technische Universitat Dresden and Purdue University Abstract:

More information

THE THEORY AND PRACTICE OF MAXIMAL BREWER SELECTION WITH POISSON PRN SAMPLING

THE THEORY AND PRACTICE OF MAXIMAL BREWER SELECTION WITH POISSON PRN SAMPLING THE THEORY AND PRACTICE OF MAXIMAL BREWER SELECTION WITH POISSON PRN SAMPLING Phillip S. Kott And Jeffrey T. Bailey, National Agricultural Statistics Service Phillip S. Kott, NASS, Room 305, 351 Old Lee

More information

O.O. DAWODU & A.A. Adewara Department of Statistics, University of Ilorin, Ilorin, Nigeria

O.O. DAWODU & A.A. Adewara Department of Statistics, University of Ilorin, Ilorin, Nigeria Efficiency of Alodat Sample Selection Procedure over Sen - Midzuno and Yates - Grundy Draw by Draw under Unequal Probability Sampling without Replacement Sample Size 2 O.O. DAWODU & A.A. Adewara Department

More information

Empirical Likelihood Methods for Sample Survey Data: An Overview

Empirical Likelihood Methods for Sample Survey Data: An Overview AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use

More information

C. J. Skinner Cross-classified sampling: some estimation theory

C. J. Skinner Cross-classified sampling: some estimation theory C. J. Skinner Cross-classified sampling: some estimation theory Article (Accepted version) (Refereed) Original citation: Skinner, C. J. (205) Cross-classified sampling: some estimation theory. Statistics

More information