BIAS-ROBUSTNESS AND EFFICIENCY OF MODEL-BASED INFERENCE IN SURVEY SAMPLING
|
|
- Spencer Hart
- 5 years ago
- Views:
Transcription
1 Statistica Sinica 22 (2012), doi: BIAS-ROBUSTNESS AND EFFICIENCY OF MODEL-BASED INFERENCE IN SURVEY SAMPLING Desislava Nedyalova and Yves Tillé University of Neuchâtel Abstract: In model-based inference, the selection of balanced samples has been considered to give protection against misspecification of the model. A recent development in finite population sampling is that balanced samples can be randomly selected. There are several possible strategies that use balanced samples. We give a definition of balanced sample that embodies overbalanced, mean-balanced, and π-balanced samples, and we derive strategies in order to equalize a d-weighted estimator with the best linear unbiased estimator. We show the value of selecting a balanced sample with inclusion probabilities proportional to the standard deviations of the errors with the Horvitz-Thompson estimator. This is a strategy that is design-robust and efficient. We show its superiority compared to other strategies that use balanced samples in the model-based framewor. In particular, we show that this strategy is preferable to the use of overbalanced samples in the polynomial model. The problem of bias-robustness is also discussed, and we show how overspecifying the model can protect against misspecification. Key words and phrases: Balanced sampling, finite population sampling, polynomial model, ratio model, robust estimation. 1. Introduction The principal difference between the model-based and the classical designbased approach for estimating finite population totals lies in the source of randomness they use (Särndal (1978)). In design-based sampling, the inference is based on the stochastic structure induced by the sampling design. In the modelbased, or prediction approach, the inference depends on the validity of the model used to describe the data. In this case, the randomness is due to the population model and not to the sampling design. The model-based approach was developed, among others, by Royall (1976, 1992), Royall and Cumberland (1981), and Chambers (1996). When the data are assumed to follow a linear model, Royall (1976) proposed the use of the best linear unbiased predictor. The model-based approach has been criticized due to the fact that it may lead to severe bias if the model assumptions are violated. In contrast to model-based inference, design-based inference is design-robust by
2 778 DESISLAVA NEDYALKOVA AND YVES TILLÉ definition. Brewer and Särndal (1983) point out that, since the inference is not based on a model, there is no need to worry about model misspecification. Much of the wor in model-based research has been devoted to the construction of robust strategies. More specifically, in order to protect the inference against a misspecified model, Royall and Herson (1973a,b) and Scott, Brewer, and Ho (1978) point out the importance of balanced samples, where balance is achieved by equalizing the sample moments of the independent variables with those in the population. They came to the conclusion that the sample must be balanced, but not necessarily random. Another way to accomplish design-robustness in the model-based approach is to choose an appropriate sampling design. Since Deville and Tillé (2004) s paper, it is now possible to randomly select balanced samples using a procedure called the cube method. Nedyalova and Tillé (2008) have shown that under a random balanced sampling design, with inclusion probabilities proportional to the standard deviations of the errors of the model, and under certain conditions defined as fully explainable heteroscedasticity, the best linear unbiased estimator is the Horvitz-Thompson estimator. This is an optimal strategy that reconciles the two approaches. Scott, Brewer, and Ho (1978) and Royall and Herson (1973a) recommend the use of balanced samples in order to protect against a misspecification of the model, while Nedyalova and Tillé (2008) recommend using balanced sampling for minimizing the anticipated variance under a linear model. An important particular case is the polynomial model. Scott, Brewer, and Ho (1978) suggest using overbalanced samples with an ad-hoc weighting system. We investigate the different strategies leading to design-robust estimations under the model-based framewor, and consider the polynomial model, overbalancing and robustness under misspecification in light of Nedyalova and Tillé (2008). The notation and definitions are given in Section 2. The model-based framewor is briefly introduced in Section 3. In Section 4, we consider a large class of balanced designs, called d-balanced designs, that embody balanced, π- balanced, mean-balanced, and overbalanced samples. We also consider a class of weighted estimators, called d-weighted estimators, and discuss several strategies for which the Best Linear Unbiased (BLU) estimator is the d-estimator. These results generalize some results of Nedyalova and Tillé (2008) for the Horvitz- Thompson estimator. In Section 5, we give a definition of design-robustness and show that an appropriate strategy to protect against misspecification of the model consists of overspecifying the model and then balancing on the independent variables of the model. In Section 6, we revisit the polynomial model to show that the overbalancing strategy of Scott, Brewer, and Ho (1978) is suboptimal, and to give an alternative strategy that minimizes the anticipated variance.
3 BIAS-ROBUSTNESS AND EFFICIENCY IN MODEL-BASED INFERENCE 779 In Section 7, we draw some general conclusions on selecting a sample when one reasonably believes in a linear model but sees protection against misspecification. 2. Notation and Definitions Consider a population U of size N. Each unit of the population is identified by a label = 1,..., N. Suppose that a register is available, and that the values of p auxiliary variables are nown for each unit of the population. Let y be the value taen by the variable of interest y on the th unit of the population. The values y are unnown. We are interested in estimating the population total Y = y. The total Y is estimated by a sample s of size n, where s is a subset of U. A sample is only a subset of the population and is not necessarily randomly selected. A sampling design is a tool to randomly select a sample. It is defined by assigning to each sample s a probability p(s) of being selected. Let S denote the random sample such that Pr(S = s) = p(s). The inclusion probability π is then the probability that unit is selected in the sample. We denote by E p ( ) and var p ( ), respectively, the expectation and variance under the sampling design p( ), and by S the set of units of the population which are not in S. Definition 1. An estimator Ŷ is design-unbiased if E p(ŷ ) = Y. Definition 2. A sample s is d-balanced on a set of variables x = (x 1 x p ) if and only if x, d x = s where d 1,..., d,..., d N is a set of weights that do not depend on the sample s. The weights d 1,..., d,..., d N are positive values. An important example is given by the d = 1/π used in the Horvitz-Thompson estimator. When the sample is randomly selected, an inclusion probability can be assigned to each statistical unit. If π > 0, for all U, the Horvitz-Thompson estimator of Y, Ŷ π = y, π is design-unbiased, i.e. E p (Ŷπ) = Y. For a balanced sample, the inclusion probabilities are π = 1/d, and the procedure that randomly selects a balanced sample is called a balanced sampling design. According to Deville and Tillé (2004), a sampling design p( ) is balanced on the auxiliary variables x 1,..., x p if and only if x = x. (2.1) π
4 780 DESISLAVA NEDYALKOVA AND YVES TILLÉ Authors such as Cumberland and Royall (1981) and Kott (1986) would call this a π-balanced sampling, as opposed to a mean-balanced sampling defined by the equation 1 x = 1 x. n N We use the expression balanced sampling to denote a sampling design that satisfies (2.1) for one or more auxiliary variables, a mean-balanced sampling being a particular case of this balanced sampling when the sample is selected with inclusion probabilities n/n. If the population size is small, a balanced sampling design can be implemented by a linear program. For larger population sizes, the cube method may be used (see Deville and Tillé (2004) or Tillé (2006)). Whatever the algorithm used for selecting a balanced sample, an exact balanced sample cannot generally be found because it does not exist. It is however, always possible to select a sample that is almost balanced. We assume that the balancing error can be neglected. The selection of a balanced sample requires the use of a register that contains the values of the auxiliary variables for each unit of the population. 3. Model-based Strategy and BLU Estimator We assume that the population follows a linear model M, y = x β + ε, (3.1) where β is the vector of regression coefficients and ε is a vector of random variables ε such that E M (ε ) = 0, var M (ε ) = σ 2 ν 2, cov M(ε, ε l ) = 0 if l, Suppose ν, U, are nown. For simplicity, we scale them so that ν = N. Model (3.1) includes the possibility of heteroscedasticity. The heteroscedasticity of ν or ν 2 may or may not be proportional to some auxiliary variables included in x. Nedyalova and Tillé (2008) have shown the importance of using auxiliary variables whose linear combination is equal to ν or ν 2. Indeed, in this case, the optimal strategies for the model-based and design-based framewors are the same. An important and common hypothesis is that the random sample S and the errors ε of the model are independent. The symbols E M ( ), var M ( ), cov M ( ) denote, respectively, the expected value, the variance, and the covariance under M. Definition 3. An estimator Ŷ is model-unbiased if E M(Ŷ Y ) = 0.
5 BIAS-ROBUSTNESS AND EFFICIENCY IN MODEL-BASED INFERENCE 781 Definition 4. The model mean squared error of an estimator Ŷ is MSE M(Ŷ ) = E M (Ŷ Y ) 2. When Ŷ is model-unbiased, the model mean squared error is also called the error variance, for instance in Royall and Cumberland (1981). Definition 5 (Isai and Fuller (1982)). The anticipated mean squared error of an estimator Ŷ is MSE pm(ŷ ) = E pe M (Ŷ Y )2. When Ŷ is design-unbiased, the anticipated mean squared error is also called the anticipated variance. Royall (1976) showed that, in the framewor of model-based inference, the Best Linear Unbiased (BLU) estimator is Ŷ BLU = = y + S x β BLU = x β BLU + (y x β BLU ) x β BLU + e, (3.2) e = y x β BLU, (3.3) where β BLU = A 1 x y ν 2, A = x x ν 2. The error variance of the best linear unbiased estimator is E M (ŶBLU Y ) 2 = σ 2 x A 1 x l + ν 2. S l S S We refer to the usual definition given, for instance, in Háje (1959, 1981), Ramarishnan (1975), and Joshi (1979). Definition 6. A strategy is a pair (p( ), Ŷ ) consisting of a sampling design and an estimator. Strategy 1. Use the best linear unbiased estimator and choose a sample of size n that minimizes E M (ŶBLU Y ) 2. This is an optimal, purely unbiased strategy that is not design-robust because, in some cases, it can lead to the choice of a very extreme sample, as in the
6 782 DESISLAVA NEDYALKOVA AND YVES TILLÉ following example. Suppose the model has no intercept and only one regressor (see for instance, Royall and Herson (1973a)): y = x β + ε, (3.4) with var M (ε ) = σ 2 ν 2 and ν2 x. Then the BLU estimator is Ŷ BLU = x y, x which, in this case, is the ordinary ratio estimator ŶR. As ν 2 x and ν = N, we have that ν 2 = N 2 x ) 2. x ( The error variance of ŶR is given by the expression ( E M (ŶR Y ) 2 = σ 2 N 2 ) 2 S x ) x. x ( x Thus, in this case, the optimal purely model-based strategy consists of choosing the units with the n largest values of variable x (Royall (1970)), a very extreme sample. The strategy can be dangerous if the model is wrong. It is thus reasonable to opt for a strategy that guarantees correct estimation when the model is misspecified, and that leads to design-unbiased inference. For these reasons, Strategy 1 is rarely used (see Hansen, Madow, and Tepping (1983)). 4. Balanced Sample under a Linear Model In this section we generalize the concept of balanced samples. Our results are thus more general than those in Nedyalova and Tillé (2008). Consider the d-weighted estimator Ŷ d = d y = d x β BLU + d e, (4.1) where e is defined in (3.3). Under d-balanced sampling, Ŷd is model-unbiased and its error variance is E M (Ŷd Y ) 2 = σ 2 (d 1) 2 ν 2 + ν 2. (4.2) S By comparing (4.1) with (3.2), we generalize Result 7 of Nedyalova and Tillé (2008).
7 BIAS-ROBUSTNESS AND EFFICIENCY IN MODEL-BASED INFERENCE 783 Result 1. A sufficient condition that ŶBLU = Ŷd is that the sampling design is d-balanced on x, e (d 1) = 0. A particular case of Result 1 is given below. Corollary 1. A sufficient condition that ŶBLU = Ŷd is that the sampling design is d-balanced on x, there exists a vector λ such that λ x = ν 2 (d 1), for all U. Proof. If then λ x ν 2 (d 1) = 1, λ x e (d 1) = ν 2 (d 1) e (d 1) = Consider a strategy that meets the conditions of Result 1. Strategy 2. λ x ν 2 e = 0. Use a d-balanced sampling design on x, with d chosen so that d = (ν 2 + λ x )/ν 2, for all U, use the d-weighted estimator. A particular case of Strategy 2 was recommended by Scott, Brewer, and Ho (1978) in the case of a polynomial model. Since the conditions of Result 1 are met, we have that ŶBLU = Ŷd. The strategy judiciously chooses the d s to equalize the d-weighted estimator and the BLU estimator. For a sample size n, this strategy can be implemented by using the inclusion probabilities ν 2 π = ν 2 +. λ x The value of λ can be chosen freely subject to π = After some algebra, it is possible to show that E M (Ŷd Y ) 2 = σ 2 S ν 2 ν 2 + λ x = n. d ν 2 = σ2 ( S ν 2 + S λ x ).
8 784 DESISLAVA NEDYALKOVA AND YVES TILLÉ Moreover, E p E M (Ŷd Y ) 2 = σ 2 λ x. Nevertheless, we see below that this is not necessarily the best strategy. Eventually, consider a strategy proposed by Nedyalova and Tillé (2008) that also meets the conditions of Result 1. Strategy 3. Use a d-balanced sampling design on x, with d ν 1 where x is complemented so that there exist two vectors α and γ such that α x = ν 2 and γ x = ν, for all U, a fully explainable heteroscedasticity, use the d-weighted estimator. Strategy 3 was recommended by Nedyalova and Tillé (2008) because it minimizes the anticipated variance among the strategies that are design-unbiased (see also Fuller (2009, p.187)). It is thus better than Strategy in the class of design-unbiased strategies. With Strategy 3, we have ŶBLU = Ŷd. The condition of fully explainable heteroscedasticity can be obtained by adding ν 2 and d ν 2 to the list of balancing variables. Thus, ν 2, d ν 2, d ν 2 = d 2 ν2 = and the error variance of the d-weighted estimator given in (4.2) simplifies to E M (Ŷd Y ) 2 = σ 2 (d 1)ν 2. For a sample size n, we must tae d = N/(ν n), and π = 1/d. The error variance of the d-weighted estimator, given in (4.2), simplifies to E M (Ŷd Y ) 2 = σ ( ) 2 Nν (N n ν2 = σ 2 2 n ) ν Bias-robustness in Submodels A large part of the model-based inference is dedicated to the bias-robustness of the BLU estimator in the case of misspecification of the model. In this section, we propose a formalization of the question of bias-robustness and an analysis of the consequences of model misspecification. We assume that a model M is used to conceive the strategy, but that the true underlying model is M.
9 BIAS-ROBUSTNESS AND EFFICIENCY IN MODEL-BASED INFERENCE 785 Definition 7. A strategy is bias-robust for a model M if E M (Ŷ Y ) = 0. Consider a model and an alternative model M : y = x β + ε M : y = z γ + η, where E M (η ) = 0. There is no assumption on the covariance matrix of the vector of η. Definition 8. Model M is a submodel of M if there exists a matrix A such that Ax = z, not necessarily of full ran. The basic fact needed to show that a strategy is robust is the following. Result 2. The strategy that consists of using a d-balanced sampling design on x (with any vector of d ) and the d-weighted estimator is bias-robust for any submodel of M. Proof. Under model M, and using a d-balanced sample, Ŷ d = d y = d (z γ + η ) = d (A x γ + η ) = A x γ + d η. Thus, E M (Ŷd Y ) = E M ( d η η ) = 0. Result 2 encourages the statistician to overspecify the model, i.e., to introduce additional variables into model M in order to ensure that M is really a submodel of M. Indeed, if model M* is true, the variance under the model of the d-weighted estimator is the same irrespective of whether the sample is balanced on x or on z. An over-specification of the model does not increase the variance under the model of the estimator because the independent variables of the model are only used for balancing the sample. Moreover, the d-weighted estimator does not depend on an estimated coefficient that relies on the number of auxiliary variables. Suppose now that the model is misspecified, i.e., that model M is not a submodel of M. If the sampling design is d-balanced on the independent variables of model M, then ( ) E M (Ŷd Y ) = γ. d z z
10 786 DESISLAVA NEDYALKOVA AND YVES TILLÉ Let f = z γ x ϕ be the residuals of a linear regression of (z γ) on x and ϕ the regression coefficient vector. Then, if the sampling design is d-balanced on x, [ E M (Ŷd Y ) = d (x ϕ + f ) ] (x ϕ + f ) = d f f, (5.1) for any value ϕ, in particular when ϕ = ( x x var(η ) ) 1 x z γ var(η ). The model variance thus only depends on the residuals of the regression, i.e., on the part of the model that remains misspecified. If we do not possess information about the z, we select a random sample with inclusion probabilities π = 1/d, because in selecting the sample randomly, the expected value under the sampling design of (5.1) is zero. Moreover, we can expect that, under reasonable asymptotic assumptions, the quantity E M (Ŷd Y ) N = 1 N ( f ) f, π remains bounded in probability with respect to the sampling design when multiplied by n. The random selection of a sample and the use of a design-unbiased estimator give an ultimate bias protection in the case where it is not possible to overspecify the model. This guarantees a negligible bias under M when n is large. 6. Application to the Polynomial Model 6.1. Presentation of the model The polynomial model was studied, among others, by Royall and Herson (1973a), Scott, Brewer, and Ho (1978), and Valliant, Dorfman, and Royall (2000). The model is J y = δ j β j x j + ε, (6.1) j=0 where x is the only independent variable, β j is the jth regression coefficient, δ j is equal to 1 or 0 as the term β j x appears or not in the regression, E M(ε ) = 0, var M (ε ) = σ 2 ν 2, and cov M(ε, ε l ) = 0, when l. We assume ν = N.
11 BIAS-ROBUSTNESS AND EFFICIENCY IN MODEL-BASED INFERENCE 787 Now, from Result 2, for any set of vectors of d, the d-weighted estimator is bias-robust under any submodel of (6.1) provided d x j = This implies several results on the polynomial model A first suboptimal strategy Let S (J) be a particular sample for which S x j = x S x j, for j = 0,..., J. (6.2) x j+1 /ν 2 x 2 /ν2, for j = 0,..., J. (6.3) With a sample satisfying (6.3), Scott, Brewer, and Ho (1978) showed that the estimator Ŷ 0 = y + x y x /ν 2, S x2 /ν2 is BLU for any polynomial Model (6.1), and any value of ν. This simple condition on the sample implies that the estimator is bias-robust for a large class of polynomial models. It can easily be shown that a sufficient condition for a sample to satisfy (6.3) is ( x j 1 + λx ) ν 2 = x j, for j = 0,..., J, (6.4) where λ is a scalar that does not depend on j. Equality (6.4) can be satisfied by using Strategy 2. In this particular case, we have π = 1/d and d = 1 + λx /ν 2. Strategy 2. (for polynomial model) Select a balanced sample such that with the unequal inclusion probabilities x j = x j π, j = 0,..., J, π = λx /ν 2, U.
12 788 DESISLAVA NEDYALKOVA AND YVES TILLÉ The constant λ is chosen as a function of the desired sample size, where π = λx /ν 2 = n. (6.5) The solution of (6.5) in λ is unique. Strategy 2 is recommended by Scott, Brewer, and Ho (1978). The inclusion probabilities are, however, chosen so that the BLU estimator is equal to the Horvitz-Thompson estimator and is thus not optimized. Two particular cases of (6.3) are as follows. (a) When ν 2 x, under the condition ν = N, In this case, (6.3) reduces to S xj S x ν 2 = = ( xj x N 2 x ) 2. x Thus, the sample should satisfy the condition 1 n x j = 1 N n x j = 1 N S, for j = 0,..., J. x j, for j = 0,..., J. Royall and Herson (1973a) call such samples balanced. In this case, Ŷ0 reduces to the ordinary ratio estimator Ŷ R = x y x. (b) When ν 2 x2, it is easily shown that ν2 = N 2 x 2 / ( x ) 2. In this case, (6.3) reduces to x j 1 n = S xj S x, for j = 0,..., J. The sample S (J) is called overbalanced (Scott, Brewer, and Ho (1978)), and Ŷ 0 reduces to Ŷ OB = ( ) 1 y y + x. n x S
13 BIAS-ROBUSTNESS AND EFFICIENCY IN MODEL-BASED INFERENCE An alternative strategy for the polynomial model Strategy 3. (for polynomial model) Use inclusion probabilities that are proportional to ν, subject to π = n, 0 π 1. Select a balanced sample according to the balancing equations Use the Horvitz-Thompson estimator. x j = x j π, j = 0,..., J, (6.6) ν 2 = ν 2 π, ν = ν. (6.7) π Note that, since π nν /N, (6.7) becomes N n 1 = N and only means that the sample size must be fixed. Nedyalova and Tillé (2008) proved that this strategy minimizes the anticipated variances in the class of design-unbiased strategies. Strategy 3 is thus better than Strategy 2 in the sense that its anticipated variance is always smaller. With Strategy 3, the inclusion probabilities are chosen to minimize the variance, while with Strategy 2 the inclusion probabilities are chosen to meet the technical condition given in Result 1. Two particular cases are as follows. (a) When ν 2 x with ν = N, then ν 2 = ( N 2 x ) 2. x As π ν with π = n, it follows that π = (n/n)ν. In this case, if the sample has a fixed sample size and is balanced on x, it is automatically balanced on ν and ν 2. (b) When ν 2 x2, with ν = N, then ν 2 = N 2 x 2 ( x ) 2. Here too, we have π = (n/n)ν. In this case, if the sample has a fixed sample size and is balanced on x 2, it is automatically balanced on ν and ν 2.
14 790 DESISLAVA NEDYALKOVA AND YVES TILLÉ 6.4. A particular case: the ratio model Consider again the model without intercept and only one regressor with ν 2 x, as at (3.4), a particular case of the polynomial model. We have seen that the BLU estimator under this model is the ordinary ratio estimator Ŷ R = x y x, with error variance ( E M (ŶR Y ) 2 = σ 2 N 2 ) 2 S x ) x. x ( x Here Strategy 2 is endorsed by Royall and Herson (1973a). Strategy 2. (for the ratio model) Select a mean-balanced sample of size n and use the ratio estimator. With a mean-balanced sample on x, satisfying the condition 1 n x = 1 N n x, S the ratio estimator reduces to the sample mean Ŷ R = x y x = 1 y. n Moreover, E M (ŶR Y ) 2 = σ 2 N 2 (N n) n ( x x ) 2 = E p E M (ŶR Y ) 2. Strategy 3. (for the ratio model) Select a balanced sample on x with inclusion probabilities π ν x and use the Horvitz-Thompson estimator. In order to show that Strategy 3 is better than Strategy 2, we compare the anticipated and model-variances, E p E M (Ŷ Y )2 and E M (Ŷ Y )2, of the ratio and Horvitz-Thompson estimators under (3.4). Nedyalova and Tillé (2008) have shown that, under Strategy 3, ( E M (Ŷπ Y ) 2 = E p E M (Ŷπ Y ) 2 = σ 2 N 2 n ) ν 2.
15 BIAS-ROBUSTNESS AND EFFICIENCY IN MODEL-BASED INFERENCE 791 After replacing ν with N x / x, we obtain [ E p E M (Ŷπ Y ) 2 = σ 2 N 2 n N 2 x ] ( ) 2. x With D = E p E M (ŶR Y ) 2 E p E M (Ŷπ Y ) 2 = E M (ŶR Y ) 2 E M (Ŷπ Y ) 2, simplification yields D = N 2 n [ N x ( x ) 2 1 Thus, Strategy 3 is better than Strategy 2 under (3.4). 7. Discussion ] 0. Under a linear model, the use of a purely model-based strategy (Strategy 1) can be dangerous. Balanced samples offer good protection against model misspecification. A d-balanced sampling design with the d-weighted estimator is a bias-robust strategy that assures protection against misspecification of the model. There however exist several ways of selecting balanced samples. The d-weighted estimator can be equivalent to the BLU estimator if some technical conditions are met. These conditions can be met by either choosing the ad hoc inclusion probabilities (Strategy 2) or by adding ν and ν 2 to the list of balancing variables and choosing the optimal inclusion probability (Strategy 3). For the polynomial model, Royall and Herson (1973a) and Scott, Brewer, and Ho (1978) used ad hoc inclusion probabilities. They showed that, in this case, an unweighted ratio estimator is BLU for a large class of polynomial models. Nevertheless, this strategy consists of choosing the inclusion probabilities in such a way that a technical property is satisfied. Strategy 3 is more appropriate, because the technical condition is met by adding two balancing variables and the inclusion probabilities can be chosen to minimize the anticipated variance. Strategy 2 is thus not admissible in the sense that it is always possible to obtain a smaller anticipated variance by selecting the units with inclusion probabilities proportional to the standard deviations of the errors of the model and using the Horvitz-Thompson estimator. Strategy 3 has the advantage of providing a bias-robust estimator, even if the model is misspecified. Indeed, in both cases, the estimator does not depend on the auxiliary variables and is always design-unbiased. In case of misspecification of the model or of measurement errors the estimators of totals thus remain unbiased but a part of the efficiency can
16 792 DESISLAVA NEDYALKOVA AND YVES TILLÉ be lost. The importance of this loss depends on the way in which the correlation between the independent variables of the model and the variables of interest is hindered by the measurement errors. Eventually, the best protection against a misspecification of the model consists of extending the list of balancing variables. Indeed, the addition of balancing variables that could be correlated to the variables of interest does not increase the model variance, but protects against bias under the model. If the model cannot really be specified, the ultimate protection against misspecification is always the random selection of the sample. We hope that these results give a general guideline on the way of planning a sample survey under a realistic linear model. In practice, designing a survey is often a more complex exercise for the following reasons: It is difficult to specify a model without nowing the variable(s) of interest. The sampling design is necessarily conceived before nowing the dependent variable. Thus the validation of the model is difficult. A survey is generally done for multiple purposes. Arbitration must thus be done between the variables of interest, the areas of interest, and the parameters to estimate, and thus between the models. Nonrepsonse implies that the set of respondents is never exactly balanced. Our results however give a general and ideal framewor on how to plan and estimate a total under a linear model. The result of Neyman (1934) concerning optimal stratification suffers the same problems. Optimality depends on the variable of interest, on the parameters to estimate, on the areas of interest. The variances within the strata are never really nown, thus they must be estimated using another survey, or must be derived from a hypothesis of the existence of a size effect. Indeed, optimal stratification is a particular case of our result. The generalization of our results to cluster sampling, two-stage, and two-phase sampling remains a challenging topic of research. Acnowledgement The authors than two anonymous reviewers and the Editor for their constructive comments that helped us improve the quality of this paper. This research is supported by the Swiss National Science Foundation (grant no. FN /1). References Brewer, K. and Särndal, C.-E. (1983). Six approaches to enumerative survey sampling. In Incomplete Data in Sample Surveys (Edited by W. G. Madow, I. Olin, and D. B. Rubin), Academic Press, N.Y.
17 BIAS-ROBUSTNESS AND EFFICIENCY IN MODEL-BASED INFERENCE 793 Chambers, R. L. (1996). Robust case-weighting for multipurpose establishment surveys. J. Off. Statist. 12, Cumberland, W. and Royall, R. (1981). Prediction models in unequal probability sampling. J. Roy. Statist. Soc. Ser. B 43, Deville, J.-C. and Tillé, Y. (2004). Efficient balanced sampling: The cube method. Biometria 91, Fuller, W. A. (2009). Sampling Statistics. John Wiley, New Jersey. Háje, J. (1959). Optimum strategy and other problems in probability sampling. Cosopis. Pest. Mat. 84, Háje, J. (1981). Sampling from a Finite Population. Marcel Deer, New Yor. Hansen, M. H., Madow, W. G. and Tepping, B. J. (1983). An Evaluation of model-dependent and probability-sampling inferences in sample surveys. J. Amer. Statist. Assoc. 78, Isai, C. and Fuller, W. A. (1982). Survey design under a regression population model. J. Amer. Statist. Assoc. 77, Joshi, V. M. (1979). The best strategy for estimating the mean of a finite population. Ann. Statist. 7, Kott, P. S. (1986). When a mean-of-ratios is the best linear unbiased estimator under a model. Amer. Statist. 40, Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection, J. Roy. Statist. Soc. 97, Nedyalova, D. and Tillé, Y. (2008). Optimal sampling and estimation strategies under the linear model. Biometria 95, Ramarishnan, M. (1975). Choice of an optimum sampling strategy. Ann. Statist. 3, Royall, R. (1970). On finite population sampling theory under certain linear regression models. Biometria 57, Royall, R. (1976). The linear least squares prediction approach to two-stage sampling. J. Amer. Statist. Assoc. 71, Royall, R. (1992). Robustness and optimal design under prediction models for finite populations. Surv. Methodol. 18, Royall, R. and Cumberland, W. (1981). The finite population linear regression estimator and estimators of its variance. An empirical study. J. Amer. Stat. Assoc. 76, Royall, R. and Herson, J. (1973a). Robust estimation in finite populations I. J. Amer. Statist. Assoc. 68, Royall, R. and Herson, J. (1973b). Robust estimation in finite populations II: Stratification on a size variable. J. Amer. Statist. Assoc. 68, Särndal, C.-E. (1978). Design-based and model-based inference in survey sampling. Scand. J. Statist. 5, Scott, A., Brewer, K. and Ho, E. (1978). Finite population sampling and robust estimation. J. Amer. Statist. Assoc. 73, Tillé, Y. (2006). Sampling Algorithms. Springer-Verlag, New Yor. Valliant, R., Dorfman, A. and Royall, R. (2000). Finite Population Sampling and Inference: A Prediction Approach. Wiley, New Yor.
18 794 DESISLAVA NEDYALKOVA AND YVES TILLÉ Institute of statistics, University of Neuchâtel, Pierre à Mazel 7, 2000 Neuchâtel, Switzerland. Institute of statistics, University of Neuchâtel, Pierre à Mazel 7, 2000 Neuchâtel, Switzerland. (Received October 2010; accepted may 2011)
INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING
Statistica Sinica 24 (2014), 1001-1015 doi:http://dx.doi.org/10.5705/ss.2013.038 INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Seunghwan Park and Jae Kwang Kim Seoul National Univeristy
More informationA comparison of stratified simple random sampling and sampling with probability proportional to size
A comparison of stratified simple random sampling and sampling with probability proportional to size Edgar Bueno Dan Hedlin Per Gösta Andersson 1 Introduction When planning the sampling strategy (i.e.
More informationComments on Design-Based Prediction Using Auxilliary Information under Random Permutation Models (by Wenjun Li (5/21/03) Ed Stanek
Comments on Design-Based Prediction Using Auxilliary Information under Random Permutation Models (by Wenjun Li (5/2/03) Ed Stanek Here are comments on the Draft Manuscript. They are all suggestions that
More informationA MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR
Statistica Sinica 8(1998), 1165-1173 A MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR Phillip S. Kott National Agricultural Statistics Service Abstract:
More informationConservative variance estimation for sampling designs with zero pairwise inclusion probabilities
Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities Peter M. Aronow and Cyrus Samii Forthcoming at Survey Methodology Abstract We consider conservative variance
More informationFinite Population Sampling and Inference
Finite Population Sampling and Inference A Prediction Approach RICHARD VALLIANT ALAN H. DORFMAN RICHARD M. ROYALL A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane
More informationThe R package sampling, a software tool for training in official statistics and survey sampling
The R package sampling, a software tool for training in official statistics and survey sampling Yves Tillé 1 and Alina Matei 2 1 Institute of Statistics, University of Neuchâtel, Switzerland yves.tille@unine.ch
More informationA new resampling method for sampling designs without replacement: the doubled half bootstrap
1 Published in Computational Statistics 29, issue 5, 1345-1363, 2014 which should be used for any reference to this work A new resampling method for sampling designs without replacement: the doubled half
More informationSupplement-Sample Integration for Prediction of Remainder for Enhanced GREG
Supplement-Sample Integration for Prediction of Remainder for Enhanced GREG Abstract Avinash C. Singh Division of Survey and Data Sciences American Institutes for Research, Rockville, MD 20852 asingh@air.org
More informationProbability Sampling Designs: Principles for Choice of Design and Balancing
Submitted to Statistical Science Probability Sampling Designs: Principles for Choice of Design and Balancing Yves Tillé, Matthieu Wilhelm University of Neuchâtel arxiv:1612.04965v1 [stat.me] 15 Dec 2016
More informationREPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES
Statistica Sinica 8(1998), 1153-1164 REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Wayne A. Fuller Iowa State University Abstract: The estimation of the variance of the regression estimator for
More informationRegression Models - Introduction
Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent
More informationNONLINEAR CALIBRATION. 1 Introduction. 2 Calibrated estimator of total. Abstract
NONLINEAR CALIBRATION 1 Alesandras Pliusas 1 Statistics Lithuania, Institute of Mathematics and Informatics, Lithuania e-mail: Pliusas@tl.mii.lt Abstract The definition of a calibrated estimator of the
More informationOptimal Auxiliary Variable Assisted Two-Phase Sampling Designs
MASTER S THESIS Optimal Auxiliary Variable Assisted Two-Phase Sampling Designs HENRIK IMBERG Department of Mathematical Sciences Division of Mathematical Statistics CHALMERS UNIVERSITY OF TECHNOLOGY UNIVERSITY
More informationRESEARCH REPORT. Vanishing auxiliary variables in PPS sampling with applications in microscopy.
CENTRE FOR STOCHASTIC GEOMETRY AND ADVANCED BIOIMAGING 2014 www.csgb.dk RESEARCH REPORT Ina Trolle Andersen, Ute Hahn and Eva B. Vedel Jensen Vanishing auxiliary variables in PPS sampling with applications
More informationarxiv: v2 [math.st] 20 Jun 2014
A solution in small area estimation problems Andrius Čiginas and Tomas Rudys Vilnius University Institute of Mathematics and Informatics, LT-08663 Vilnius, Lithuania arxiv:1306.2814v2 [math.st] 20 Jun
More informationA comparison of stratified simple random sampling and sampling with probability proportional to size
A comparison of stratified simple random sampling and sampling with probability proportional to size Edgar Bueno Dan Hedlin Per Gösta Andersson Department of Statistics Stockholm University Introduction
More informationCombining data from two independent surveys: model-assisted approach
Combining data from two independent surveys: model-assisted approach Jae Kwang Kim 1 Iowa State University January 20, 2012 1 Joint work with J.N.K. Rao, Carleton University Reference Kim, J.K. and Rao,
More informationSimple design-efficient calibration estimators for rejective and high-entropy sampling
Biometrika (202), 99,, pp. 6 C 202 Biometrika Trust Printed in Great Britain Advance Access publication on 3 July 202 Simple design-efficient calibration estimators for rejective and high-entropy sampling
More informationMean estimation with calibration techniques in presence of missing data
Computational Statistics & Data Analysis 50 2006 3263 3277 www.elsevier.com/locate/csda Mean estimation with calibration techniues in presence of missing data M. Rueda a,, S. Martínez b, H. Martínez c,
More informationExact balanced random imputation for sample survey data
Exact balanced random imputation for sample survey data Guillaume Chauvet, Wilfried Do Paco To cite this version: Guillaume Chauvet, Wilfried Do Paco. Exact balanced random imputation for sample survey
More informationIn Praise of the Listwise-Deletion Method (Perhaps with Reweighting)
In Praise of the Listwise-Deletion Method (Perhaps with Reweighting) Phillip S. Kott RTI International NISS Worshop on the Analysis of Complex Survey Data With Missing Item Values October 17, 2014 1 RTI
More informationstatistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors:
Wooldridge, Introductory Econometrics, d ed. Chapter 3: Multiple regression analysis: Estimation In multiple regression analysis, we extend the simple (two-variable) regression model to consider the possibility
More informationVariants of the splitting method for unequal probability sampling
Variants of the splitting method for unequal probability sampling Lennart Bondesson 1 1 Umeå University, Sweden e-mail: Lennart.Bondesson@math.umu.se Abstract This paper is mainly a review of the splitting
More informationA Model-Over-Design Integration for Estimation from Purposive Supplements to Probability Samples
A Model-Over-Design Integration for Estimation from Purposive Supplements to Probability Samples Avinash C. Singh, NORC at the University of Chicago, Chicago, IL 60603 singh-avi@norc.org Abstract For purposive
More informationThe Finite Sample Properties of the Least Squares Estimator / Basic Hypothesis Testing
1 The Finite Sample Properties of the Least Squares Estimator / Basic Hypothesis Testing Greene Ch 4, Kennedy Ch. R script mod1s3 To assess the quality and appropriateness of econometric estimators, we
More informationEstimation of Some Proportion in a Clustered Population
Nonlinear Analysis: Modelling and Control, 2009, Vol. 14, No. 4, 473 487 Estimation of Some Proportion in a Clustered Population D. Krapavicaitė Institute of Mathematics and Informatics Aademijos str.
More informationWeighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai
Weighting Methods Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Weighting Methods Stat186/Gov2002 Fall 2018 1 / 13 Motivation Matching methods for improving
More informationASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS
Statistica Sinica 17(2007), 1047-1064 ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS Jiahua Chen and J. N. K. Rao University of British Columbia and Carleton University Abstract: Large sample properties
More informationImplications of Ignoring the Uncertainty in Control Totals for Generalized Regression Estimators. Calibration Estimators
Implications of Ignoring the Uncertainty in Control Totals for Generalized Regression Estimators Jill A. Dever, RTI Richard Valliant, JPSM & ISR is a trade name of Research Triangle Institute. www.rti.org
More informationNon-uniform coverage estimators for distance sampling
Abstract Non-uniform coverage estimators for distance sampling CREEM Technical report 2007-01 Eric Rexstad Centre for Research into Ecological and Environmental Modelling Research Unit for Wildlife Population
More informationBias Correction in the Balanced-half-sample Method if the Number of Sampled Units in Some Strata Is Odd
Journal of Of cial Statistics, Vol. 14, No. 2, 1998, pp. 181±188 Bias Correction in the Balanced-half-sample Method if the Number of Sampled Units in Some Strata Is Odd Ger T. Slootbee 1 The balanced-half-sample
More informationof being selected and varying such probability across strata under optimal allocation leads to increased accuracy.
5 Sampling with Unequal Probabilities Simple random sampling and systematic sampling are schemes where every unit in the population has the same chance of being selected We will now consider unequal probability
More informationCausal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions
Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census
More informationEmpirical likelihood inference for regression parameters when modelling hierarchical complex survey data
Empirical likelihood inference for regression parameters when modelling hierarchical complex survey data Melike Oguz-Alper Yves G. Berger Abstract The data used in social, behavioural, health or biological
More informationAccounting for Complex Sample Designs via Mixture Models
Accounting for Complex Sample Designs via Finite Normal Mixture Models 1 1 University of Michigan School of Public Health August 2009 Talk Outline 1 2 Accommodating Sampling Weights in Mixture Models 3
More informationInferences for Regression
Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In
More informationBootstrap inference for the finite population total under complex sampling designs
Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.
More informationA JACKKNIFE VARIANCE ESTIMATOR FOR SELF-WEIGHTED TWO-STAGE SAMPLES
Statistica Sinica 23 (2013), 595-613 doi:http://dx.doi.org/10.5705/ss.2011.263 A JACKKNFE VARANCE ESTMATOR FOR SELF-WEGHTED TWO-STAGE SAMPLES Emilio L. Escobar and Yves G. Berger TAM and University of
More informationAsymptotic Normality under Two-Phase Sampling Designs
Asymptotic Normality under Two-Phase Sampling Designs Jiahua Chen and J. N. K. Rao University of Waterloo and University of Carleton Abstract Large sample properties of statistical inferences in the context
More informationLectures 5 & 6: Hypothesis Testing
Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across
More informationREPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY
REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY J.D. Opsomer, W.A. Fuller and X. Li Iowa State University, Ames, IA 50011, USA 1. Introduction Replication methods are often used in
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationSampling: A Brief Review. Workshop on Respondent-driven Sampling Analyst Software
Sampling: A Brief Review Workshop on Respondent-driven Sampling Analyst Software 201 1 Purpose To review some of the influences on estimates in design-based inference in classic survey sampling methods
More informationThe Use of Survey Weights in Regression Modelling
The Use of Survey Weights in Regression Modelling Chris Skinner London School of Economics and Political Science (with Jae-Kwang Kim, Iowa State University) Colorado State University, June 2013 1 Weighting
More information1 Motivation for Instrumental Variable (IV) Regression
ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data
More informationPropensity Score Weighting with Multilevel Data
Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative
More informationEvaluation of Variance Approximations and Estimators in Maximum Entropy Sampling with Unequal Probability and Fixed Sample Size
Journal of Official Statistics, Vol. 21, No. 4, 2005, pp. 543 570 Evaluation of Variance Approximations and Estimators in Maximum Entropy Sampling with Unequal robability and Fixed Sample Size Alina Matei
More informationSTATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS
STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS David A Binder and Georgia R Roberts Methodology Branch, Statistics Canada, Ottawa, ON, Canada K1A 0T6 KEY WORDS: Design-based properties, Informative sampling,
More informationUnequal Probability Designs
Unequal Probability Designs Department of Statistics University of British Columbia This is prepares for Stat 344, 2014 Section 7.11 and 7.12 Probability Sampling Designs: A quick review A probability
More informationModel Assisted Survey Sampling
Carl-Erik Sarndal Jan Wretman Bengt Swensson Model Assisted Survey Sampling Springer Preface v PARTI Principles of Estimation for Finite Populations and Important Sampling Designs CHAPTER 1 Survey Sampling
More informationBootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator
Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos
More informationEstimation of change in a rotation panel design
Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS028) p.4520 Estimation of change in a rotation panel design Andersson, Claes Statistics Sweden S-701 89 Örebro, Sweden
More informationEconometric Methods. Prediction / Violation of A-Assumptions. Burcu Erdogan. Universität Trier WS 2011/2012
Econometric Methods Prediction / Violation of A-Assumptions Burcu Erdogan Universität Trier WS 2011/2012 (Universität Trier) Econometric Methods 30.11.2011 1 / 42 Moving on to... 1 Prediction 2 Violation
More informationThis module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics
This moule is part of the Memobust Hanbook on Methoology of Moern Business Statistics 26 March 2014 Metho: Balance Sampling for Multi-Way Stratification Contents General section... 3 1. Summary... 3 2.
More informationContextual Effects in Modeling for Small Domains
University of Wollongong Research Online Applied Statistics Education and Research Collaboration (ASEARC) - Conference Papers Faculty of Engineering and Information Sciences 2011 Contextual Effects in
More informationDevelopment of methodology for the estimate of variance of annual net changes for LFS-based indicators
Development of methodology for the estimate of variance of annual net changes for LFS-based indicators Deliverable 1 - Short document with derivation of the methodology (FINAL) Contract number: Subject:
More informationRegression Models - Introduction
Regression Models - Introduction In regression models, two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent variable,
More informationMultivariate Regression: Part I
Topic 1 Multivariate Regression: Part I ARE/ECN 240 A Graduate Econometrics Professor: Òscar Jordà Outline of this topic Statement of the objective: we want to explain the behavior of one variable as a
More informationA comparison of pivotal sampling and unequal. probability sampling with replacement
arxiv:1609.02688v2 [math.st] 13 Sep 2016 A comparison of pivotal sampling and unequal probability sampling with replacement Guillaume Chauvet 1 and Anne Ruiz-Gazen 2 1 ENSAI/IRMAR, Campus de Ker Lann,
More informationMultiple Regression Analysis: Heteroskedasticity
Multiple Regression Analysis: Heteroskedasticity y = β 0 + β 1 x 1 + β x +... β k x k + u Read chapter 8. EE45 -Chaiyuth Punyasavatsut 1 topics 8.1 Heteroskedasticity and OLS 8. Robust estimation 8.3 Testing
More informationNonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling
Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling Ji-Yeon Kim Iowa State University F. Jay Breidt Colorado State University Jean D. Opsomer Colorado State University
More informationIntroduction to Econometrics. Heteroskedasticity
Introduction to Econometrics Introduction Heteroskedasticity When the variance of the errors changes across segments of the population, where the segments are determined by different values for the explanatory
More informationMultiple Linear Regression
Multiple Linear Regression Asymptotics Asymptotics Multiple Linear Regression: Assumptions Assumption MLR. (Linearity in parameters) Assumption MLR. (Random Sampling from the population) We have a random
More informationEstimating Bounded Population Total Using Linear Regression in the Presence of Supporting Information
International Journal of Mathematics and Computational Science Vol. 4, No. 3, 2018, pp. 112-117 http://www.aiscience.org/journal/ijmcs ISSN: 2381-7011 (Print); ISSN: 2381-702X (Online) Estimating Bounded
More informationLECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity
LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists
More informationA Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,
A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type
More informationEfficient Robbins-Monro Procedure for Binary Data
Efficient Robbins-Monro Procedure for Binary Data V. Roshan Joseph School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, GA 30332-0205, USA roshan@isye.gatech.edu SUMMARY
More informationAPPLICATION OF RIDGE REGRESSION TO MULTICOLLINEAR DATA
Journal of Research (Science), Bahauddin Zakariya University, Multan, Pakistan. Vol.15, No.1, June 2004, pp. 97-106 ISSN 1021-1012 APPLICATION OF RIDGE REGRESSION TO MULTICOLLINEAR DATA G. R. Pasha 1 and
More informationTaking into account sampling design in DAD. Population SAMPLING DESIGN AND DAD
Taking into account sampling design in DAD SAMPLING DESIGN AND DAD With version 4.2 and higher of DAD, the Sampling Design (SD) of the database can be specified in order to calculate the correct asymptotic
More informationVariance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.
10/3/011 Functional Connectivity Correlation and Regression Variance VAR = Standard deviation Standard deviation SD = Unbiased SD = 1 10/3/011 Standard error Confidence interval SE = CI = = t value for
More informationSampling Theory. Improvement in Variance Estimation in Simple Random Sampling
Communications in Statistics Theory and Methods, 36: 075 081, 007 Copyright Taylor & Francis Group, LLC ISS: 0361-096 print/153-415x online DOI: 10.1080/0361090601144046 Sampling Theory Improvement in
More informationDiscussion of Sensitivity and Informativeness under Local Misspecification
Discussion of Sensitivity and Informativeness under Local Misspecification Jinyong Hahn April 4, 2019 Jinyong Hahn () Discussion of Sensitivity and Informativeness under Local Misspecification April 4,
More informationStatistica Sinica Preprint No: SS R2
Statistica Sinica Preprint No: SS-13-244R2 Title Examining some aspects of balanced sampling in surveys Manuscript ID SS-13-244R2 URL http://www.stat.sinica.edu.tw/statistica/ DOI 10.5705/ss.2013.244 Complete
More informationDesign of Screening Experiments with Partial Replication
Design of Screening Experiments with Partial Replication David J. Edwards Department of Statistical Sciences & Operations Research Virginia Commonwealth University Robert D. Leonard Department of Information
More informationA characterization of consistency of model weights given partial information in normal linear models
Statistics & Probability Letters ( ) A characterization of consistency of model weights given partial information in normal linear models Hubert Wong a;, Bertrand Clare b;1 a Department of Health Care
More informationBAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS
BAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS Andrew A. Neath 1 and Joseph E. Cavanaugh 1 Department of Mathematics and Statistics, Southern Illinois University, Edwardsville, Illinois 606, USA
More information1/34 3/ Omission of a relevant variable(s) Y i = α 1 + α 2 X 1i + α 3 X 2i + u 2i
1/34 Outline Basic Econometrics in Transportation Model Specification How does one go about finding the correct model? What are the consequences of specification errors? How does one detect specification
More informationThe New sampling Procedure for Unequal Probability Sampling of Sample Size 2.
. The New sampling Procedure for Unequal Probability Sampling of Sample Size. Introduction :- It is a well known fact that in simple random sampling, the probability selecting the unit at any given draw
More informationCOMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS. Abstract
Far East J. Theo. Stat. 0() (006), 179-196 COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS Department of Statistics University of Manitoba Winnipeg, Manitoba, Canada R3T
More informationTesting Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA
Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box 90251 Durham, NC 27708, USA Summary: Pre-experimental Frequentist error probabilities do not summarize
More informationSanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore
AN EMPIRICAL LIKELIHOOD BASED ESTIMATOR FOR RESPONDENT DRIVEN SAMPLED DATA Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore Mark Handcock, Department
More informationG. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication
G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?
More informationAdaptive two-stage sequential double sampling
Adaptive two-stage sequential double sampling Bardia Panahbehagh Afshin Parvardeh Babak Mohammadi March 4, 208 arxiv:803.04484v [math.st] 2 Mar 208 Abstract In many surveys inexpensive auxiliary variables
More informationTopic 4: Model Specifications
Topic 4: Model Specifications Advanced Econometrics (I) Dong Chen School of Economics, Peking University 1 Functional Forms 1.1 Redefining Variables Change the unit of measurement of the variables will
More informationWhy Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory
Why Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory Andreas Buja joint with the PoSI Group: Richard Berk, Lawrence Brown, Linda Zhao, Kai Zhang Ed George, Mikhail Traskin, Emil Pitkin,
More informationSampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A.
Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A. Keywords: Survey sampling, finite populations, simple random sampling, systematic
More informationContributions to the Theory of Unequal Probability Sampling. Anders Lundquist
Contributions to the Theory of Unequal Probability Sampling Anders Lundquist Doctoral Dissertation Department of Mathematics and Mathematical Statistics Umeå University SE-90187 Umeå Sweden Copyright Anders
More informationHISTORICAL PERSPECTIVE OF SURVEY SAMPLING
HISTORICAL PERSPECTIVE OF SURVEY SAMPLING A.K. Srivastava Former Joint Director, I.A.S.R.I., New Delhi -110012 1. Introduction The purpose of this article is to provide an overview of developments in sampling
More informationLecture 9 SLR in Matrix Form
Lecture 9 SLR in Matrix Form STAT 51 Spring 011 Background Reading KNNL: Chapter 5 9-1 Topic Overview Matrix Equations for SLR Don t focus so much on the matrix arithmetic as on the form of the equations.
More informationMinimax design criterion for fractional factorial designs
Ann Inst Stat Math 205 67:673 685 DOI 0.007/s0463-04-0470-0 Minimax design criterion for fractional factorial designs Yue Yin Julie Zhou Received: 2 November 203 / Revised: 5 March 204 / Published online:
More informationEFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS
Statistica Sinica 24 2014, 395-414 doi:ttp://dx.doi.org/10.5705/ss.2012.064 EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Jun Sao 1,2 and Seng Wang 3 1 East Cina Normal University,
More informationBiostat 2065 Analysis of Incomplete Data
Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh September 13 & 15, 2005 1. Complete-case analysis (I) Complete-case analysis refers to analysis based on
More informationCh 2: Simple Linear Regression
Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component
More informationShu Yang and Jae Kwang Kim. Harvard University and Iowa State University
Statistica Sinica 27 (2017), 000-000 doi:https://doi.org/10.5705/ss.202016.0155 DISCUSSION: DISSECTING MULTIPLE IMPUTATION FROM A MULTI-PHASE INFERENCE PERSPECTIVE: WHAT HAPPENS WHEN GOD S, IMPUTER S AND
More information460 HOLGER DETTE AND WILLIAM J STUDDEN order to examine how a given design behaves in the model g` with respect to the D-optimality criterion one uses
Statistica Sinica 5(1995), 459-473 OPTIMAL DESIGNS FOR POLYNOMIAL REGRESSION WHEN THE DEGREE IS NOT KNOWN Holger Dette and William J Studden Technische Universitat Dresden and Purdue University Abstract:
More informationTHE THEORY AND PRACTICE OF MAXIMAL BREWER SELECTION WITH POISSON PRN SAMPLING
THE THEORY AND PRACTICE OF MAXIMAL BREWER SELECTION WITH POISSON PRN SAMPLING Phillip S. Kott And Jeffrey T. Bailey, National Agricultural Statistics Service Phillip S. Kott, NASS, Room 305, 351 Old Lee
More informationO.O. DAWODU & A.A. Adewara Department of Statistics, University of Ilorin, Ilorin, Nigeria
Efficiency of Alodat Sample Selection Procedure over Sen - Midzuno and Yates - Grundy Draw by Draw under Unequal Probability Sampling without Replacement Sample Size 2 O.O. DAWODU & A.A. Adewara Department
More informationEmpirical Likelihood Methods for Sample Survey Data: An Overview
AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use
More informationC. J. Skinner Cross-classified sampling: some estimation theory
C. J. Skinner Cross-classified sampling: some estimation theory Article (Accepted version) (Refereed) Original citation: Skinner, C. J. (205) Cross-classified sampling: some estimation theory. Statistics
More information