Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities

Peter M. Aronow and Cyrus Samii

Forthcoming at Survey Methodology

Peter M. Aronow, Department of Political Science, Yale University, 77 Prospect St., New Haven, CT 06520, peter.aronow@yale.edu; Cyrus Samii, Department of Politics, New York University, 19 West 4th St., New York, NY 10012, cds283@nyu.edu

Abstract

We consider conservative variance estimation for the Horvitz-Thompson estimator of a population total in sampling designs with zero pairwise inclusion probabilities, known as non-measurable designs. We decompose the standard Horvitz-Thompson variance estimator under such designs and characterize the bias precisely. We develop a bias correction that is guaranteed to be weakly conservative (non-negatively biased) regardless of the nature of the non-measurability. The analysis sheds light on conditions under which the standard Horvitz-Thompson variance estimator performs well despite non-measurability and under which the conservative bias correction may outperform commonly-used approximations.

Keywords: Horvitz-Thompson estimation, non-measurable designs, variance estimation

1 Introduction

Sampling designs sometimes result in pairs of units having zero probability of being jointly included in the sample. Horvitz and Thompson's (1952) statement of the properties of the finite population total makes clear that general, unbiased variance estimation for estimators of population totals is impossible for such non-measurable designs (Särndal et al., 1992, 33). Optimal variance estimation in these cases remains an open problem. This paper analyzes the nature of the biases that non-measurability introduces for the standard Horvitz-Thompson variance estimator and studies an approach to correct for this bias in a manner that is guaranteed to be conservative. While our results cannot offer a solution to the non-measurability problem for all practical applications, we

do clarify conditions under which the standard estimator performs well and where the conservative bias correction outperforms commonly-used approximations.

Despite their theoretical drawbacks, sampling designs with zero pairwise inclusion probabilities are quite common. A common example of a non-measurable design is one that draws only a single unit or cluster from each stratum in a set of strata. This may occur if the population or a subpopulation of interest is incidentally sparse over stratification cells. Another instance of a non-measurable design is a systematic sample, in which units are selected from a list at a fixed interval from a random starting value. In these designs, units whose indices are reached from different starting values have zero joint probability of inclusion.

Approximate methods have been proposed for special cases, as discussed in Hansen et al. (1953, Section 9.15), Särndal et al. (1992, Chapter 3), and Wolter (2007, Chapters 2 and 8). In the single-unit-per-stratum case, a common approach is to collapse strata and assume units were drawn via a simple random sample from the larger, collapsed stratum. For systematic samples, the standard approach is to use an approximation based on an assumption of simple random sampling with replacement. These approximate methods are generally biased to a degree that cannot be determined from the data. In some cases it can be shown that the bias will tend to be positive, but this does not hold generally, and especially not when the zero pairwise inclusion probabilities occur in a haphazard manner.

This paper begins by decomposing the bias of the Horvitz-Thompson variance estimator under non-measurability. This exposes precisely how conditions on the underlying data result in more or less bias. We also show how a simple application of Young's inequality yields a bias correction and a class of estimators guaranteed to have weakly positive bias, as well as no bias under special conditions.
We discuss implications for applied work.

2 Variance estimation for the Horvitz-Thompson estimator

Consider a population $U$ indexed by $1, \ldots, k, \ldots, N$ and a sampling design such that the probability of inclusion in the sample for unit $k$ is given by $\pi_k$, and the joint inclusion probability for units $k$ and $l$ is given by $\pi_{kl}$. Under a measurable design, two conditions obtain: (1) $\pi_k > 0$ and $\pi_k$ is known for all $k \in U$, and (2) $\pi_{kl} > 0$ and $\pi_{kl}$ is known for all $k, l \in U$. Non-measurable designs include those for which either of the two conditions for a measurable design does not hold. Failure to meet the former condition precludes unbiased estimation of totals. The Horvitz-Thompson estimator of a population total is given by

$$\hat t = \sum_{k \in s} \frac{y_k}{\pi_k} = \sum_{k \in U} I_k \frac{y_k}{\pi_k},$$

where $I_k \in \{0, 1\}$ is unit $k$'s inclusion indicator, the only stochastic component of the expression, with $E(I_k) = \pi_k$, the inclusion probability, and $s$ and $U$ refer to the sample and the population, respectively. Define $E(I_k I_l) = \pi_{kl}$, which is the probability that both units $k$ and $l$ from $U$ are included in the sample. Since $I_k I_k = I_k$, $E(I_k I_k) = \pi_{kk} = \pi_k$ by construction. When condition (1) holds, as we assume throughout, the Horvitz-Thompson estimator is unbiased.

2.1 Properties of the Horvitz-Thompson variance estimator under measurability

By Horvitz and Thompson (1952), the variance of the Horvitz-Thompson estimator for the total is

$$\mathrm{Var}(\hat t) = \sum_{k \in U} \sum_{l \in U} \mathrm{Cov}(I_k, I_l) \frac{y_k}{\pi_k} \frac{y_l}{\pi_l} = \sum_{k \in U} \mathrm{Var}(I_k) \left( \frac{y_k}{\pi_k} \right)^2 + \sum_{k \in U} \sum_{l \in U \setminus k} \mathrm{Cov}(I_k, I_l) \frac{y_k}{\pi_k} \frac{y_l}{\pi_l}.$$

We label a sample from a measurable design $s_M$. An unbiased estimator for $\mathrm{Var}(\hat t)$ on $s_M$ is given by

$$\widehat{\mathrm{Var}}(\hat t) = \sum_{k \in s_M} \sum_{l \in s_M} \frac{\mathrm{Cov}(I_k, I_l)}{\pi_{kl}} \frac{y_k}{\pi_k} \frac{y_l}{\pi_l} = \sum_{k \in U} \sum_{l \in U} I_k I_l \frac{\mathrm{Cov}(I_k, I_l)}{\pi_{kl}} \frac{y_k}{\pi_k} \frac{y_l}{\pi_l},$$

where the only stochastic part of the latter expression is $I_k I_l$, and unbiasedness is due to $E(I_k I_l) = \pi_{kl}$.

2.2 Properties of the Horvitz-Thompson variance estimator under non-measurability

We now examine the case where condition (2) does not hold: $\pi_{kl} = 0$ for some units $k, l \in U$. Because $I_k$ is a Bernoulli random variable with probability $\pi_k$, $\mathrm{Cov}(I_k, I_l) = \pi_{kl} - \pi_k \pi_l$ for $k \neq l$, and $\mathrm{Cov}(I_k, I_k) = \mathrm{Var}(I_k) = \pi_k (1 - \pi_k)$. Then we can re-express the variance above as

$$\mathrm{Var}(\hat t) = \sum_{k \in U} \pi_k (1 - \pi_k) \left( \frac{y_k}{\pi_k} \right)^2 + \sum_{k \in U} \sum_{l \in U \setminus k} (\pi_{kl} - \pi_k \pi_l) \frac{y_k}{\pi_k} \frac{y_l}{\pi_l}$$

$$= \sum_{k \in U} \pi_k (1 - \pi_k) \left( \frac{y_k}{\pi_k} \right)^2 + \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} > 0\}} (\pi_{kl} - \pi_k \pi_l) \frac{y_k}{\pi_k} \frac{y_l}{\pi_l} - \underbrace{\sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} y_k y_l}_{A}.$$

For $k$ and $l$ such that $\pi_{kl} = 0$, the sampling design will never provide information on the component of the variance labeled $A$ above, since we will never observe $y_k$ and $y_l$ together. We label a sample from a design where condition (2) fails as $s$. When $\widehat{\mathrm{Var}}(\hat t)$ is applied to $s$, the result is unbiased for $\mathrm{Var}(\hat t) + A$. We state this formally as follows:

Proposition 1. When $s$ refers to a sample from a design with some $\pi_{kl} = 0$, we have

$$E[\widehat{\mathrm{Var}}(\hat t)] = \mathrm{Var}(\hat t) + \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} y_k y_l = \mathrm{Var}(\hat t) + A.$$

Proof. The result follows from

$$E\left[ \sum_{k \in s} \sum_{l \in s} \frac{\mathrm{Cov}(I_k, I_l)}{\pi_{kl}} \frac{y_k}{\pi_k} \frac{y_l}{\pi_l} \right] = E\left[ \sum_{k \in U} \sum_{l \in \{U : \pi_{kl} > 0\}} I_k I_l \frac{\mathrm{Cov}(I_k, I_l)}{\pi_{kl}} \frac{y_k}{\pi_k} \frac{y_l}{\pi_l} \right]$$

$$= \sum_{k \in U} \mathrm{Var}(I_k) \left( \frac{y_k}{\pi_k} \right)^2 + \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} > 0\}} \mathrm{Cov}(I_k, I_l) \frac{y_k}{\pi_k} \frac{y_l}{\pi_l}$$

$$= \mathrm{Var}(\hat t) + \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} y_k y_l = \mathrm{Var}(\hat t) + A.$$

The standard Horvitz-Thompson variance estimator, if applied to designs with zero pairwise inclusion probabilities, can therefore have a positive or a negative bias. If the $y_k$, $y_l$ values are always nonnegative (or always nonpositive), then the bias is always nonnegative. When values may be positive or negative, $A$ is the sum of cross-products of outcomes that never appear together in a sample under the design. If the jointly exclusive outcomes are centered over zero, then no correlation in these outcomes would tend to result in small bias, positive correlation in positive bias, and negative correlation in negative bias.

3 Conservative bias correction for the Horvitz-Thompson estimator under non-measurability

The case where $A$ may be less than zero for a non-measurable design suggests the need for some adjustment that will guarantee a bias that is weakly bounded below by zero. We first develop a general bias correction that is guaranteed to be conservative, later providing a special simplified case for practical usage.
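To make the bias term $A$ of Proposition 1 concrete before turning to the correction, the following is a minimal sketch that checks $E[\widehat{\mathrm{Var}}(\hat t)] = \mathrm{Var}(\hat t) + A$ by enumerating every sample of a tiny design. The outcome values, the two-strata one-unit-per-stratum design, and all function names are assumptions chosen for illustration; they do not come from the paper.

```python
from fractions import Fraction

# Hypothetical outcomes for a population of four units (illustrative only).
y = [Fraction(v) for v in (3, -1, 2, 5)]

# Hypothetical design: strata {0, 1} and {2, 3}, one unit drawn from each,
# so the four samples are equally likely and within-stratum pairs never co-occur.
design = {(k, l): Fraction(1, 4) for k in (0, 1) for l in (2, 3)}


def pi(k, l):
    """Joint inclusion probability pi_kl; pi(k, k) is the first-order pi_k."""
    return sum(p for s, p in design.items() if k in s and l in s)


def ht_total(s):
    """Horvitz-Thompson estimate of the population total from sample s."""
    return sum(y[k] / pi(k, k) for k in s)


def ht_var_est(s):
    """Standard Horvitz-Thompson variance estimator evaluated on sample s."""
    total = Fraction(0)
    for k in s:
        for l in s:
            cov = pi(k, l) - pi(k, k) * pi(l, l)  # equals pi_k (1 - pi_k) when k == l
            total += cov / pi(k, l) * (y[k] / pi(k, k)) * (y[l] / pi(l, l))
    return total


# Exact moments over the whole design (no simulation needed at this size).
mean_t = sum(p * ht_total(s) for s, p in design.items())
true_var = sum(p * (ht_total(s) - mean_t) ** 2 for s, p in design.items())
expected_est = sum(p * ht_var_est(s) for s, p in design.items())

# Bias term A: cross-products over ordered pairs that can never be sampled together.
units = range(len(y))
A = sum(y[k] * y[l] for k in units for l in units if k != l and pi(k, l) == 0)

print(true_var, A, expected_est)
```

With mixed-sign outcomes inside a stratum the within-stratum cross-products can be negative, which is the situation that motivates the correction developed next.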

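3.1 General formulation

Consider the following variance estimator:

$$\widehat{\mathrm{Var}}_C(\hat t) = \sum_{k \in U} \sum_{l \in \{U : \pi_{kl} > 0\}} I_k I_l \frac{\mathrm{Cov}(I_k, I_l)}{\pi_{kl}} \frac{y_k}{\pi_k} \frac{y_l}{\pi_l} + \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} \left( I_k \frac{|y_k|^{a_{kl}}}{a_{kl} \pi_k} + I_l \frac{|y_l|^{b_{kl}}}{b_{kl} \pi_l} \right),$$

where $a_{kl}$, $b_{kl}$ are positive real numbers such that $\frac{1}{a_{kl}} + \frac{1}{b_{kl}} = 1$ for all pairs $k, l$ with $\pi_{kl} = 0$. The estimator is guaranteed to produce an expected value greater than or equal to the true variance for all designs, and is thus conservative. We state this property of the estimator formally:

Proposition 2. $E[\widehat{\mathrm{Var}}_C(\hat t)] \geq \mathrm{Var}(\hat t)$.

Proof. By Young's inequality,

$$\frac{|y_k|^{a_{kl}}}{a_{kl}} + \frac{|y_l|^{b_{kl}}}{b_{kl}} \geq |y_k| |y_l|, \quad \text{if } \frac{1}{a_{kl}} + \frac{1}{b_{kl}} = 1.$$

Define $A^*$ such that

$$A^* = \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} \left( \frac{|y_k|^{a_{kl}}}{a_{kl}} + \frac{|y_l|^{b_{kl}}}{b_{kl}} \right) \geq \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} |y_k| |y_l|,$$

and recall that

$$A = \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} y_k y_l.$$

Therefore

$$A^* \geq \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} |y_k| |y_l| \geq \left| \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} y_k y_l \right| = |A|$$

and

$$\mathrm{Var}(\hat t) + A + A^* \geq \mathrm{Var}(\hat t).$$

The associated Horvitz-Thompson estimator of $A^*$ would be

$$\hat A^* = \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} \left( I_k \frac{|y_k|^{a_{kl}}}{a_{kl} \pi_k} + I_l \frac{|y_l|^{b_{kl}}}{b_{kl} \pi_l} \right),$$

which is unbiased by $E(I_k) = \pi_k$ and $E(I_l) = \pi_l$. Since $E[\hat A^*] = A^*$, by Proposition 1,

$$E\left[ \sum_{k \in s} \sum_{l \in s} \frac{\mathrm{Cov}(I_k, I_l)}{\pi_{kl}} \frac{y_k}{\pi_k} \frac{y_l}{\pi_l} + \hat A^* \right] = \mathrm{Var}(\hat t) + A + A^*$$

and

$$E[\widehat{\mathrm{Var}}_C(\hat t)] \geq \mathrm{Var}(\hat t). \quad (1)$$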

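Substituting terms,

$$E\left[ \sum_{k \in U} \sum_{l \in \{U : \pi_{kl} > 0\}} I_k I_l \frac{\mathrm{Cov}(I_k, I_l)}{\pi_{kl}} \frac{y_k}{\pi_k} \frac{y_l}{\pi_l} + \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} \left( I_k \frac{|y_k|^{a_{kl}}}{a_{kl} \pi_k} + I_l \frac{|y_l|^{b_{kl}}}{b_{kl} \pi_l} \right) \right] \geq \mathrm{Var}(\hat t).$$

$\widehat{\mathrm{Var}}_C(\hat t)$ is justified as a conservative estimator for the case when $A$ is not known to be nonnegative. This estimator is unbiased under a special condition:

Corollary 1. If, for all pairs $k, l$ such that $\pi_{kl} = 0$, (i) $|y_k|^{a_{kl}} = |y_l|^{b_{kl}}$ and (ii) $-y_k y_l = |y_k| |y_l|$, then $E[\widehat{\mathrm{Var}}_C(\hat t)] = \mathrm{Var}(\hat t)$.

Proof. By (i), (ii) and Young's inequality,

$$\frac{|y_k|^{a_{kl}}}{a_{kl}} + \frac{|y_l|^{b_{kl}}}{b_{kl}} = |y_k| |y_l| = -y_k y_l.$$

Therefore

$$A^* = \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} \left( \frac{|y_k|^{a_{kl}}}{a_{kl}} + \frac{|y_l|^{b_{kl}}}{b_{kl}} \right) = \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} |y_k| |y_l| = -\sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} y_k y_l = -A$$

and

$$\mathrm{Var}(\hat t) + A + A^* = \mathrm{Var}(\hat t).$$

It follows that $E[\widehat{\mathrm{Var}}_C(\hat t)] = \mathrm{Var}(\hat t)$.

If any units $k, l$ are in clusters (i.e., $\Pr(I_k \neq I_l) = 0$), these units should be totaled into one larger unit before estimation. Combining units will, in general, reduce the bias of the variance estimator because only pairs of cluster-level totals will be included in $A$, as opposed to all constituent pairs.

3.2 Simplified special case

In general, it would be difficult to assign optimal values of $a_{kl}$ and $b_{kl}$ for all pairs $k, l$ such that $\pi_{kl} = 0$. Instead, we examine one intuitive case, assigning all $a_{kl} = b_{kl} = 2$:

$$\widehat{\mathrm{Var}}_{C2}(\hat t) = \sum_{k \in U} \sum_{l \in \{U : \pi_{kl} > 0\}} I_k I_l \frac{\mathrm{Cov}(I_k, I_l)}{\pi_{kl}} \frac{y_k}{\pi_k} \frac{y_l}{\pi_l} + \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} \left( I_k \frac{y_k^2}{2 \pi_k} + I_l \frac{y_l^2}{2 \pi_l} \right).$$

As a special case of $\widehat{\mathrm{Var}}_C(\hat t)$, $\widehat{\mathrm{Var}}_{C2}(\hat t)$ is also conservative: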

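Corollary 2. $E[\widehat{\mathrm{Var}}_{C2}(\hat t)] \geq \mathrm{Var}(\hat t)$.

Proof. For all pairs $k, l$ such that $\pi_{kl} = 0$, $\frac{1}{a_{kl}} + \frac{1}{b_{kl}} = \frac{1}{2} + \frac{1}{2} = 1$. Proposition 2 therefore holds.

The choice to set all $a_{kl} = b_{kl} = 2$ is justified by the fact that it yields the lowest value of the estimator $\widehat{\mathrm{Var}}_C(\hat t)$ subject to the constraint that $a_{kl}$ and $b_{kl}$ are fixed as constants $a$ and $b$ over all $k, l$.

Corollary 3. Among the class of estimators $\widehat{\mathrm{Var}}_{Cab}(\hat t)$, defined as the set of estimators $\widehat{\mathrm{Var}}_C(\hat t)$ such that all $a_{kl} = a$ and all $b_{kl} = b$, $\widehat{\mathrm{Var}}_{C2}(\hat t) = \min_{a,b} [\widehat{\mathrm{Var}}_{Cab}(\hat t)]$.

Proof. By simple algebra, and using the fact that the set of pairs with $\pi_{kl} = 0$ is symmetric in $k$ and $l$,

$$\widehat{\mathrm{Var}}_{Cab}(\hat t) = \sum_{k \in U} \sum_{l \in \{U : \pi_{kl} > 0\}} I_k I_l \frac{\mathrm{Cov}(I_k, I_l)}{\pi_{kl}} \frac{y_k}{\pi_k} \frac{y_l}{\pi_l} + \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} \left( I_k \frac{|y_k|^a}{a \pi_k} + I_l \frac{|y_l|^b}{b \pi_l} \right)$$

$$= \widehat{\mathrm{Var}}(\hat t) + \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} \left( I_k \frac{|y_k|^a}{a \pi_k} + I_l \frac{|y_l|^b}{b \pi_l} \right)$$

$$= \widehat{\mathrm{Var}}(\hat t) + \frac{1}{2} \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} \left( I_k \frac{|y_k|^a}{a \pi_k} + I_l \frac{|y_l|^b}{b \pi_l} + I_k \frac{|y_k|^b}{b \pi_k} + I_l \frac{|y_l|^a}{a \pi_l} \right)$$

$$= \widehat{\mathrm{Var}}(\hat t) + \frac{1}{2} \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} \left( \frac{I_k}{\pi_k} \left( \frac{|y_k|^a}{a} + \frac{|y_k|^b}{b} \right) + \frac{I_l}{\pi_l} \left( \frac{|y_l|^a}{a} + \frac{|y_l|^b}{b} \right) \right).$$

By Young's inequality, given $\frac{1}{a} + \frac{1}{b} = 1$, $\frac{|y_k|^a}{a} + \frac{|y_k|^b}{b} \geq y_k^2$, with equality when $a = b = 2$. Similarly, $\frac{|y_l|^a}{a} + \frac{|y_l|^b}{b} \geq y_l^2$, with equality when $a = b = 2$. Since $\frac{I_k}{\pi_k} \geq 0$ and $\frac{I_l}{\pi_l} \geq 0$, any choice $a \neq b$ can only yield $\widehat{\mathrm{Var}}_{Cab}(\hat t) \geq \widehat{\mathrm{Var}}_{C2}(\hat t)$.

Given all values of $y_k$ and $y_l$, it is possible to derive an optimal vector of $a_{kl}$ and $b_{kl}$ values that varies over $k, l$, but such a derivation may not be of practical value.

4 Applications

Proposition 1 indicates that the bias of the Horvitz-Thompson variance estimator under non-measurability is given by

$$A = \sum_{k \in U} \sum_{l \in \{U \setminus k : \pi_{kl} = 0\}} y_k y_l.$$

This expression, along with the fact that $A^* \geq |A|$, makes it evident that the degree of bias in $\widehat{\mathrm{Var}}(\hat t)$ and $\widehat{\mathrm{Var}}_C(\hat t)$ depends a great deal on the number of pairs with zero pairwise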

inclusion probabilities. For designs where this number is small, $\widehat{\mathrm{Var}}(\hat t)$ may provide a reasonable and conservative estimator for cases where $y_k$ takes the same sign for all $k$, and $\widehat{\mathrm{Var}}_C(\hat t)$ may provide a reasonable and conservative estimator for cases where $y_k$ may take different signs for some $k$. An example that arises frequently is stratified sampling in which a relatively small proportion of strata are so small that only one unit is drawn from each.

For designs that result in many pairs having zero inclusion probabilities, $\widehat{\mathrm{Var}}(\hat t)$ and $\widehat{\mathrm{Var}}_C(\hat t)$ could be wildly over-conservative, and other estimators may be preferred in terms of criteria such as mean square error. A prominent example is systematic sampling. Indeed, Särndal et al. (1992, 76) note that under systematic sampling, the Horvitz-Thompson variance estimator, $\widehat{\mathrm{Var}}(\hat t)$, can give a nonsensical result. The expression for $A$ makes it clear why this would be the case. Wolter (2007, Ch. 8) shows that simpler biased estimators, such as the with-replacement (Hansen-Hurwitz) variance estimator, can be reliable, if slightly conservative, in a broad range of data scenarios under equal probability and probability proportional to size (PPS) systematic sampling. Nonetheless, it is known that the with-replacement estimator fails to account adequately for sampling variance when outcome variance within systematic sample clusters is smaller than the between-cluster variance. In such cases, $\widehat{\mathrm{Var}}(\hat t)$ would bound this variance in expectation when outcomes are all of the same sign, and $\widehat{\mathrm{Var}}_C(\hat t)$ would always bound this variance in expectation. Of course, it may still be the case that the bias is too large to be of much use, and so we would not suggest that $\widehat{\mathrm{Var}}(\hat t)$ and $\widehat{\mathrm{Var}}_C(\hat t)$ provide a full solution to the variance estimation problem for systematic sampling under high intra-cluster correlation.
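The dependence of the bias on the number of zero-probability pairs can be sketched for a small systematic sample. The example below assumes a hypothetical list of 12 units sampled 1-in-4 from a random start, with made-up mixed-sign outcomes; it counts the ordered pairs with $\pi_{kl} = 0$ and evaluates $A$ and $A^*$ (with $a_{kl} = b_{kl} = 2$) directly from their definitions.

```python
from fractions import Fraction

N, step = 12, 4  # hypothetical list length and sampling interval

# 1-in-4 systematic sampling: random start r, then every 4th unit on the list.
samples = [tuple(range(r, N, step)) for r in range(step)]
prob = Fraction(1, step)  # each start is equally likely


def joint_pi(k, l):
    """Joint inclusion probability of units k and l under the design."""
    return sum(prob for s in samples if k in s and l in s)


# Ordered pairs of distinct units that can never appear in the same sample.
zero_pairs = [(k, l) for k in range(N) for l in range(N)
              if k != l and joint_pi(k, l) == 0]

# Made-up outcomes centred on zero, so the sign of A is not known in advance.
y = [Fraction(v) for v in (-6, -5, -4, -3, -2, -1, 1, 2, 3, 4, 5, 6)]

# Bias of the standard estimator (Proposition 1) and the Young's-inequality bound.
A = sum(y[k] * y[l] for k, l in zero_pairs)
A_star = sum(y[k] ** 2 / 2 + y[l] ** 2 / 2 for k, l in zero_pairs)

print(len(zero_pairs), A, A_star)
```

For these illustrative values, 108 of the 132 ordered pairs have zero joint inclusion probability, $A$ is negative, and $A^*$ exceeds $|A|$ by a wide margin, which matches the text's point that the corrected estimator can be very over-conservative under systematic sampling.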
Results from simulation studies are available in a supplement and show how $\widehat{\mathrm{Var}}(\hat t)$ and $\widehat{\mathrm{Var}}_{C2}(\hat t)$ perform relative to other, commonly-used approximations in these applied scenarios. The simulations demonstrate situations in which these estimators are preferable to commonly used alternatives. In the case of one-unit-per-stratum sampling, we show that these estimators are less biased than the commonly used collapsed stratum estimator in a range of scenarios. In the case of PPS systematic sampling, these estimators perform favorably when the population exhibits substantial periodicity, a case in which the commonly-used with-replacement estimator may be grossly negatively biased. [These simulation results are appended to the manuscript.]

5 Conclusion

We have characterized precisely the bias of the Horvitz-Thompson variance estimator under non-measurability and used this characterization to develop a conservative bias correction. These estimators reflect the fundamental uncertainty inherent to non-measurable designs. Compared to available approximate methods, these estimators may sometimes perform better and sometimes worse from a practical perspective. But available approximate methods may be biased in ways that cannot always be evaluated in terms of either magnitude or

sign. The estimators developed in this paper may therefore provide an informative measure of sampling variability with which analysts can agree without invoking additional assumptions or resorting to methods that carry the potential for negative bias. The bias term, $A$, has a simple form that suggests the possibility of refinements to the estimators developed here, something that we leave open for future research.

References

Hansen, M. H., W. N. Hurwitz, and W. G. Madow (1953). Sample Survey Methods and Theory: Volume 1 Methods and Theory. New York: Wiley.

Horvitz, D. and D. Thompson (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47(260).

Särndal, C.-E., B. Swensson, and J. Wretman (1992). Model Assisted Survey Sampling. New York: Springer.

Wolter, K. (2007). Introduction to Variance Estimation, Second Edition. New York: Springer.

Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities

Supplement: Simulation Studies

A Introduction

This supplement contains a simulation study of applications of the estimators discussed in the paper, "Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities." In the simulations, expected values of $\widehat{\mathrm{Var}}(\hat t)$ and $\widehat{\mathrm{Var}}_C(\hat t)$ were computed exactly using expressions from Propositions 1 and 2 in the main text. Expected values for alternative estimators were computed using exact expressions for bias when available and via simulated sample draws otherwise. All simulations study expected values of estimators for the sampling variance of the Horvitz-Thompson estimator for the population mean, $\hat\mu_{HT} \equiv \frac{1}{N} \hat t$. All simulations were carried out in the R statistical computing environment (the code is available from the authors).

B Lonely primary sampling units in stratified sampling

The first set of simulation studies examines variance estimation when single units are drawn from strata under stratified sampling. This situation arises in a number of applied contexts, including stratified sampling under proportional allocation (Lohr, 1999, 14-6) as well as subgroup analyses using subgroups that are sparsely distributed over strata. We build our simulations to match the context considered by Wolter (2007, 5-57) in his study of the collapsed stratum estimator. Simplifying that context, we consider two strata each of size $M$, in which case $N = 2M$. From each stratum a single unit is randomly sampled. Thus, for unit $k$ in stratum $s$, $\pi_{ks} = \pi = \frac{1}{M}$, and for distinct units $k$ and $l$ in strata $s, t \in \{1, 2\}$, we have

$$\pi_{ks,lt} = \begin{cases} 0 & \text{if } s = t \\ \frac{1}{M^2} & \text{if } s \neq t. \end{cases}$$

In stratum $s$, outcomes $y_s = (y_{1s}, \ldots, y_{Ms})$ are drawn as

$$y_{ks} = \alpha_s + \epsilon_{ks},$$

where $\alpha_s$ is a stratum-specific constant and $\epsilon_{ks} \sim N(0, 1)$. For the two strata, 1 and 2, we keep things simple by setting $\alpha_1 = \alpha = -\alpha_2$, in which case the intra-class correlation over strata is given by

$$\rho_{ICC} = \frac{\alpha^2}{\alpha^2 + 1/2}.$$

We simulated 1, sets of outcomes, $(y_1, y_2)$, over varying values of $M$ and $\rho_{ICC}$. To each set, we applied the single-unit-per-stratum design. Then, we computed three approximations of the sampling variance of the Horvitz-Thompson estimator of the population mean:

1. The Horvitz-Thompson estimator for the sampling variance of the estimator for the population mean, $(1/N^2) \widehat{\mathrm{Var}}(\hat t)$.

2. The bias-adjusted estimator, $(1/N^2) \widehat{\mathrm{Var}}_{C2}(\hat t)$.

3. The collapsed stratum estimator (Wolter, 2007, 51-2), which for this case, given unit $k$ from stratum 1 and unit $l$ from stratum 2 selected into $s$, equals

$$\widehat{V}_{CS}(\hat\mu_{HT}) = \frac{1}{N^2} \left( \frac{y_{k1}}{\pi_1} - \frac{y_{l2}}{\pi_2} \right)^2 = \frac{1}{4} (y_{k1} - y_{l2})^2.$$

Because of the zero inclusion probabilities within strata, each of these estimators is biased. No generally unbiased estimator exists for the sampling variance of the population mean estimator in this scenario. The collapsed stratum estimator is a commonly used approximation in the applied literature.

Table A1 and Figure A1 display results from these simulations. For each $\rho_{ICC}$ and $M$ value, we used the empirical variance of $\hat\mu_{HT}$ as an estimate of the true variance ("Var" or "True"), and means of the computed $(1/N^2) \widehat{\mathrm{Var}}(\hat t)$ ("Var HT" or "Horvitz-Thompson"), $(1/N^2) \widehat{\mathrm{Var}}_{C2}(\hat t)$ ("Var C2" or "Conservative (A*)"), and $\widehat{V}_{CS}(\hat\mu_{HT})$ values as estimates of the expected values of these estimators. In these simulations, $(1/N^2) \widehat{\mathrm{Var}}(\hat t)$ strictly dominates $(1/N^2) \widehat{\mathrm{Var}}_{C2}(\hat t)$ and $\widehat{V}_{CS}(\hat\mu_{HT})$ as an approximation to the true sampling variance. The $(1/N^2) \widehat{\mathrm{Var}}_{C2}(\hat t)$ estimator dominates $\widehat{V}_{CS}(\hat\mu_{HT})$ for smaller stratum sizes, and the two perform similarly for larger stratum sizes.

When $\rho_{ICC}$ is close to 1, all approximations perform very badly. The reasons for this differ across the estimators, however. The bias of $\widehat{V}_{CS}(\hat\mu_{HT})$ is driven up directly by between-stratum variance. The bias of $(1/N^2) \widehat{\mathrm{Var}}(\hat t)$ and $(1/N^2) \widehat{\mathrm{Var}}_{C2}(\hat t)$ is driven by the fact that within-stratum values become less symmetric over zero as $\alpha$ increases.
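The comparison in this section can be sketched in a few lines for a single, fixed set of outcomes. The example below is not the authors' R code: it assumes $M = 3$, $\alpha = 1$, and fixed stand-in values for the noise terms (all hypothetical), and it computes the exact expectation of each of the three estimators by enumerating the $M^2$ equally likely samples.

```python
from fractions import Fraction
from itertools import product

M = 3                        # stratum size (illustrative)
N = 2 * M
pi = Fraction(1, M)          # first-order inclusion probability of every unit
alpha = Fraction(1)          # stratum offset, alpha_1 = alpha = -alpha_2
eps = [Fraction(e) for e in (-1, 0, 1)]  # fixed stand-ins for the N(0, 1) noise
y1 = [alpha + e for e in eps]            # stratum 1 outcomes
y2 = [-alpha + e for e in eps]           # stratum 2 outcomes


def mean_hat(yk, yl):
    """Horvitz-Thompson estimate of the population mean."""
    return (yk / pi + yl / pi) / N


def var_ht(yk, yl):
    # Cross-stratum covariances vanish (pi_kl = pi^2), so only k == l terms remain.
    return (1 - pi) * ((yk / pi) ** 2 + (yl / pi) ** 2) / N ** 2


def var_c2(yk, yl):
    # Each sampled unit has M - 1 same-stratum partners and enters the double
    # sum once as k and once as l, contributing 2 (M - 1) y^2 / (2 pi).
    a_star_hat = 2 * (M - 1) * (yk ** 2 + yl ** 2) / (2 * pi)
    return var_ht(yk, yl) + a_star_hat / N ** 2


def var_cs(yk, yl):
    """Collapsed stratum estimator for the two-strata case."""
    return (yk / pi - yl / pi) ** 2 / N ** 2


samples = list(product(y1, y2))  # all M^2 equally likely samples


def expect(f):
    """Exact expectation of f over the design."""
    return sum(f(yk, yl) for yk, yl in samples) / len(samples)


mu_bar = expect(mean_hat)
true_var = expect(lambda yk, yl: (mean_hat(yk, yl) - mu_bar) ** 2)
e_ht, e_c2, e_cs = expect(var_ht), expect(var_c2), expect(var_cs)
print(true_var, e_ht, e_c2, e_cs)
```

For these particular values all three approximations overstate the true variance, with the ordering Horvitz-Thompson, then conservative, then collapsed stratum; other outcome configurations may order them differently.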

C Systematic probability-proportional-to-size sampling

A second set of simulations is based on systematic probability-proportional-to-size (PPS) sampling, another commonly used design (Lohr, 1999, 185-6; Wolter, 2007, 332-5). The problem that arises for variance estimation under systematic sampling, whether PPS or not, is that it produces zero pairwise inclusion probabilities among population units that are proximate on the list used to take the systematic sample.

For the simulations, a size variable, $x = (x_1, \ldots, x_N)$, is produced as a vector of $N$ random draws from a negative binomial distribution with mean 6, variance 16, and offset of 1 to avoid zero values. These settings were meant to recreate a data scenario roughly comparable to Wolter (2007, Ch. 8). An outcome variable, $y$, is created as

$$y_k = \beta x_k + \epsilon_k,$$

for $k = 1, \ldots, N$, where $\beta = \rho (\sigma_y / \sigma_x)$, $\epsilon_k$ is Gaussian noise with mean zero and standard deviation $\sigma_y \sqrt{1 - \rho^2}$, and $\sigma_x$ and $\sigma_y$ are the standard deviations of the $N$ values in $x$ and $y$ respectively, where $E_x(\sigma_x) = 4$ and we fix $\sigma_y = 3$ (again, to mimic data distributed as in Wolter (2007, Ch. 8)). Thus, $x$ and $y$ have a linear relationship through the origin and their correlation is given by $\rho$. Systematic PPS sampling was implemented such that each unit's first-order inclusion probability is given by

$$\pi_k = \frac{x_k}{\sum_{l=1}^{N} x_l} m,$$

where $m$ is the size of the systematic sample, and joint inclusion probabilities are computed as per Madow (1949). For the simulations, sampling and computation of inclusion probabilities were carried out using functions provided by Tillé and Matei (2011).

Because $y$ is proportional to $x$ with $x \geq 1$, Proposition 1 in the main text indicates that the Horvitz-Thompson variance estimator for $\hat\mu_{HT}$, $(1/N^2) \widehat{\mathrm{Var}}(\hat t)$, provides a conservative approximation of the sampling variance of $\hat\mu_{HT}$. For comparison, we also computed (via 1 simulated sample draws) the expected value of the with-replacement variance estimator for $\hat\mu_{HT}$ (Wolter, 2007),

$$\widehat{\mathrm{Var}}_{WR}(\hat\mu_{HT}) = \frac{1}{N^2 m (m - 1)} \sum_{k \in s} \left( \frac{y_k}{p_k} - \hat t \right)^2,$$

where $p_k = \pi_k / m$ is the notional per-draw selection probability. $\widehat{\mathrm{Var}}_{WR}(\hat\mu_{HT})$ is regularly used for systematic samples in applied settings.

Table A2 and Figure A2 display results from simulations for this scenario. Each graph plots the true variance ("Var" or "True"), the expected value of $(1/N^2) \widehat{\mathrm{Var}}(\hat t)$ ("Var HT" or "Conservative (HT)"), and the expected value of $\widehat{\mathrm{Var}}_{WR}(\hat\mu_{HT})$ ("With-repl.") over values of $\rho$ and for varying sample sizes ($m$) and sampling proportions ($m/N$). As expected,

$\widehat{\mathrm{Var}}(\hat t)$ tends to produce very over-conservative estimates; this is due to the large number of pairs with zero inclusion probabilities, each of which contributes to the bias. Given that the size variable is randomly sorted, this data scenario is very favorable to $\widehat{\mathrm{Var}}_{WR}(\hat\mu_{HT})$, as is also evident from the results.

For systematic sampling, a more challenging inference scenario arises when the data exhibit periodicity. In such a case, the within-sample variance fails to reflect the total of within- and between-sample variance, potentially rendering approximations such as the with-replacement estimator anti-conservative (Särndal et al., 1992, 78-83). To illustrate this point, we performed a second set of simulations where the size variable, $x$, was drawn as an $N$-length vector of repeating 2's and 1's, thus maintaining variance values that are comparable to the previous simulation. Also, so that we could provide an illustration of the performance of $\widehat{\mathrm{Var}}_{C2}(\hat t)$, we produced $y_k$ values in the same manner as above, but then worked with the demeaned value, $\tilde y_k \equiv y_k - \bar y$, as the variable with which $\hat\mu_{HT}$ was computed. By Propositions 1 and 2, $\widehat{\mathrm{Var}}(\hat t)$ may have either positive or negative bias, while $\widehat{\mathrm{Var}}_{C2}(\hat t)$ yields a conservative approximation of the sampling variance of $\hat\mu_{HT}$.

Table A3 and Figure A3 display the results of these simulations. Each graph plots the true variance ("Var" or "True"), the expected value of $(1/N^2) \widehat{\mathrm{Var}}(\hat t)$ ("Var HT" or "Conservative (HT)"), the expected value of $(1/N^2) \widehat{\mathrm{Var}}_{C2}(\hat t)$ ("Var C2" or "Conservative (A*)"), and the expected value of $\widehat{\mathrm{Var}}_{WR}(\hat\mu_{HT})$ ("With-repl.") over values of $\rho$ and for varying sample sizes ($m$) and sampling proportions ($m/N$). As expected, $\widehat{\mathrm{Var}}_{WR}(\hat\mu_{HT})$ tends to greatly understate the sampling variance of $\hat\mu_{HT}$, while $\widehat{\mathrm{Var}}_{C2}(\hat t)$ is generally quite conservative. Interestingly, $\widehat{\mathrm{Var}}(\hat t)$ tends to match the true value of the variance almost perfectly. This occurs because of the special nature of the data generating process, which results in the elements in the bias term, $A$, canceling each other out, in which case $A$ tends to 0.

References

Lohr, S. L. (1999). Sampling: Design and Analysis. Pacific Grove: Brooks/Cole.

Madow, W. G. (1949). On the theory of systematic sampling, II. Annals of Mathematical Statistics 20(3).

Särndal, C.-E., B. Swensson, and J. Wretman (1992). Model Assisted Survey Sampling. New York: Springer.

Tillé, Y., and A. Matei (2011). sampling: Survey Sampling. R package.

Wolter, K. M. (2007). Introduction to Variance Estimation, Second Edition. New York: Springer.

Table A1: Simulation results for lonely primary sampling units in stratified sampling for different values of $\rho_{ICC}$ and strata sizes ($M$).
Columns: Sim. Params. ($\rho_{ICC}$, $M$) | True (Var) | Mean from sims. (Var HT, Var C2, Var CS) | Relative Bias (Var HT, Var C2, Var CS) | Nominal Bias (Var HT, Var C2, Var CS)

Figure A1: Expected value of estimators of sampling variance of $\hat\mu_{HT}$ under lonely primary sampling units in stratified sampling for different values of $\rho_{ICC}$ and strata sizes ($M$). Panels: one per stratum size $M$. Horizontal axis: $\rho_{ICC}$. Series: True, Horvitz-Thompson, Conservative (A*), Collapsed stratum.

Table A2: Simulation results for PPS systematic sampling: random size case.
Columns: Simulation Parameters ($N$, $m$, $\rho$) | True (Var) | Mean from sims. (Var HT, Var WR) | Relative Bias (Var HT, Var WR) | Nominal Bias (Var HT, Var WR)

Table A3: Simulation results for PPS systematic sampling: periodic size case.
Columns: Simulation Parameters ($N$, $m$, $\rho$) | True (Var) | Mean from sims. (Var HT, Var C2, Var WR) | Relative Bias (Var HT, Var C2, Var WR) | Nominal Bias (Var HT, Var C2, Var WR)

Figure A2: Expected value of estimators of sampling variance of $\hat\mu_{HT}$ under PPS systematic sampling for different values of $\rho$, sample sizes ($m$), and sampling proportions ($m/N$): random size case. Panels: one per combination of $N$ and $m$. Series: True, Conservative (HT), With-repl.

Figure A3: Expected value of estimators of sampling variance of $\hat\mu_{HT}$ under PPS systematic sampling for different values of $\rho$, sample sizes ($m$), and sampling proportions ($m/N$): periodic size case. Panels: one per combination of $N$ and $m$. Series: True, Conservative (HT), Conservative (A*), With-repl.


More information

Variants of the splitting method for unequal probability sampling

Variants of the splitting method for unequal probability sampling Variants of the splitting method for unequal probability sampling Lennart Bondesson 1 1 Umeå University, Sweden e-mail: Lennart.Bondesson@math.umu.se Abstract This paper is mainly a review of the splitting

More information

Sampling. Jian Pei School of Computing Science Simon Fraser University

Sampling. Jian Pei School of Computing Science Simon Fraser University Sampling Jian Pei School of Computing Science Simon Fraser University jpei@cs.sfu.ca INTRODUCTION J. Pei: Sampling 2 What Is Sampling? Select some part of a population to observe estimate something about

More information

arxiv: v1 [math.st] 28 Feb 2017

arxiv: v1 [math.st] 28 Feb 2017 Bridging Finite and Super Population Causal Inference arxiv:1702.08615v1 [math.st] 28 Feb 2017 Peng Ding, Xinran Li, and Luke W. Miratrix Abstract There are two general views in causal analysis of experimental

More information

Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing

Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing 1. Purpose of statistical inference Statistical inference provides a means of generalizing

More information

EXAMINATION: QUANTITATIVE EMPIRICAL METHODS. Yale University. Department of Political Science

EXAMINATION: QUANTITATIVE EMPIRICAL METHODS. Yale University. Department of Political Science EXAMINATION: QUANTITATIVE EMPIRICAL METHODS Yale University Department of Political Science January 2014 You have seven hours (and fifteen minutes) to complete the exam. You can use the points assigned

More information

The ESS Sample Design Data File (SDDF)

The ESS Sample Design Data File (SDDF) The ESS Sample Design Data File (SDDF) Documentation Version 1.0 Matthias Ganninger Tel: +49 (0)621 1246 282 E-Mail: matthias.ganninger@gesis.org April 8, 2008 Summary: This document reports on the creation

More information

Uncertainty due to Finite Resolution Measurements

Uncertainty due to Finite Resolution Measurements Uncertainty due to Finite Resolution Measurements S.D. Phillips, B. Tolman, T.W. Estler National Institute of Standards and Technology Gaithersburg, MD 899 Steven.Phillips@NIST.gov Abstract We investigate

More information

Sampling: A Brief Review. Workshop on Respondent-driven Sampling Analyst Software

Sampling: A Brief Review. Workshop on Respondent-driven Sampling Analyst Software Sampling: A Brief Review Workshop on Respondent-driven Sampling Analyst Software 201 1 Purpose To review some of the influences on estimates in design-based inference in classic survey sampling methods

More information

Stochastic Histories. Chapter Introduction

Stochastic Histories. Chapter Introduction Chapter 8 Stochastic Histories 8.1 Introduction Despite the fact that classical mechanics employs deterministic dynamical laws, random dynamical processes often arise in classical physics, as well as in

More information

On the errors introduced by the naive Bayes independence assumption

On the errors introduced by the naive Bayes independence assumption On the errors introduced by the naive Bayes independence assumption Author Matthijs de Wachter 3671100 Utrecht University Master Thesis Artificial Intelligence Supervisor Dr. Silja Renooij Department of

More information

Main sampling techniques

Main sampling techniques Main sampling techniques ELSTAT Training Course January 23-24 2017 Martin Chevalier Department of Statistical Methods Insee 1 / 187 Main sampling techniques Outline Sampling theory Simple random sampling

More information

POPULATION AND SAMPLE

POPULATION AND SAMPLE 1 POPULATION AND SAMPLE Population. A population refers to any collection of specified group of human beings or of non-human entities such as objects, educational institutions, time units, geographical

More information

Estimation of change in a rotation panel design

Estimation of change in a rotation panel design Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS028) p.4520 Estimation of change in a rotation panel design Andersson, Claes Statistics Sweden S-701 89 Örebro, Sweden

More information

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap inference for the finite population total under complex sampling designs Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.

More information

Cluster Sampling 2. Chapter Introduction

Cluster Sampling 2. Chapter Introduction Chapter 7 Cluster Sampling 7.1 Introduction In this chapter, we consider two-stage cluster sampling where the sample clusters are selected in the first stage and the sample elements are selected in the

More information

A comparison of weighted estimators for the population mean. Ye Yang Weighting in surveys group

A comparison of weighted estimators for the population mean. Ye Yang Weighting in surveys group A comparison of weighted estimators for the population mean Ye Yang Weighting in surveys group Motivation Survey sample in which auxiliary variables are known for the population and an outcome variable

More information

Introduction to Survey Data Analysis

Introduction to Survey Data Analysis Introduction to Survey Data Analysis JULY 2011 Afsaneh Yazdani Preface Learning from Data Four-step process by which we can learn from data: 1. Defining the Problem 2. Collecting the Data 3. Summarizing

More information

Comment on Article by Scutari

Comment on Article by Scutari Bayesian Analysis (2013) 8, Number 3, pp. 543 548 Comment on Article by Scutari Hao Wang Scutari s paper studies properties of the distribution of graphs ppgq. This is an interesting angle because it differs

More information

Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics

Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Amang S. Sukasih, Mathematica Policy Research, Inc. Donsig Jang, Mathematica Policy Research, Inc. Amang S. Sukasih,

More information

ECE531 Lecture 10b: Maximum Likelihood Estimation

ECE531 Lecture 10b: Maximum Likelihood Estimation ECE531 Lecture 10b: Maximum Likelihood Estimation D. Richard Brown III Worcester Polytechnic Institute 05-Apr-2011 Worcester Polytechnic Institute D. Richard Brown III 05-Apr-2011 1 / 23 Introduction So

More information

Understanding Ding s Apparent Paradox

Understanding Ding s Apparent Paradox Submitted to Statistical Science Understanding Ding s Apparent Paradox Peter M. Aronow and Molly R. Offer-Westort Yale University 1. INTRODUCTION We are grateful for the opportunity to comment on A Paradox

More information

A new resampling method for sampling designs without replacement: the doubled half bootstrap

A new resampling method for sampling designs without replacement: the doubled half bootstrap 1 Published in Computational Statistics 29, issue 5, 1345-1363, 2014 which should be used for any reference to this work A new resampling method for sampling designs without replacement: the doubled half

More information

The New sampling Procedure for Unequal Probability Sampling of Sample Size 2.

The New sampling Procedure for Unequal Probability Sampling of Sample Size 2. . The New sampling Procedure for Unequal Probability Sampling of Sample Size. Introduction :- It is a well known fact that in simple random sampling, the probability selecting the unit at any given draw

More information

MN 400: Research Methods. CHAPTER 7 Sample Design

MN 400: Research Methods. CHAPTER 7 Sample Design MN 400: Research Methods CHAPTER 7 Sample Design 1 Some fundamental terminology Population the entire group of objects about which information is wanted Unit, object any individual member of the population

More information

Indirect Sampling in Case of Asymmetrical Link Structures

Indirect Sampling in Case of Asymmetrical Link Structures Indirect Sampling in Case of Asymmetrical Link Structures Torsten Harms Abstract Estimation in case of indirect sampling as introduced by Huang (1984) and Ernst (1989) and developed by Lavalle (1995) Deville

More information

6.3 How the Associational Criterion Fails

6.3 How the Associational Criterion Fails 6.3. HOW THE ASSOCIATIONAL CRITERION FAILS 271 is randomized. We recall that this probability can be calculated from a causal model M either directly, by simulating the intervention do( = x), or (if P

More information

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:

More information

C. J. Skinner Cross-classified sampling: some estimation theory

C. J. Skinner Cross-classified sampling: some estimation theory C. J. Skinner Cross-classified sampling: some estimation theory Article (Accepted version) (Refereed) Original citation: Skinner, C. J. (205) Cross-classified sampling: some estimation theory. Statistics

More information

arxiv: v1 [stat.me] 3 Feb 2017

arxiv: v1 [stat.me] 3 Feb 2017 On randomization-based causal inference for matched-pair factorial designs arxiv:702.00888v [stat.me] 3 Feb 207 Jiannan Lu and Alex Deng Analysis and Experimentation, Microsoft Corporation August 3, 208

More information

Specification Errors, Measurement Errors, Confounding

Specification Errors, Measurement Errors, Confounding Specification Errors, Measurement Errors, Confounding Kerby Shedden Department of Statistics, University of Michigan October 10, 2018 1 / 32 An unobserved covariate Suppose we have a data generating model

More information

Estimation under cross-classified sampling with application to a childhood survey

Estimation under cross-classified sampling with application to a childhood survey Estimation under cross-classified sampling with application to a childhood survey arxiv:1511.00507v1 [math.st] 2 Nov 2015 Hélène Juillard Guillaume Chauvet Anne Ruiz-Gazen January 11, 2018 Abstract The

More information

SAS/STAT 13.1 User s Guide. Introduction to Survey Sampling and Analysis Procedures

SAS/STAT 13.1 User s Guide. Introduction to Survey Sampling and Analysis Procedures SAS/STAT 13.1 User s Guide Introduction to Survey Sampling and Analysis Procedures This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete

More information

Lecture 3 Stationary Processes and the Ergodic LLN (Reference Section 2.2, Hayashi)

Lecture 3 Stationary Processes and the Ergodic LLN (Reference Section 2.2, Hayashi) Lecture 3 Stationary Processes and the Ergodic LLN (Reference Section 2.2, Hayashi) Our immediate goal is to formulate an LLN and a CLT which can be applied to establish sufficient conditions for the consistency

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

Exclusion Bias in the Estimation of Peer Effects

Exclusion Bias in the Estimation of Peer Effects Exclusion Bias in the Estimation of Peer Effects Bet Caeyers Marcel Fafchamps January 29, 2016 Abstract This paper formalizes a listed [Guryan et al., 2009] but unproven source of estimation bias in social

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

NONLINEAR CALIBRATION. 1 Introduction. 2 Calibrated estimator of total. Abstract

NONLINEAR CALIBRATION. 1 Introduction. 2 Calibrated estimator of total.   Abstract NONLINEAR CALIBRATION 1 Alesandras Pliusas 1 Statistics Lithuania, Institute of Mathematics and Informatics, Lithuania e-mail: Pliusas@tl.mii.lt Abstract The definition of a calibrated estimator of the

More information

REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES

REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Statistica Sinica 8(1998), 1153-1164 REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Wayne A. Fuller Iowa State University Abstract: The estimation of the variance of the regression estimator for

More information

Eric V. Slud, Census Bureau & Univ. of Maryland Mathematics Department, University of Maryland, College Park MD 20742

Eric V. Slud, Census Bureau & Univ. of Maryland Mathematics Department, University of Maryland, College Park MD 20742 MODEL-ASSISTED WEIGHTING FOR SURVEYS WITH MULTIPLE RESPONSE MODE Eric V. Slud, Census Bureau & Univ. of Maryland Mathematics Department, University of Maryland, College Park MD 20742 Key words: American

More information

Data Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Why uncertainty? Why should data mining care about uncertainty? We

More information

Confidence Intervals. Confidence interval for sample mean. Confidence interval for sample mean. Confidence interval for sample mean

Confidence Intervals. Confidence interval for sample mean. Confidence interval for sample mean. Confidence interval for sample mean Confidence Intervals Confidence interval for sample mean The CLT tells us: as the sample size n increases, the sample mean is approximately Normal with mean and standard deviation Thus, we have a standard

More information

A Note on the Effect of Auxiliary Information on the Variance of Cluster Sampling

A Note on the Effect of Auxiliary Information on the Variance of Cluster Sampling Journal of Official Statistics, Vol. 25, No. 3, 2009, pp. 397 404 A Note on the Effect of Auxiliary Information on the Variance of Cluster Sampling Nina Hagesæther 1 and Li-Chun Zhang 1 A model-based synthesis

More information

10.1 The Formal Model

10.1 The Formal Model 67577 Intro. to Machine Learning Fall semester, 2008/9 Lecture 10: The Formal (PAC) Learning Model Lecturer: Amnon Shashua Scribe: Amnon Shashua 1 We have see so far algorithms that explicitly estimate

More information

CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH. Awanis Ku Ishak, PhD SBM

CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH. Awanis Ku Ishak, PhD SBM CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH Awanis Ku Ishak, PhD SBM Sampling The process of selecting a number of individuals for a study in such a way that the individuals represent the larger

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

Taking into account sampling design in DAD. Population SAMPLING DESIGN AND DAD

Taking into account sampling design in DAD. Population SAMPLING DESIGN AND DAD Taking into account sampling design in DAD SAMPLING DESIGN AND DAD With version 4.2 and higher of DAD, the Sampling Design (SD) of the database can be specified in order to calculate the correct asymptotic

More information

A Note on Bayesian Inference After Multiple Imputation

A Note on Bayesian Inference After Multiple Imputation A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in

More information

Estimation of Some Proportion in a Clustered Population

Estimation of Some Proportion in a Clustered Population Nonlinear Analysis: Modelling and Control, 2009, Vol. 14, No. 4, 473 487 Estimation of Some Proportion in a Clustered Population D. Krapavicaitė Institute of Mathematics and Informatics Aademijos str.

More information

Inferences for the Ratio: Fieller s Interval, Log Ratio, and Large Sample Based Confidence Intervals

Inferences for the Ratio: Fieller s Interval, Log Ratio, and Large Sample Based Confidence Intervals Inferences for the Ratio: Fieller s Interval, Log Ratio, and Large Sample Based Confidence Intervals Michael Sherman Department of Statistics, 3143 TAMU, Texas A&M University, College Station, Texas 77843,

More information

ANALYSIS OF SURVEY DATA USING SPSS

ANALYSIS OF SURVEY DATA USING SPSS 11 ANALYSIS OF SURVEY DATA USING SPSS U.C. Sud Indian Agricultural Statistics Research Institute, New Delhi-110012 11.1 INTRODUCTION SPSS version 13.0 has many additional features over the version 12.0.

More information

SAS/STAT 13.2 User s Guide. Introduction to Survey Sampling and Analysis Procedures

SAS/STAT 13.2 User s Guide. Introduction to Survey Sampling and Analysis Procedures SAS/STAT 13.2 User s Guide Introduction to Survey Sampling and Analysis Procedures This document is an individual chapter from SAS/STAT 13.2 User s Guide. The correct bibliographic citation for the complete

More information

Adaptive two-stage sequential double sampling

Adaptive two-stage sequential double sampling Adaptive two-stage sequential double sampling Bardia Panahbehagh Afshin Parvardeh Babak Mohammadi March 4, 208 arxiv:803.04484v [math.st] 2 Mar 208 Abstract In many surveys inexpensive auxiliary variables

More information

Review of probability and statistics 1 / 31

Review of probability and statistics 1 / 31 Review of probability and statistics 1 / 31 2 / 31 Why? This chapter follows Stock and Watson (all graphs are from Stock and Watson). You may as well refer to the appendix in Wooldridge or any other introduction

More information

PPS Sampling with Panel Rotation for Estimating Price Indices on Services

PPS Sampling with Panel Rotation for Estimating Price Indices on Services PPS Sampling with Panel Rotation for Estimating Price Indices on Services 2016 18 Sander Scholtus Arnout van Delden Content 1. Introduction 4 2. SPPI estimation using a PPS panel from a fixed population

More information

Introduction to Statistical Data Analysis Lecture 4: Sampling

Introduction to Statistical Data Analysis Lecture 4: Sampling Introduction to Statistical Data Analysis Lecture 4: Sampling James V. Lambers Department of Mathematics The University of Southern Mississippi James V. Lambers Statistical Data Analysis 1 / 30 Introduction

More information

Glossary. Appendix G AAG-SAM APP G

Glossary. Appendix G AAG-SAM APP G Appendix G Glossary Glossary 159 G.1 This glossary summarizes definitions of the terms related to audit sampling used in this guide. It does not contain definitions of common audit terms. Related terms

More information

Figure Figure

Figure Figure Figure 4-12. Equal probability of selection with simple random sampling of equal-sized clusters at first stage and simple random sampling of equal number at second stage. The next sampling approach, shown

More information

EC969: Introduction to Survey Methodology

EC969: Introduction to Survey Methodology EC969: Introduction to Survey Methodology Peter Lynn Tues 1 st : Sample Design Wed nd : Non-response & attrition Tues 8 th : Weighting Focus on implications for analysis What is Sampling? Identify the

More information

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Statistica Sinica 24 (2014), 1001-1015 doi:http://dx.doi.org/10.5705/ss.2013.038 INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Seunghwan Park and Jae Kwang Kim Seoul National Univeristy

More information

SAMPLING- Method of Psychology. By- Mrs Neelam Rathee, Dept of Psychology. PGGCG-11, Chandigarh.

SAMPLING- Method of Psychology. By- Mrs Neelam Rathee, Dept of Psychology. PGGCG-11, Chandigarh. By- Mrs Neelam Rathee, Dept of 2 Sampling is that part of statistical practice concerned with the selection of a subset of individual observations within a population of individuals intended to yield some

More information

Chapter 8: Estimation 1

Chapter 8: Estimation 1 Chapter 8: Estimation 1 Jae-Kwang Kim Iowa State University Fall, 2014 Kim (ISU) Ch. 8: Estimation 1 Fall, 2014 1 / 33 Introduction 1 Introduction 2 Ratio estimation 3 Regression estimator Kim (ISU) Ch.

More information

POL 681 Lecture Notes: Statistical Interactions

POL 681 Lecture Notes: Statistical Interactions POL 681 Lecture Notes: Statistical Interactions 1 Preliminaries To this point, the linear models we have considered have all been interpreted in terms of additive relationships. That is, the relationship

More information

Cross-sectional variance estimation for the French Labour Force Survey

Cross-sectional variance estimation for the French Labour Force Survey Survey Research Methods (007 Vol., o., pp. 75-83 ISS 864-336 http://www.surveymethods.org c European Survey Research Association Cross-sectional variance estimation for the French Labour Force Survey Pascal

More information

Log-linear Modelling with Complex Survey Data

Log-linear Modelling with Complex Survey Data Int. Statistical Inst.: Proc. 58th World Statistical Congress, 20, Dublin (Session IPS056) p.02 Log-linear Modelling with Complex Survey Data Chris Sinner University of Southampton United Kingdom Abstract:

More information

SYSTEMATIC SAMPLING FROM FINITE POPULATIONS

SYSTEMATIC SAMPLING FROM FINITE POPULATIONS SYSTEMATIC SAMPLING FROM FINITE POPULATIONS by Llewellyn Reeve Naidoo Submitted in fulfilment of the academic requirements for the degree of Masters of Science in the School of Mathematics, Statistics

More information

New Developments in Nonresponse Adjustment Methods

New Developments in Nonresponse Adjustment Methods New Developments in Nonresponse Adjustment Methods Fannie Cobben January 23, 2009 1 Introduction In this paper, we describe two relatively new techniques to adjust for (unit) nonresponse bias: The sample

More information

Data Integration for Big Data Analysis for finite population inference

Data Integration for Big Data Analysis for finite population inference for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation

More information

arxiv: v1 [stat.me] 15 May 2011

arxiv: v1 [stat.me] 15 May 2011 Working Paper Propensity Score Analysis with Matching Weights Liang Li, Ph.D. arxiv:1105.2917v1 [stat.me] 15 May 2011 Associate Staff of Biostatistics Department of Quantitative Health Sciences, Cleveland

More information

REMAINDER LINEAR SYSTEMATIC SAMPLING

REMAINDER LINEAR SYSTEMATIC SAMPLING Sankhyā : The Indian Journal of Statistics 2000, Volume 62, Series B, Pt. 2, pp. 249 256 REMAINDER LINEAR SYSTEMATIC SAMPLING By HORNG-JINH CHANG and KUO-CHUNG HUANG Tamkang University, Taipei SUMMARY.

More information

Background on Coherent Systems

Background on Coherent Systems 2 Background on Coherent Systems 2.1 Basic Ideas We will use the term system quite freely and regularly, even though it will remain an undefined term throughout this monograph. As we all have some experience

More information

New Developments in Econometrics Lecture 9: Stratified Sampling

New Developments in Econometrics Lecture 9: Stratified Sampling New Developments in Econometrics Lecture 9: Stratified Sampling Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. Overview of Stratified Sampling 2. Regression Analysis 3. Clustering and Stratification

More information

Weighting. Homework 2. Regression. Regression. Decisions Matching: Weighting (0) W i. (1) -å l i. )Y i. (1-W i 3/5/2014. (1) = Y i.

Weighting. Homework 2. Regression. Regression. Decisions Matching: Weighting (0) W i. (1) -å l i. )Y i. (1-W i 3/5/2014. (1) = Y i. Weighting Unconfounded Homework 2 Describe imbalance direction matters STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical Science Duke University

More information

Finite Population Sampling and Inference

Finite Population Sampling and Inference Finite Population Sampling and Inference A Prediction Approach RICHARD VALLIANT ALAN H. DORFMAN RICHARD M. ROYALL A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane

More information

arxiv: v1 [stat.me] 16 Jun 2016

arxiv: v1 [stat.me] 16 Jun 2016 Causal Inference in Rebuilding and Extending the Recondite Bridge between Finite Population Sampling and Experimental arxiv:1606.05279v1 [stat.me] 16 Jun 2016 Design Rahul Mukerjee *, Tirthankar Dasgupta

More information

arxiv: v2 [stat.me] 11 Apr 2017

arxiv: v2 [stat.me] 11 Apr 2017 Sampling Designs on Finite Populations with Spreading Control Parameters Yves Tillé, University of Neuchâtel Lionel Qualité, Swiss Federal Office of Statistics and University of Neuchâtel Matthieu Wilhelm,

More information

Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore

Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore AN EMPIRICAL LIKELIHOOD BASED ESTIMATOR FOR RESPONDENT DRIVEN SAMPLED DATA Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore Mark Handcock, Department

More information

Probability Sampling Designs: Principles for Choice of Design and Balancing

Probability Sampling Designs: Principles for Choice of Design and Balancing Submitted to Statistical Science Probability Sampling Designs: Principles for Choice of Design and Balancing Yves Tillé, Matthieu Wilhelm University of Neuchâtel arxiv:1612.04965v1 [stat.me] 15 Dec 2016

More information

arxiv: v3 [math.oc] 25 Apr 2018

arxiv: v3 [math.oc] 25 Apr 2018 Problem-driven scenario generation: an analytical approach for stochastic programs with tail risk measure Jamie Fairbrother *, Amanda Turner *, and Stein W. Wallace ** * STOR-i Centre for Doctoral Training,

More information

Estimation under cross classified sampling with application to a childhood survey

Estimation under cross classified sampling with application to a childhood survey TSE 659 April 2016 Estimation under cross classified sampling with application to a childhood survey Hélène Juillard, Guillaume Chauvet and Anne Ruiz Gazen Estimation under cross-classified sampling with

More information

Chapter 2: Simple Random Sampling and a Brief Review of Probability

Chapter 2: Simple Random Sampling and a Brief Review of Probability Chapter 2: Simple Random Sampling and a Brief Review of Probability Forest Before the Trees Chapters 2-6 primarily investigate survey analysis. We begin with the basic analyses: Those that differ according

More information

CSCI-6971 Lecture Notes: Monte Carlo integration

CSCI-6971 Lecture Notes: Monte Carlo integration CSCI-6971 Lecture otes: Monte Carlo integration Kristopher R. Beevers Department of Computer Science Rensselaer Polytechnic Institute beevek@cs.rpi.edu February 21, 2006 1 Overview Consider the following

More information

One-phase estimation techniques

One-phase estimation techniques One-phase estimation techniques Based on Horwitz-Thompson theorem for continuous populations Radim Adolt ÚHÚL Brandýs nad Labem, Czech Republic USEWOOD WG2, Training school in Dublin, 16.-19. September

More information