Simple design-efficient calibration estimators for rejective and high-entropy sampling


Biometrika (2012), 99, 1, pp. 1–16
© 2012 Biometrika Trust
Printed in Great Britain
Advance Access publication on 31 July 2012

BY Z. TAN
Department of Statistics, Rutgers University, Piscataway, New Jersey 08854, U.S.A.
ztan@stat.rutgers.edu

SUMMARY
For survey calibration, consider the situation where the population totals of auxiliary variables are known or where auxiliary variables are measured for all population units. For each situation, we develop design-efficient calibration estimators under rejective or high-entropy sampling. A general approach is to extend efficient estimators for missing-data problems with independent and identically distributed data to the survey setting. We show that this approach effectively resolves two long-standing issues in existing approaches: how to achieve design-efficiency regardless of a linear super-population model in generalized regression and calibration estimation, and how to find a simple approximation in optimal regression estimation. Moreover, the proposed approach sheds light on several issues which seem not to be well studied in the literature. Examples include use of the weighted Kullback–Leibler distance in calibration estimation, and efficient estimation allowing for misspecification of a nonlinear super-population model.

Some key words: Auxiliary information; Calibration; Empirical likelihood; Missing data; Nonparametric likelihood; Poisson sampling; Rejective sampling; Regression estimator.

1. INTRODUCTION
An important topic for survey sampling is calibration with auxiliary information to improve estimation (e.g., Deville & Särndal, 1992). Suppose that a probability sample, $s$, is drawn from a finite population of $N$ units, $U = \{1, \ldots, i, \ldots, N\}$. For $i \in U$, let $Y_i$ be the value of a study variable, $X_i$ be that of a vector of auxiliary variables, and $R_i$ be the selection indicator such that $R_i = 1$ if $i \in s$ or $R_i = 0$ otherwise. 
The value $Y_i$ is measured only for $i \in s$, whereas $X_i$ is measured for $i \in s$ along with auxiliary information: either $X_i$ is known for all $i \in U$ or $\sum_{i \in U} X_i$ is known. For expository convenience, we assume the first situation, called complete auxiliary information (Wu & Sitter, 2001), and consider the second situation as a restricted case such that although $\sum_{i \in U} h(X_i)$ is known for any function $h(x)$, only $\sum_{i \in U} X_i$ is exploited. Then the observed data are $\{(R_i Y_i, X_i, R_i) : i \in U\}$. The objective is to estimate the population mean $\bar Y = N^{-1} \sum_{i \in U} Y_i$.

We develop design-efficient calibration estimators for rejective and high-entropy sampling. Here design-efficiency is with respect to the sampling design, regardless of any super-population model. A basic idea is to link survey sampling to missing-data problems in the non-survey setting with independent and identically distributed data (e.g., Robins et al., 1994). In fact, survey calibration described above is immediately suggestive of a missing-data problem as follows (e.g., Tan, 2006, 2010a). Suppose that $\{(Y_i, X_i, R_i) : i = 1, \ldots, N\}$ are independent and identically distributed realizations of $(Y, X, R)$, where $Y$ is a response variable, $X$ is a vector of explanatory variables, and $R$ is the non-missing indicator such that $R = 1$ or $0$ if $Y$ is observed or

missing. Then the observed data are $\{(R_i Y_i, X_i, R_i) : i = 1, \ldots, N\}$, of exactly the same form as in survey sampling. Further, assume that the missing-data mechanism is ignorable (Rubin, 1976), $\mathrm{pr}(R = 1 \mid Y, X) = \mathrm{pr}(R = 1 \mid X)$, and that the propensity score (Rosenbaum & Rubin, 1983), $p(x) = \mathrm{pr}(R = 1 \mid X = x)$, is a known function. The objective is typically to estimate the expectation $\mu = E(Y)$, instead of $\bar Y$.

The preceding discussion shows that survey calibration and the missing-data problem are conceptually similar to each other. In general, there remain important analytical differences between the two settings in whether $(Y_1, X_1), \ldots, (Y_N, X_N)$ are fixed or random and whether $(R_1, \ldots, R_N)$ are independent of each other. Nevertheless, the probabilistic structure of the foregoing missing-data problem is equivalent to that of Poisson sampling with a nonparametric super-population model, where $(Y_1, X_1), \ldots, (Y_N, X_N)$ are considered an independent and identically distributed sample from a super-population and $(R_1, \ldots, R_N)$ are assumed independent given $(Y_1, X_1), \ldots, (Y_N, X_N)$. Then Poisson sampling is essentially equivalent to the missing-data problem with $(Y_1, X_1), \ldots, (Y_N, X_N)$ fixed or conditioned on. A drawback of Poisson sampling is that the sample size, $\sum_{i=1}^N R_i$, is random. Therefore, it is important to consider rejective sampling (Hájek, 1964, 1981), which is Poisson sampling restricted to or conditioned on a fixed sample size.

From these considerations, we propose the following approach: (i) to employ rejective sampling or high-entropy sampling which is asymptotically equivalent to rejective sampling, and (ii) to develop efficient estimation, by constructing efficient estimators for the missing-data problem and then extending those estimators to survey sampling. In this approach, the estimators are transferred from the missing-data problem to the survey setting and then studied fully under the latter probabilistic structure. 
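The description of rejective sampling as Poisson sampling conditioned on a fixed sample size can be illustrated by a naive accept-reject loop. The sketch below is only an illustration under assumptions: the function name `rejective_sample`, the draw probabilities, and the retry cap are hypothetical, and this brute-force loop is not one of the fast algorithms cited in the text.

```python
import numpy as np

def rejective_sample(p, n, rng, max_tries=100_000):
    """Draw independent Bernoulli(p_i) indicators (Poisson sampling) and
    accept the first draw whose realized sample size equals n."""
    p = np.asarray(p, dtype=float)
    for _ in range(max_tries):
        r = rng.random(p.size) < p      # one Poisson-sampling draw
        if r.sum() == n:                # condition on fixed sample size
            return r.astype(int)
    raise RuntimeError("no sample of the requested size was accepted")

rng = np.random.default_rng(0)
p = np.array([0.2, 0.4, 0.6, 0.8, 0.5, 0.5])   # hypothetical draw probabilities, summing to n = 3
r = rejective_sample(p, n=3, rng=rng)
```

The returned vector plays the role of $(R_1, \ldots, R_N)$. The draw probabilities used here are not the inclusion probabilities of the resulting design; the two generally differ.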
Rejective sampling is suggested above for its close connection to the missing-data problem. Moreover, there are several reasons why rejective sampling is desirable. First, fast algorithms have recently been proposed for implementing rejective sampling (Chen et al., 1994; Tillé, 2006). Second, rejective sampling achieves maximum entropy subject to fixed inclusion probabilities for a fixed sample size (Hájek, 1981). High entropy means that all samples have probabilities as uniform as allowed by the constraints, whereas low entropy indicates that a few samples have disproportionately large probabilities. Third, the structure of rejective sampling as conditional Poisson sampling facilitates the development of simple asymptotic approximations (Hájek, 1964, 1981). Finally, asymptotic results can be generalized from rejective sampling to other high-entropy sampling designs, including Rao–Sampford sampling, such that the Kullback–Leibler divergence from rejective sampling tends to zero (Berger, 1998a,b, 2005).

Consider linear calibration where only the population totals of auxiliary variables $X$ are exploited. We adopt the calibrated regression and likelihood estimators of Tan (2006, 2010a) from the missing-data problem. The resulting estimators take the simple form of generalized regression and calibration estimators (Särndal et al., 1992; Deville & Särndal, 1992), but are asymptotically design-efficient, regardless of a linear super-population model of $Y_i$ given $X_i$, similarly to an optimal regression estimator (Fuller & Isaki, 1981; Montanari, 1987; Rao, 1994). Therefore, our approach effectively resolves two long-standing issues in existing approaches. First, generalized regression and calibration estimation gives a class of simple, consistent estimators, depending on first-order and not second-order inclusion probabilities, but it is unclear how to achieve design-efficiency. 
Second, the optimal regression estimator is asymptotically design-efficient, but cumbersome as it depends on first-order and second-order inclusion probabilities for a general sampling scheme including rejective sampling.

In the presence of complete auxiliary information, the model-calibration approach of Wu & Sitter (2001) consists of two steps, similarly to locally efficient estimation in the missing-data problem (e.g., Robins et al., 1994): a possibly nonlinear regression model is fitted for $Y_i$ given $X_i$, and then linear calibration is performed using the population total of the fitted values. The resulting estimator of $\bar Y$ is efficient in the sense of minimizing the anticipated asymptotic variance if the regression model is correctly specified (Wu, 2003). However, it is possible to improve efficiency when the regression model may be misspecified. For the second step, linear calibration can be done by applying the design-efficient estimators discussed above. For the first step, the regression model can be estimated to enhance efficiency of the final estimator of $\bar Y$, similarly to the method of efficiency maximization in the missing-data literature (Rubin & van der Laan, 2008; Tan, 2008; Cao et al., 2009).

Our calibrated estimators, though constructed under rejective sampling, can be adopted and evaluated under a general sampling scheme different from rejective sampling. Under suitable regularity conditions, the estimators remain consistent and asymptotically normal, but may not be design-efficient, similarly to existing generalized regression and calibration estimators (Särndal et al., 1992; Deville & Särndal, 1992).

2. EXISTING METHODS
There is a vast literature on calibration methods for incorporating auxiliary information and for handling non-response in survey sampling (e.g., Fuller, 2002; Kott, 2009). We only discuss methods for linear calibration directly related to our approach in §3. Throughout, $E_r$, $\mathrm{var}_r$, and $\mathrm{cov}_r$ denote the expectation, variance, and covariance under the sampling design, with $r$ indicating randomization. The first-order inclusion probabilities are $\pi_i = \mathrm{pr}(R_i = 1)$. Cassel et al. 
(1976) and Särndal et al. (1992) developed a class of generalized regression estimators of $\bar Y$:
\[
\hat\mu_{\mathrm{GREG}} = N^{-1} \sum_{i \in s} \frac{Y_i}{\pi_i} - N^{-1} \Big( \sum_{i \in s} \frac{X_i^T}{\pi_i} - \sum_{i \in U} X_i^T \Big) \hat\beta_q ,
\]
where $\hat\beta_q = ( \sum_{i \in s} \pi_i^{-1} q_i X_i X_i^T )^{-1} ( \sum_{i \in s} \pi_i^{-1} q_i X_i Y_i )$ and $q_i > 0$ are weights to be specified. In general, different choices of $q_i$ result in different variances. The standard choice is $q_i = 1$ for $i \in s$. The choice advocated in Särndal et al. (1992) is $q_i = \sigma_i^{-2}$, motivated by a linear model
\[
E_\varepsilon(Y_i \mid X_i) = X_i^T \beta, \qquad \mathrm{var}_\varepsilon(Y_i \mid X_i) = \sigma_i^2. \tag{1}
\]
Throughout, $E_\varepsilon$ and $\mathrm{var}_\varepsilon$ denote the expectation and variance under a super-population model. The choice $q_i = \sigma_i^{-2}$ might be considered optimal in minimizing the variance under model (1). But if model (1) holds, then the anticipated variance of $\hat\mu_{\mathrm{GREG}}$, $E_\varepsilon E_r \{ (\hat\mu_{\mathrm{GREG}} - \bar Y)^2 \}$, is approximately $N^{-2} \sum_{i \in U} (\pi_i^{-1} - 1) \sigma_i^2$, regardless of the choice of $q_i$ (Wright, 1983). Therefore, any choice of $q_i$ is asymptotically as efficient as $q_i = \sigma_i^{-2}$ if model (1) holds. It is important to investigate how to choose $q_i$ for efficient estimation when model (1) may be misspecified.

Calibration estimation (Deville & Särndal, 1992) gives an even broader class of estimators than $\hat\mu_{\mathrm{GREG}}$. Let $d_i = \pi_i^{-1}$. A calibration estimator of $\bar Y$ is $\hat\mu_{\mathrm{cal}} = N^{-1} \sum_{i \in s} w_i Y_i$, where the $w_i$'s are defined to minimize a distance, $\sum_{i \in s} G_i(w_i, d_i)$, subject to $\sum_{i \in s} w_i X_i = \sum_{i \in U} X_i$. Three examples of $G_i(w_i, d_i)$ are $(w_i - d_i)^2 / (2 d_i q_i)$, $\{ w_i \log(w_i / d_i) - w_i + d_i \} / q_i$, and $\{ -d_i \log(w_i / d_i) + w_i - d_i \} / q_i$, where $q_i > 0$ are weights to be specified. For convenience, we refer to the corresponding distances $\sum_{i \in s} G_i(w_i, d_i)$ as the weighted chi-squared, reverse Kullback–Leibler, and Kullback–Leibler distances. The weighted chi-squared distance

leads to weights of form $w_i = d_i (1 + q_i X_i^T \lambda)$. Then $\hat\mu_{\mathrm{cal}}$ is algebraically identical to $\hat\mu_{\mathrm{GREG}}$ using the same $q_i$. The weighted reverse Kullback–Leibler distance leads to weights of form $w_i = d_i \exp(q_i X_i^T \lambda)$. Then $\hat\mu_{\mathrm{cal}}$ is called a generalized raking estimator (Deville et al., 1993). The weighted Kullback–Leibler distance leads to weights of form $w_i = d_i / (1 + q_i X_i^T \lambda)$. With $q_i = 1$ ($i \in s$), if $\sum_{i \in s} w_i = N$ is included as a calibration equation, then minimizing the Kullback–Leibler distance is equivalent to maximizing $\sum_{i \in s} d_i \log(w_i)$ and hence $\hat\mu_{\mathrm{cal}}$ coincides with the pseudo empirical likelihood estimator (Chen & Sitter, 1999) discussed below. For the two types of Kullback–Leibler distances, but not the chi-squared distance, the weights $w_i$ are always positive, which is desirable in finite samples.

Under suitable regularity conditions, calibration estimators $\hat\mu_{\mathrm{cal}}$ using different distance measures but the same $q_i$ admit the same asymptotic expansion (Deville & Särndal, 1992)
\[
N^{-1} \sum_{i \in s} \frac{Y_i}{\pi_i} - N^{-1} \Big( \sum_{i \in s} \frac{X_i^T}{\pi_i} - \sum_{i \in U} X_i^T \Big) \beta_q + o_p(n^{-1/2}), \tag{2}
\]
where $\beta_q = ( \sum_{i \in U} q_i X_i X_i^T )^{-1} ( \sum_{i \in U} q_i X_i Y_i )$ and $n$ is the sample size or the expected sample size. This result confirms the foregoing discussion on choices of $q_i$ for $\hat\mu_{\mathrm{GREG}}$. If model (1) holds, then $\beta_q$ converges in probability under model (1) to $\beta$ and hence the anticipated variance of (2) is asymptotically the same for different choices of $q_i$.

Consider another class of estimators of $\bar Y$,
\[
N^{-1} \sum_{i \in s} \frac{Y_i}{\pi_i} - N^{-1} \Big( \sum_{i \in s} \frac{X_i^T}{\pi_i} - \sum_{i \in U} X_i^T \Big) b, \tag{3}
\]
where $b$ is a vector of arbitrary constants. By expansion (2), this class contains all calibration estimators $\hat\mu_{\mathrm{cal}}$ up to asymptotic equivalence. The optimal choice of $b$ in minimizing the variance of (3) is $\beta_{\mathrm{opt}} = \mathrm{var}_r^{-1} ( \sum_{i \in s} \pi_i^{-1} X_i ) \, \mathrm{cov}_r ( \sum_{i \in s} \pi_i^{-1} X_i, \sum_{i \in s} \pi_i^{-1} Y_i )$. 
Estimating the variance and covariance matrices in $\beta_{\mathrm{opt}}$ and then substituting the estimator of $\beta_{\mathrm{opt}}$ for $b$ in (3) yields an optimal regression estimator, which is asymptotically equivalent to (3) with $b = \beta_{\mathrm{opt}}$ (Fuller & Isaki, 1981; Montanari, 1987; Rao, 1994). Therefore, the optimal regression estimator asymptotically attains the minimum variance of estimators (3), regardless of model (1). On the other hand, this estimator is cumbersome with double summation, except in special cases such as Poisson sampling in §3.2 and stratified simple random sampling.

There has been considerable research on comparisons and connections between the generalized regression and calibration approach and the optimal regression approach (e.g., Montanari, 1998; Andersson & Thorburn, 2006). Our review is focused on the main features of the two approaches. The first provides simple, but not necessarily design-efficient estimators, whereas the second gives design-efficient but in general complicated estimators.

Finally, there are empirical likelihood methods for survey calibration, including those in Hartley & Rao (1968) and Chen & Qin (1993) for simple random sampling and Chen & Sitter (1999) and Kim (2009) for unequal probability sampling. As mentioned above, Chen & Sitter's (1999) estimator of $\bar Y$ is $\hat\mu_{\mathrm{CS}} = N^{-1} \sum_{i \in s} w_i Y_i$, where the $w_i$'s are defined to maximize the pseudo empirical likelihood $\sum_{i \in s} d_i \log w_i$ subject to $\sum_{i \in s} w_i = N$ and $\sum_{i \in s} w_i X_i = \sum_{i \in U} X_i$. These methods do not, in general, produce efficient estimators except for simple random sampling or in the case of negligible sampling fractions. A heuristic reason for the lack of efficiency is that such pseudo likelihoods do not capture the probabilistic structure of sampling. In fact, as discussed by Rao (1997), design-based likelihood inference for survey data is challenging because the usual likelihood provides no information on $\{ Y_i : i \notin s \}$ and hence on $\bar Y$.
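The algebraic identity between the chi-squared calibration estimator and the generalized regression estimator with the same $q_i$ can be checked numerically. The following is a minimal sketch, assuming a synthetic population, Poisson sampling for the draw, and the choice $q_i = \pi_i^{-1} - 1$ with $\pi_i$ included in $X_i$; the function name `chi2_calibration_weights` and all data are hypothetical illustrations.

```python
import numpy as np

def chi2_calibration_weights(X_s, pi_s, q_s, X_total):
    """Weights w_i = d_i (1 + q_i X_i^T lam), with d_i = 1/pi_i and lam
    chosen so that sum_{i in s} w_i X_i equals the known total X_total."""
    d = 1.0 / pi_s
    A = (X_s * (d * q_s)[:, None]).T @ X_s          # sum_s d_i q_i X_i X_i^T
    lam = np.linalg.solve(A, X_total - X_s.T @ d)   # calibration equation
    return d * (1.0 + q_s * (X_s @ lam))

# Synthetic finite population (assumption, for illustration only).
rng = np.random.default_rng(1)
N = 200
z = rng.normal(size=N)
pi = 0.2 + 0.6 * rng.random(N)            # inclusion probabilities
X = np.column_stack([pi, z])              # pi_i included as a component of X_i
Y = 1.0 + 2.0 * z + rng.normal(size=N)
R = rng.random(N) < pi                    # Poisson sampling, for simplicity

X_s, Y_s, pi_s = X[R], Y[R], pi[R]
q_s = 1.0 / pi_s - 1.0
X_total = X.sum(axis=0)

# Calibration estimator with the weighted chi-squared distance.
w = chi2_calibration_weights(X_s, pi_s, q_s, X_total)
mu_cal = (w @ Y_s) / N

# Generalized regression estimator with the same q_i.
d = 1.0 / pi_s
A = (X_s * (d * q_s)[:, None]).T @ X_s
beta_q = np.linalg.solve(A, (X_s * (d * q_s)[:, None]).T @ Y_s)
mu_greg = (d @ Y_s) / N - ((X_s.T @ d - X_total) @ beta_q) / N
```

Under these assumptions `mu_cal` and `mu_greg` agree up to rounding error, and the weights reproduce the known totals of $X$. The chi-squared weights are not guaranteed to be positive, in line with the remark in the text.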

3. LINEAR CALIBRATION
3.1. Missing-data problem
Consider the missing-data problem described in §1. To mimic survey sampling with known inclusion probabilities, the propensity score $p(x) = \mathrm{pr}(R = 1 \mid X = x)$ is assumed to be a known function. In a real missing-data situation, the function $p(x)$ must be estimated from the data $\{(X_i, R_i) : i = 1, \ldots, N\}$. There are interesting issues related to estimated propensity scores and doubly robust estimation against misspecification of propensity score models, which are not addressed here. See, for example, Robins et al. (1994) and Tan (2006, 2007, 2010a).

For the missing-data problem, it is possible to develop efficient estimation using familiar statistical tools for independent and identically distributed data. Let $\eta = p^{-1}(X) R Y$ and $\xi = \{ p^{-1}(X) R - 1 \} X$. Then $\mu = E(\eta)$ and $0 = E(\xi)$. By the method of control variates (Hammersley & Handscomb, 1964), a regression estimator of $\mu$ is
\[
\hat\mu_{\mathrm{reg}} = N^{-1} \sum_{i=1}^N \frac{R_i Y_i}{p(X_i)} - N^{-1} \sum_{i=1}^N \Big\{ \frac{R_i}{p(X_i)} - 1 \Big\} X_i^T \hat\beta,
\]
where $\hat\beta$ is the least-squares estimator in the linear regression of $\eta$ on $\xi$:
\[
\hat\beta = \Big[ N^{-1} \sum_{i=1}^N \Big\{ \frac{R_i}{p(X_i)} - 1 \Big\}^2 X_i X_i^T \Big]^{-1} \Big[ N^{-1} \sum_{i=1}^N \Big\{ \frac{R_i}{p(X_i)} - 1 \Big\} \frac{R_i}{p(X_i)} X_i Y_i \Big].
\]
Alternatively, the method of empirical likelihood (Owen, 2001) can be used, by regarding $E(\xi) = 0$ as a constraint on the joint distribution of $(\eta, \xi)$. In the Supplementary Material, the empirical likelihood estimator of $\mu$ is shown to be
\[
\hat\mu_{\mathrm{lik}} = N^{-1} \sum_{i=1}^N \frac{\eta_i}{1 + \hat\lambda^T \xi_i} = N^{-1} \sum_{i=1}^N \frac{R_i Y_i}{\varpi(X_i; \hat\lambda)}, \tag{4}
\]
where $\varpi(x; \lambda) = p(x) + \{ 1 - p(x) \} x^T \lambda$ and $\hat\lambda$ is a maximizer of
\[
l(\lambda) = N^{-1} \sum_{i=1}^N \big[ R_i \log \varpi(X_i; \lambda) + (1 - R_i) \log \{ 1 - \varpi(X_i; \lambda) \} \big]
\]
subject to $\varpi(X_i; \lambda) > 0$ if $R_i = 1$ and $\varpi(X_i; \lambda) < 1$ if $R_i = 0$. Setting the gradient of $l(\lambda)$ to 0 shows that $\hat\lambda$ is a solution to
\[
0 = N^{-1} \sum_{i=1}^N \frac{R_i - \varpi(X_i; \lambda)}{\varpi(X_i; \lambda) \{ 1 - \varpi(X_i; \lambda) \}} \{ 1 - p(X_i) \} X_i. \tag{5}
\]
The same estimator $\hat\mu_{\mathrm{lik}}$ is derived by the method of nonparametric likelihood in Tan (2006) provided that 1 is included in $X$. 
In this case, $\hat\lambda$ satisfies $1 = N^{-1} \sum_{i=1}^N \varpi^{-1}(X_i; \hat\lambda) R_i$ in addition to (5), and $\hat\mu_{\mathrm{lik}}$ and $\hat\mu_{\mathrm{reg}}$ are invariant under a location transformation $(Y, \mu) \to (Y + c, \mu + c)$. See Tan (2010b, 2011) for further discussions on the relationship between the methods of empirical and nonparametric likelihood in missing-data problems.

Neither $\hat\mu_{\mathrm{reg}}$ nor $\hat\mu_{\mathrm{lik}}$ is calibrated on $X$. Throughout, an estimator is said to be calibrated on $X$ if applying the estimator with $Y$ replaced by $X$ yields exactly $\bar X = N^{-1} \sum_{i=1}^N X_i$. Remarkably, there are simple modifications of $\hat\mu_{\mathrm{reg}}$ and $\hat\mu_{\mathrm{lik}}$ to achieve calibration on $X$, without affecting

first-order asymptotic variances. The calibrated regression estimator of Tan (2006) is
\[
\tilde\mu_{\mathrm{reg}} = N^{-1} \sum_{i=1}^N \frac{R_i Y_i}{p(X_i)} - N^{-1} \sum_{i=1}^N \Big\{ \frac{R_i}{p(X_i)} - 1 \Big\} X_i^T \tilde\beta,
\]
where $\tilde\beta$ is a coefficient vector different from $\hat\beta$:
\[
\tilde\beta = \Big[ N^{-1} \sum_{i=1}^N \frac{R_i}{p(X_i)} \Big\{ \frac{1}{p(X_i)} - 1 \Big\} X_i X_i^T \Big]^{-1} \Big[ N^{-1} \sum_{i=1}^N \frac{R_i}{p(X_i)} \Big\{ \frac{1}{p(X_i)} - 1 \Big\} X_i Y_i \Big].
\]
See Rubin & van der Laan (2008) and Tan (2008) for related discussions. The calibrated likelihood estimator of Tan (2010a) is
\[
\tilde\mu_{\mathrm{lik}} = N^{-1} \sum_{i=1}^N \frac{R_i Y_i}{\varpi(X_i; \tilde\lambda)},
\]
where $\varpi(x; \lambda) = p(x) + \{ 1 - p(x) \} x^T \lambda$ as before and $\tilde\lambda$ is a solution to
\[
0 = N^{-1} \sum_{i=1}^N \Big\{ \frac{R_i}{\varpi(X_i; \lambda)} - 1 \Big\} X_i, \tag{6}
\]
subject to $\varpi(X_i; \lambda) > 0$ if $R_i = 1$ for $i = 1, \ldots, N$. Equation (6) is obtained by setting $\{ 1 - p(X_i) \} / \{ 1 - \varpi(X_i; \lambda) \}$ to 1 in (5). If 1 is included in $X$, then $1 = N^{-1} \sum_{i=1}^N \varpi^{-1}(X_i; \tilde\lambda) R_i$, and $\tilde\mu_{\mathrm{lik}}$ and $\tilde\mu_{\mathrm{reg}}$ are location invariant.

It is interesting to mention that calibration on $X$ implies double robustness such that an estimator of $\mu$ remains consistent if either the propensity score $p(x)$ or the linear model $E(Y \mid X = x) = x^T \beta$ is correctly specified. In fact, both estimators $\tilde\mu_{\mathrm{reg}}$ and $\tilde\mu_{\mathrm{lik}}$ were originally constructed to achieve this property in Tan (2006, 2010a).

There are further differences between the calibrated estimators $\tilde\mu_{\mathrm{reg}}$ and $\tilde\mu_{\mathrm{lik}}$ and the non-calibrated estimators $\hat\mu_{\mathrm{reg}}$ and $\hat\mu_{\mathrm{lik}}$ in connection with survey sampling. The estimators $\tilde\mu_{\mathrm{reg}}$ and $\tilde\mu_{\mathrm{lik}}$, unlike $\hat\mu_{\mathrm{reg}}$ and $\hat\mu_{\mathrm{lik}}$, turn out to depend on $(X_1, \ldots, X_N)$ only through $\{ X_i : R_i = 1, i = 1, \ldots, N \}$ and $\bar X$, and therefore are applicable as long as $\{ (Y_i, X_i) : R_i = 1, i = 1, \ldots, N \}$ and $\bar X$ are available. This property seems incidental in the missing-data problem, but is crucial in the survey setting where $\bar X$ is known but $(X_1, \ldots, X_N)$ may be inaccessible.

We point out that $\tilde\mu_{\mathrm{reg}}$ and $\tilde\mu_{\mathrm{lik}}$ can be algebraically identified in the class of calibration estimators $\hat\mu_{\mathrm{cal}}$ including $\hat\mu_{\mathrm{GREG}}$ for survey sampling, in spite of different arguments behind these estimators. 
In fact, $\tilde\mu_{\mathrm{reg}}$ is identical to $\hat\mu_{\mathrm{GREG}}$ and $\tilde\mu_{\mathrm{lik}}$ is identical to $\hat\mu_{\mathrm{cal}}$ for the weighted Kullback–Leibler distance, with $d_i = p^{-1}(X_i)$ and $q_i = p^{-1}(X_i) - 1$. The first relationship follows directly by the formulas of $\tilde\beta$ and $\hat\beta_q$. The second relationship follows because equation (6) for $\tilde\lambda$ is equivalent to the calibration equation $N^{-1} \sum_{i=1}^N R_i w_i X_i = N^{-1} \sum_{i=1}^N X_i$ with $w_i = d_i / (1 + q_i X_i^T \lambda) = [ p(X_i) + \{ 1 - p(X_i) \} X_i^T \lambda ]^{-1}$. These connections shed new light on both the calibrated estimators $\tilde\mu_{\mathrm{reg}}$ and $\tilde\mu_{\mathrm{lik}}$ in the missing-data problem and the calibration estimators $\hat\mu_{\mathrm{cal}}$ in survey sampling. In particular, the derivation of $\tilde\mu_{\mathrm{lik}}$ by empirical or nonparametric likelihood gives strong support for using the weighted Kullback–Leibler distance in $\hat\mu_{\mathrm{cal}}$.

We summarize asymptotic properties of the foregoing estimators in the missing-data problem (Tan, 2006, 2010a). Assume that $p(X) > p_0$ almost surely for a constant $0 < p_0 < 1$, $E(Y^2) < \infty$, $E(\|X\|^2) < \infty$, and $V$ is nonsingular, where $\|X\| = (X^T X)^{1/2}$ and $V = E(\xi \xi^T) = E[ \{ p^{-1}(X) - 1 \} X X^T ]$. The estimators $\hat\mu_{\mathrm{reg}}$, $\hat\mu_{\mathrm{lik}}$, $\tilde\mu_{\mathrm{reg}}$, and $\tilde\mu_{\mathrm{lik}}$ admit the asymptotic

expansion
\[
N^{-1} \sum_{i=1}^N \frac{R_i Y_i}{p(X_i)} - N^{-1} \sum_{i=1}^N \Big\{ \frac{R_i}{p(X_i)} - 1 \Big\} X_i^T \beta + o_p(N^{-1/2}), \tag{7}
\]
where $\beta = V^{-1} U$ and $U = E(\xi \eta) = E[ \{ p^{-1}(X) - 1 \} X Y ]$. Each estimator asymptotically attains the minimum variance of estimators of $\mu$ in the form similar to (3):
\[
N^{-1} \sum_{i=1}^N \frac{R_i Y_i}{p(X_i)} - N^{-1} \sum_{i=1}^N \Big\{ \frac{R_i}{p(X_i)} - 1 \Big\} X_i^T b, \tag{8}
\]
where $b$ is a vector of arbitrary constants.

3.2. Poisson sampling
Now consider Poisson sampling, where $(R_1, \ldots, R_N)$ are independent Bernoulli random variables with draw probabilities $(p_1, \ldots, p_N)$. For clarity, we speak of $p_i$ instead of $\pi_i$, although $\pi_i = p_i$, in Poisson sampling. As discussed in §1, Poisson sampling is essentially equivalent to the missing-data problem with $(Y_1, X_1), \ldots, (Y_N, X_N)$ fixed or conditioned on, although there remains the difference that the $(Y_i, X_i)$'s here may not be a random sample from a probability distribution. The basic equations $\mu = E(\eta)$ and $0 = E(\xi)$ in the missing-data problem lead to $\bar Y = E_r ( N^{-1} \sum_{i=1}^N p_i^{-1} R_i Y_i )$ and $0 = E_r \{ N^{-1} \sum_{i=1}^N ( p_i^{-1} R_i - 1 ) X_i \}$ in Poisson sampling.

To estimate $\bar Y$, we apply the calibrated estimators $\tilde\mu_{\mathrm{reg}}$ and $\tilde\mu_{\mathrm{lik}}$ with $p(X_i)$ replaced by $p_i$. The two estimators take the form of calibration estimators $\hat\mu_{\mathrm{cal}}$ with $q_i = p_i^{-1} - 1$ as discussed in §3.1. Assume that the expected sampling fraction is asymptotically non-negligible: $\liminf_{N \to \infty} N^{-1} \sum_{i=1}^N p_i > 0$. Then asymptotic expansion (2) for the two estimators gives
\[
N^{-1} \sum_{i=1}^N \frac{R_i Y_i}{p_i} - N^{-1} \sum_{i=1}^N \Big( \frac{R_i}{p_i} - 1 \Big) X_i^T \beta_p + o_p(N^{-1/2}), \tag{9}
\]
where $\beta_p = V_p^{-1} U_p$, $V_p = N^{-1} \sum_{i=1}^N ( p_i^{-1} - 1 ) X_i X_i^T$, and $U_p = N^{-1} \sum_{i=1}^N ( p_i^{-1} - 1 ) X_i Y_i$. Expansion (9) is identical to (7) in the missing-data problem except for the difference between $\beta_p$ and $\beta$. The vector $\beta_p$ equals $\beta_{\mathrm{opt}}$ in §2 because $V_p = N \, \mathrm{var}_r ( N^{-1} \sum_{i=1}^N p_i^{-1} R_i X_i )$ and $U_p = N \, \mathrm{cov}_r ( N^{-1} \sum_{i=1}^N p_i^{-1} R_i X_i, N^{-1} \sum_{i=1}^N p_i^{-1} R_i Y_i )$ by independence of $(R_1, \ldots, R_N)$. Then $\tilde\mu_{\mathrm{reg}}$ and $\tilde\mu_{\mathrm{lik}}$ asymptotically attain the minimum variance of estimators (3). 
Therefore, $\tilde\mu_{\mathrm{reg}}$ and $\tilde\mu_{\mathrm{lik}}$ are efficient in the class of estimators (3) or (8), whether inference is conditional on the $(Y_i, X_i)$'s in Poisson sampling or unconditional in the missing-data problem.

By the preceding discussion, $\tilde\mu_{\mathrm{reg}}$ coincides with the optimal regression estimator known for Poisson sampling (e.g., Särndal, 1996, Result 9.1). It is interesting that $\tilde\mu_{\mathrm{reg}}$ appears more natural than $\hat\mu_{\mathrm{reg}}$ in Poisson sampling, because $N^{-1} \sum_{i=1}^N p_i^{-1} R_i ( p_i^{-1} - 1 ) X_i X_i^T$ appears a more natural estimator of $V_p$ than $N^{-1} \sum_{i=1}^N ( p_i^{-1} R_i - 1 )^2 X_i X_i^T$ and hence $\tilde\beta$ appears a more natural estimator of $\beta_p$ than $\hat\beta$. In contrast, $\hat\mu_{\mathrm{reg}}$ appears more natural than $\tilde\mu_{\mathrm{reg}}$ in the missing-data problem, because $\hat\mu_{\mathrm{reg}}$ is a direct application of the classical regression estimator but $\tilde\mu_{\mathrm{reg}}$ is derived as a modification of $\hat\mu_{\mathrm{reg}}$ to achieve calibration on $X$ by Tan (2006).

3.3. Rejective sampling
Consider rejective sampling, which is Poisson sampling restricted to or conditioned on a fixed sample size. By Hájek (1964), a sampling design is called rejective sampling of size $n$ with draw probabilities $(p_1, \ldots, p_N)$ if

\[
\mathrm{pr}(R_1 = r_1, \ldots, R_N = r_N) =
\begin{cases}
c \prod_{i: r_i = 1} p_i \prod_{i: r_i = 0} (1 - p_i), & \sum_{i=1}^N r_i = n, \\
0, & \text{otherwise},
\end{cases} \tag{10}
\]
where $c$ is a constant. There exist infinitely many choices of $(p_1, \ldots, p_N)$ leading to the same sampling design. In fact, if $p_i / (1 - p_i) \propto p_i' / (1 - p_i')$ for $i = 1, \ldots, N$, then rejective sampling with probabilities $(p_1, \ldots, p_N)$ is identical to that with probabilities $(p_1', \ldots, p_N')$. The representation (10) is called canonical if $\sum_{i=1}^N p_i = n$. It is important to distinguish draw probabilities $p_i$ in the underlying Poisson sampling and inclusion probabilities $\pi_i$ in the actual rejective sampling. As mentioned in §1, there are fast algorithms to compute $p_i$ from $\pi_i$ and vice versa and to draw rejective samples (Chen et al., 1994; Tillé, 2006).

To estimate $\bar Y$, we include $\pi_i$ as a component of $X_i$, which is feasible because $\sum_{i=1}^N \pi_i = n$ is known, and apply $\tilde\mu_{\mathrm{reg}}$ and $\tilde\mu_{\mathrm{lik}}$ with $p(X_i)$ replaced by $\pi_i$. One of our main objectives is to show that the resulting estimators, denoted by $\tilde\mu_{\mathrm{reg},\pi}$ and $\tilde\mu_{\mathrm{lik},\pi}$, are simple and efficient: (i) they are as simple as calibration estimators $\hat\mu_{\mathrm{GREG}}$ and $\hat\mu_{\mathrm{cal}}$ with $q_i = \pi_i^{-1} - 1$; (ii) they attain the minimum asymptotic variance of estimators (3) and hence are asymptotically as efficient as an optimal regression estimator.

Property (i) holds as discussed in §3.1. A remarkable byproduct of including $\pi_i$ in the auxiliary variables $X_i$ is that $\tilde\mu_{\mathrm{reg},\pi}$ can then be recast in the cosmetic form of linear prediction estimators (Särndal & Wright, 1984):
\[
\tilde\mu_{\mathrm{reg},\pi}
= N^{-1} \sum_{i=1}^N X_i^T \tilde\beta + N^{-1} \sum_{i=1}^N \frac{R_i}{\pi_i} ( Y_i - X_i^T \tilde\beta )
= N^{-1} \sum_{i=1}^N X_i^T \tilde\beta + N^{-1} \sum_{i=1}^N R_i ( Y_i - X_i^T \tilde\beta )
= N^{-1} \Big( \sum_{i \in s} Y_i + \sum_{i \notin s} X_i^T \tilde\beta \Big).
\]
The second equation follows because $N^{-1} \sum_{i=1}^N R_i \pi_i^{-1} ( \pi_i^{-1} - 1 ) ( Y_i - X_i^T \tilde\beta ) X_i = 0$ and $\pi_i$ is a component of $X_i$. There are previous estimators of the linear prediction form in Brewer (1979), Fuller & Isaki (1981), and Kim & Park (2006) among others. Moreover, the use of $\pi_i$ as a calibration variable is previously considered by, for example, Wu & Rao (2006). The importance of incorporating design variables such as $\pi_i$ is discussed by Montanari & Ranalli (2002) for generalized regression estimation. Nevertheless, our approach seems unique in effectively combining the choice $q_i = \pi_i^{-1} - 1$ and the inclusion of $\pi_i$ in $X_i$.

To show property (ii), we develop formal asymptotic theory and provide an interesting interpretation based on conditional inference. We adopt the asymptotic framework without imposing any super-population (e.g., Hájek, 1964, 1981). A sequence of rejective samples $\{ s^{(t)} \}$ is drawn from a sequence of finite populations $\{ U^{(t)} \}$ of sizes $N^{(t)}$, where $N^{(t)} \to \infty$ as $t \to \infty$. All limiting processes are defined as $t \to \infty$ or, equivalently, $N^{(t)} \to \infty$. For simplicity, the index $t$ is suppressed in the notation. Consider the following regularity conditions:
(C1) $\min(\pi_1, \ldots, \pi_N) > \pi_0$ for a constant $0 < \pi_0 < 1$;
(C2) $N^{-1} \sum_{i=1}^N |Y_i|^{2 + \gamma_1} = O(1)$ for a constant $\gamma_1 > 0$;
(C3) $N^{-1} \sum_{i=1}^N \|X_i\|^{2 + \gamma_2} = O(1)$ for a constant $\gamma_2 > 0$;
(C4) $\liminf_{N \to \infty} \lambda_{\min}(V_\pi) > 0$, where $V_\pi = N^{-1} \sum_{i=1}^N ( \pi_i^{-1} - 1 ) X_i X_i^T$ and $\lambda_{\min}(\cdot)$ denotes the smallest eigenvalue.
These conditions are directly informative on the finite population and the inclusion probabilities. Recall that $\pi_i$ is a component of $X_i$. Then condition (C4) implies
(C4′) $\liminf_{N \to \infty} N^{-1} \sum_{i=1}^N \pi_i ( 1 - \pi_i ) > 0$

and hence $\sum_{i=1}^N \pi_i ( 1 - \pi_i ) \to \infty$, a basic assumption in Hájek's asymptotic theory on rejective sampling. By either (C1) or (C4′), the sampling fraction is asymptotically non-negligible: $\liminf_{N \to \infty} n / N = \liminf_{N \to \infty} N^{-1} \sum_{i=1}^N \pi_i > 0$.

We first give sufficient conditions to apply Hájek's (1964) asymptotic results on the Horvitz–Thompson estimator $\hat\mu_{\mathrm{HT}} = N^{-1} \sum_{i=1}^N \pi_i^{-1} R_i Y_i$.

LEMMA 1. If conditions (C1), (C2), and (C4′) hold, then $\mathrm{var}_r(\hat\mu_{\mathrm{HT}}) = N^{-1} v_0 + o(N^{-1})$ and hence $\hat\mu_{\mathrm{HT}} = \bar Y + O_p(N^{-1/2})$, where $v_0 = N^{-1} \sum_{i=1}^N ( \pi_i^{-1} - 1 ) ( Y_i - \beta_0 \pi_i )^2$ and $\beta_0 = \{ \sum_{i=1}^N \pi_i ( 1 - \pi_i ) \}^{-1} \{ \sum_{i=1}^N ( 1 - \pi_i ) Y_i \}$. If, further, $\liminf_{N \to \infty} v_0 > 0$, then $N^{1/2} v_0^{-1/2} ( \hat\mu_{\mathrm{HT}} - \bar Y ) \to N(0, 1)$, the standard normal distribution, in distribution.

We now state the asymptotic results on the calibrated estimators $\tilde\mu_{\mathrm{reg},\pi}$ and $\tilde\mu_{\mathrm{lik},\pi}$. Asymptotic expansion (11) can be seen from (2) for $\hat\mu_{\mathrm{cal}}$, but is shown under weaker and more elementary conditions than in Deville & Särndal (1992).

THEOREM 1. Assume that $\pi_i$ is a component of $X_i$. If conditions (C1)–(C4) hold, then $\tilde\mu_{\mathrm{reg},\pi}$ and $\tilde\mu_{\mathrm{lik},\pi}$ admit the asymptotic expansion
\[
N^{-1} \sum_{i=1}^N \frac{R_i Y_i}{\pi_i} - N^{-1} \sum_{i=1}^N \Big( \frac{R_i}{\pi_i} - 1 \Big) X_i^T \beta_\pi + o_p(N^{-1/2}), \tag{11}
\]
where $\beta_\pi = V_\pi^{-1} U_\pi$, $U_\pi = N^{-1} \sum_{i=1}^N ( \pi_i^{-1} - 1 ) X_i Y_i$, and $V_\pi$ is defined in (C4). If, further, $\liminf_{N \to \infty} v_\pi > 0$, then $N^{1/2} v_\pi^{-1/2} ( \tilde\mu_{\mathrm{reg},\pi} - \bar Y ) \to N(0, 1)$ and $N^{1/2} v_\pi^{-1/2} ( \tilde\mu_{\mathrm{lik},\pi} - \bar Y ) \to N(0, 1)$ in distribution, where
\[
v_\pi = N^{-1} \sum_{i=1}^N ( \pi_i^{-1} - 1 ) ( Y_i - X_i^T \beta_\pi )^2.
\]
The two estimators attain the minimum asymptotic variance of estimators (3).

We provide an interpretation of the asymptotic variance $v_\pi$ in Theorem 1 based on conditional inference: inference from rejective sampling of size $n$ is equivalent to conditional inference from Poisson sampling given that the sample size equals $n$. Assume the canonical representation (10) with $\sum_{i=1}^N p_i = n$. Consider the following estimator $\tilde\mu_{\mathrm{reg},p}$: include $p_i$ as a component of $X_i$ and apply $\tilde\mu_{\mathrm{reg}}$ with $p(X_i)$ replaced by $p_i$, instead of $\pi_i$. 
Then the asymptotic variance of $N^{1/2} ( \tilde\mu_{\mathrm{reg},p} - \bar Y )$ under rejective sampling is the conditional asymptotic variance given $\sum_{i=1}^N R_i = n$ under Poisson sampling. But $N^{1/2} ( \tilde\mu_{\mathrm{reg},p} - \bar Y )$ is asymptotically uncorrelated with $N^{-1/2} ( \sum_{i=1}^N R_i - n ) = N^{-1/2} \sum_{i=1}^N ( R_i - p_i )$ under Poisson sampling, with the asymptotic covariance
\[
N^{-1} \sum_{i=1}^N ( p_i^{-1} - 1 ) ( Y_i - X_i^T \beta_p ) p_i = 0,
\]
because $N^{-1} \sum_{i=1}^N ( p_i^{-1} - 1 ) ( Y_i - \beta_p^T X_i ) X_i = 0$ and $p_i$ is a component of $X_i$. Therefore, $N^{1/2} ( \tilde\mu_{\mathrm{reg},p} - \bar Y )$ has the same asymptotic variance
\[
v_p = N^{-1} \sum_{i=1}^N ( p_i^{-1} - 1 ) ( Y_i - X_i^T \beta_p )^2,
\]
under Poisson sampling and under rejective sampling. Finally, $N^{1/2} ( \tilde\mu_{\mathrm{reg},\pi} - \bar Y )$ has asymptotic variance $v_\pi$ because $\tilde\mu_{\mathrm{reg},\pi} = \tilde\mu_{\mathrm{reg},p} + o_p(N^{-1/2})$ and $v_\pi = v_p + o(1)$, by the fact that the relative

errors of approximating the $\pi_i$'s by the $p_i$'s are uniformly of second order: $\max_{i = 1, \ldots, N} | \pi_i / p_i - 1 | = O(N^{-1})$ (Hájek, 1981, Theorem 7.3).

The foregoing argument is even useful for understanding Hájek's (1964) variance approximation for $\hat\mu_{\mathrm{HT}}$ in Lemma 1. In fact, Lemma 1 is a special case of Theorem 1 where $X_i = \pi_i$ is the only auxiliary variable and hence $\beta_\pi = \beta_0$ and $v_\pi = v_0$. The estimator $\tilde\mu_{\mathrm{reg},\pi}$ immediately reduces to $\hat\mu_{\mathrm{HT}}$, because $N^{-1} \sum_{i=1}^N ( R_i - \pi_i ) = 0$. Moreover, $\tilde\mu_{\mathrm{lik},\pi}$ reduces to $\hat\mu_{\mathrm{HT}}$, because equation (6) is satisfied by $\lambda = 0$. Such an argument is implicitly exploited in Hájek (1964). For example, Hájek (1964, §3) discussed an important quantity $T$ in his proofs. But $T / N$ is exactly the leading term in expansion (9) with $X_i = p_i$ under Poisson sampling.

Fuller (2009) studied a rejective scheme such that a sample selected by a basic scheme is rejected until $\sum_{i \in s} p_i^{-1} Z_i$ is within a specified distance from $\sum_{i \in U} Z_i$ for an auxiliary vector $Z_i$, where $p_i$ is the $i$th inclusion probability in the basic scheme. Under several regularity conditions, Fuller showed that the asymptotic variance of a particular regression estimator, obtained using the $p_i$'s, under the rejective scheme is the same as that of the regression estimator under the basic scheme. This result is similar to our result that $\tilde\mu_{\mathrm{reg},p}$ has the same asymptotic variance under Poisson sampling and under rejective sampling by conditional inference.

3.4. Simple random sampling
Consider simple random sampling without replacement as a special case of rejective sampling with $\pi_1 = \cdots = \pi_N = n / N$. We illustrate how the calibrated estimators $\tilde\mu_{\mathrm{reg}}$ and $\tilde\mu_{\mathrm{lik}}$ and the asymptotic results are related to existing ones.

First, consider the simple case where $X_i = \pi_i$ or equivalently $X_i = 1$. The estimators $\hat\mu_{\mathrm{reg}}$, $\hat\mu_{\mathrm{lik}}$, $\tilde\mu_{\mathrm{reg}}$, and $\tilde\mu_{\mathrm{lik}}$ are all reduced to the sample average $\bar y = n^{-1} \sum_{i=1}^N R_i Y_i$. The exact variance of $\bar y$ is $n^{-1} ( 1 - n / N ) S_{YY}$, where $S_{YY} = ( N - 1 )^{-1} \sum_{i=1}^N ( Y_i - \bar Y )^2$. 
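For comparison, the asymptotic variance, $N^{-1} v_0$, is
\[
N^{-1} \Big\{ N^{-1} \sum_{i=1}^N \Big( \frac{N}{n} - 1 \Big) ( Y_i - \bar Y )^2 \Big\} = \frac{1}{n} \Big( 1 - \frac{n}{N} \Big) \Big\{ N^{-1} \sum_{i=1}^N ( Y_i - \bar Y )^2 \Big\},
\]
which differs from the exact variance by $O(N^{-2})$. The finite population correction factor, $1 - n / N$, is derived from the weight $\pi_i^{-1} - 1 = N / n - 1$.

Next, consider the general case where $X_i = ( \pi_i, Z_i^T )^T$ with non-constant $Z_i$. By simple algebra, the calibrated estimator $\tilde\mu_{\mathrm{reg}}$, not $\hat\mu_{\mathrm{reg}}$, is reduced to the classical, optimal regression estimator (Cochran, 1977), $\bar y - \tilde\beta_1^T ( \bar z - \bar Z )$, where $\bar z = n^{-1} \sum_{i=1}^N R_i Z_i$, $\bar Z = N^{-1} \sum_{i=1}^N Z_i$, and $\tilde\beta_1 = \{ \sum_{i=1}^N R_i Z_i ( Z_i - \bar z )^T \}^{-1} \{ \sum_{i=1}^N R_i Z_i ( Y_i - \bar y ) \}$. Moreover, $\tilde\mu_{\mathrm{lik}}$, not $\hat\mu_{\mathrm{lik}}$, is reduced to the empirical likelihood estimator of Chen & Qin (1993),
\[
n^{-1} \sum_{i=1}^N \frac{R_i Y_i}{1 + \hat\gamma^T ( Z_i - \bar Z )},
\]
where $\hat\gamma$ is determined by
\[
\bar Z = n^{-1} \sum_{i=1}^N \frac{R_i Z_i}{1 + \hat\gamma^T ( Z_i - \bar Z )}.
\]
Comparison of this equation with (6) shows that $\hat\gamma = ( N / n - 1 ) \tilde\lambda_1$ and $0 = ( N / n - 1 ) ( \tilde\lambda_0 \, n / N + \tilde\lambda_1^T \bar Z )$, where $\tilde\lambda = ( \tilde\lambda_0, \tilde\lambda_1^T )^T$. The asymptotic variance of $\tilde\mu_{\mathrm{reg}}$ or $\tilde\mu_{\mathrm{lik}}$ is
\[
N^{-1} v_\pi = \frac{1}{n} \Big( 1 - \frac{n}{N} \Big) \Big[ N^{-1} \sum_{i=1}^N \big\{ Y_i - \bar Y - \beta_1^T ( Z_i - \bar Z ) \big\}^2 \Big],
\]
where $\beta_1 = \{ \sum_{i=1}^N Z_i ( Z_i - \bar Z )^T \}^{-1} \{ \sum_{i=1}^N Z_i ( Y_i - \bar Y ) \}$. As in the simple case of $X_i = \pi_i$, the finite population correction factor is linked to the weight $\pi_i^{-1} - 1$.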
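The reduction of the calibrated regression estimator to the classical regression estimator under simple random sampling can be verified numerically. The sketch below is an illustration under assumptions: the function name `calibrated_reg`, the synthetic population, and the scalar auxiliary variable `Z` are hypothetical, and the first component of $X_i$ is taken to be the constant inclusion probability $n/N$.

```python
import numpy as np

def calibrated_reg(Y, X, R, p):
    """Calibrated regression estimator: Horvitz-Thompson term minus a
    regression adjustment, with coefficients weighted by (1/p)(1/p - 1).
    Only sampled Y values (R == 1) and the population mean of X are used."""
    N = len(R)
    s = R == 1
    Xs, Ys, ps = X[s], Y[s], p[s]
    wt = (1.0 / ps) * (1.0 / ps - 1.0)
    A = (Xs * wt[:, None]).T @ Xs
    b = (Xs * wt[:, None]).T @ Ys
    beta = np.linalg.solve(A, b)
    ht = np.sum(Ys / ps) / N                          # Horvitz-Thompson mean
    x_diff = (Xs / ps[:, None]).sum(axis=0) / N - X.mean(axis=0)
    return ht - x_diff @ beta

# Synthetic population and a simple random sample (assumptions).
rng = np.random.default_rng(2)
N, n = 500, 100
Z = rng.normal(size=N)
Y = 3.0 + 1.5 * Z + rng.normal(size=N)
p = np.full(N, n / N)
X = np.column_stack([p, Z])                           # X_i = (pi_i, Z_i)
R = np.zeros(N, dtype=int)
R[rng.choice(N, size=n, replace=False)] = 1

mu_reg = calibrated_reg(Y, X, R, p)

# Classical regression estimator: ybar - slope * (zbar - Zbar).
s = R == 1
ybar, zbar, Zbar = Y[s].mean(), Z[s].mean(), Z.mean()
slope = np.sum((Z[s] - zbar) * (Y[s] - ybar)) / np.sum((Z[s] - zbar) ** 2)
mu_classical = ybar - slope * (zbar - Zbar)
```

With constant inclusion probabilities the weighted least-squares fit is an ordinary least-squares fit with intercept, so `mu_reg` and `mu_classical` coincide up to rounding error.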

3.5. High-entropy sampling

A sampling design of fixed size $n$ without replacement corresponds to a probability distribution, $P(s)$, on the set $\mathcal{S}$ of all possible samples $s$ of $n$ distinct units from $U$. The entropy of the sampling design $P(s)$ is $H(P) = -\sum_{s\in\mathcal{S}} P(s)\log\{P(s)\}$. Rejective sampling, denoted by $R(s)$, is the unique sampling design which maximizes the entropy among all fixed-size sampling designs without replacement with fixed first-order inclusion probabilities (Hájek, 1981). Then a sampling design with a high entropy is close to rejective sampling. The closeness in approximating $P(s)$ by $R(s)$ can be measured by the Kullback–Leibler divergence
$$D(P\,\|\,R) = \sum_{s\in\mathcal{S}} P(s)\log\frac{P(s)}{R(s)}.$$
Berger (1998a) studied asymptotic normality of the Horvitz–Thompson estimator $\hat\mu_{\mathrm{HT}}$ for high-entropy sampling with the Kullback–Leibler divergence tending to zero. By building on his work, we generalize Theorem 1 on the calibrated estimators $\tilde\mu_{\mathrm{reg},\pi}$ and $\tilde\mu_{\mathrm{lik},\pi}$ from rejective sampling to high-entropy sampling. Recall that conditions (C1)–(C4) involve only the finite population and the first-order inclusion probabilities.

THEOREM 2. Suppose that $P(s)$ is a sampling design with the first-order inclusion probabilities $(\pi_1,\ldots,\pi_N)$. If there exists rejective sampling $R(s)$ such that $D(P\,\|\,R)\to 0$ as $N\to\infty$, then Theorem 1 holds under $P(s)$.

For Rao–Sampford sampling $P(s)$ with inclusion probabilities $(\pi_1,\ldots,\pi_N)$, let $R(s)$ be rejective sampling with draw probabilities $p_i(R)$ equal to $\pi_i$ $(i=1,\ldots,N)$. Berger (1998a) showed that $D(P\,\|\,R)\to 0$ if $\sum_{i=1}^{N}\pi_i(1-\pi_i)\to\infty$, which is already implied by condition (C4) because $\pi_i$ is a component of $X_i$. Therefore, Theorem 1 is directly valid for Rao–Sampford sampling. For successive sampling $P(s)$ (Hájek, 1981), Berger (1998a) showed that there exists rejective sampling $R(s)$ such that $D(P\,\|\,R)\to 0$ if $n^2/N\to 0$, which violates $\liminf_{N\to\infty} n/N > 0$ under condition (C4).
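For a very small population, the entropy and the Kullback–Leibler divergence defined above can be computed by enumerating all samples. The sketch below is illustrative only: the function names are hypothetical, and the rejective design is represented in its standard conditional-Poisson form, $R(s)\propto\prod_{i\in s} p_i/(1-p_i)$, with draw probabilities $p_i$ chosen arbitrarily.

```python
import itertools
import math


def rejective_design(p, n):
    """Rejective (conditional Poisson) design: R(s) proportional to prod_{i in s} p_i / (1 - p_i)."""
    weights = {}
    for s in itertools.combinations(range(len(p)), n):
        weights[s] = math.prod(p[i] / (1.0 - p[i]) for i in s)
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def entropy(design):
    """H(P) = -sum_s P(s) log P(s)."""
    return -sum(q * math.log(q) for q in design.values() if q > 0)


def kl_divergence(P, R):
    """D(P || R) = sum_s P(s) log{P(s) / R(s)}."""
    return sum(q * math.log(q / R[s]) for s, q in P.items() if q > 0)


def inclusion_probabilities(design, N):
    """First-order inclusion probabilities pi_i = sum of P(s) over samples containing unit i."""
    pi = [0.0] * N
    for s, q in design.items():
        for i in s:
            pi[i] += q
    return pi


p = [0.2, 0.3, 0.5, 0.4, 0.6]            # hypothetical draw probabilities
n = 2
R = rejective_design(p, n)
srs = rejective_design([0.4] * 5, n)     # equal draw probabilities give simple random sampling
pi_R = inclusion_probabilities(R, len(p))
```

Note that the inclusion probabilities `pi_R` of the rejective design differ from the draw probabilities `p`, and that the equal-probability design attains the largest possible entropy, $\log\binom{5}{2}$, among fixed-size designs on this population.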
It may be of interest to study asymptotic theory for rejective sampling when $n/N\to 0$ but $n\to\infty$ or for successive sampling when $n/N\to 0$. Berger (1998b, 2005) discussed the Kullback–Leibler divergence and the validity of Hájek's (1964) variance approximation for other sampling designs including randomized systematic sampling. Further research on this topic is desired.

4. MODEL CALIBRATION

Consider the situation of complete auxiliary information, where $X_i$ is known for each $i\in U$. Then for any function $h(x)$, $\sum_{i=1}^{N} h(X_i)$ is known and hence $h(X_i)$ serves as an auxiliary variable for linear calibration as discussed in § 3. To use such information, the model-calibration approach of Wu & Sitter (2001) consists of two steps. First, a possibly nonlinear regression model is specified:
$$E_\varepsilon(Y_i\mid X_i) = \mu(X_i;\theta),\qquad \mathrm{var}_\varepsilon(Y_i\mid X_i) = \sigma^2\nu(X_i;\theta), \tag{12}$$
where $\theta$ and $\sigma^2$ are unknown parameters and $\mu(\cdot\,;\theta)$ and $\nu(\cdot)$ are known functions. Let $\hat\theta$ be the maximum weighted quasi-likelihood estimator of $\theta$ solving
$$0 = \frac{1}{N}\sum_{i=1}^{N}\frac{R_i}{\pi_i}\,\frac{\partial\mu(X_i;\theta)}{\partial\theta}\,\frac{Y_i-\mu(X_i;\theta)}{\nu(X_i;\theta)}.$$

Second, a calibration estimator, for example, $\hat\mu_{\mathrm{GREG}}$, is applied with $X_i$ replaced by the fitted value $\mu(X_i;\hat\theta)$ or the vector $\{1,\mu(X_i;\hat\theta)\}^{\mathrm T}$ with 1 included. The resulting estimator, $\hat\mu_{\mathrm{WS}}$, attains the minimum anticipated asymptotic variance among all calibration estimators $\hat\mu_{\mathrm{cal}}$ with $X_i$ replaced by any function $h(X_i)$ if model (12) is correctly specified (Wu, 2003).

We point out two possible ways to improve efficiency while allowing that model (12) may be misspecified. First, the estimator $\tilde\mu_{\mathrm{reg},\pi}$ or $\tilde\mu_{\mathrm{lik},\pi}$ can be used with $X_i$ replaced by $\{\pi_i, 1, \mu(X_i;\hat\theta)\}^{\mathrm T}$ or, if $\pi_i$ is constant, $\{1,\mu(X_i;\hat\theta)\}^{\mathrm T}$ in the second step of model calibration. Under additional regularity conditions as in Wu & Sitter (2001), Theorem 1 holds with $X_i$ replaced by $\{\pi_i, 1, \mu(X_i;\theta^*)\}^{\mathrm T}$ in expansion (11) and the asymptotic variance $v_\pi$, where $\theta^*$ is the finite population counterpart of $\hat\theta$ solving
$$0 = \frac{1}{N}\sum_{i=1}^{N}\frac{\partial\mu(X_i;\theta)}{\partial\theta}\,\nu^{-1}(X_i;\theta)\{Y_i-\mu(X_i;\theta)\}.$$
Then regardless of model (12), $\tilde\mu_{\mathrm{reg},\pi}$ or $\tilde\mu_{\mathrm{lik},\pi}$ attains the minimum asymptotic variance under rejective sampling among estimators of the form (3) or
$$\frac{1}{N}\sum_{i=1}^{N}\frac{R_i Y_i}{\pi_i} - \frac{1}{N}\sum_{i=1}^{N}\Big(\frac{R_i}{\pi_i}-1\Big)\{b_1 + b_2\,\mu(X_i;\theta^*)\}, \tag{13}$$
where $(b_1, b_2)$ are arbitrary constants. This class contains, up to asymptotic equivalence, both $\hat\mu_{\mathrm{WS}}$ and Cassel et al.'s (1976) generalized difference estimator using $\mu(X_i;\hat\theta)$ as the calibration variable,
$$\hat\mu_{\mathrm{GDIF}} = \frac{1}{N}\sum_{i=1}^{N}\frac{R_i Y_i}{\pi_i} - \frac{1}{N}\sum_{i=1}^{N}\Big(\frac{R_i}{\pi_i}-1\Big)\mu(X_i;\hat\theta).$$
Therefore, $\tilde\mu_{\mathrm{reg},\pi}$ or $\tilde\mu_{\mathrm{lik},\pi}$ is asymptotically guaranteed to gain efficiency over $\hat\mu_{\mathrm{WS}}$ and $\hat\mu_{\mathrm{GDIF}}$ under rejective sampling when model (12) may be misspecified. The estimator $\hat\mu_{\mathrm{WS}}$ is not necessarily more efficient than $\hat\mu_{\mathrm{GDIF}}$, as shown in a simulation study in § 5. On the other hand, it can be shown that if model (12) is correctly specified, then $\tilde\mu_{\mathrm{reg},\pi}$, $\tilde\mu_{\mathrm{lik},\pi}$, $\hat\mu_{\mathrm{WS}}$, and $\hat\mu_{\mathrm{GDIF}}$ are asymptotically as efficient as each other under suitable regularity conditions as in Wu (2003). The comparison of $\tilde\mu_{\mathrm{reg},\pi}$ and $\tilde\mu_{\mathrm{lik},\pi}$ versus $\hat\mu_{\mathrm{WS}}$ and $\hat\mu_{\mathrm{GDIF}}$ is similar to that of $\tilde\mu_{\mathrm{reg}}$ versus the estimator of Robins et al.
(1994) in the missing-data problem discussed in Tan (2006).

The second method for improving efficiency involves choosing an alternative estimator of $\theta$ for model (12), similar to the method of efficiency maximization in the missing-data literature (Rubin & van der Laan, 2008; Tan, 2008; Cao et al., 2009). First, consider the calibrated estimator $\tilde\mu_{\mathrm{reg},\pi} = \tilde\mu_{\mathrm{reg},\pi}(\theta)$ or $\tilde\mu_{\mathrm{lik},\pi} = \tilde\mu_{\mathrm{lik},\pi}(\theta)$ with $X_i$ replaced by $\{\pi_i, 1, \mu(X_i;\theta)\}^{\mathrm T}$ for fixed $\theta$. By Theorem 1, the asymptotic variance of each estimator is $N^{-1} v_\pi(\theta)$ under rejective sampling, where $v_\pi(\theta) = \min_b v_\pi(\theta, b)$ with $b = (b_0, b_1, b_2)^{\mathrm T}$ and
$$v_\pi(\theta, b) = \frac{1}{N}\sum_{i=1}^{N}(\pi_i^{-1}-1)\{Y_i - b_0\pi_i - b_1 - b_2\,\mu(X_i;\theta)\}^2.$$
Recall that $\beta_\pi = \beta_\pi(\theta)$ in Theorem 1 is a minimizer of $v_\pi(\theta, b)$, whereas $\tilde\beta = \tilde\beta(\theta)$ in $\tilde\mu_{\mathrm{reg},\pi}$ is a minimizer of the empirical counterpart of $v_\pi(\theta, b)$:
$$\tilde v(\theta, b) = \frac{1}{N}\sum_{i=1}^{N}\frac{R_i}{\pi_i}(\pi_i^{-1}-1)\{Y_i - b_0\pi_i - b_1 - b_2\,\mu(X_i;\theta)\}^2.$$
The estimators $\tilde\mu_{\mathrm{reg},\pi}(\hat\theta)$ and $\tilde\mu_{\mathrm{lik},\pi}(\hat\theta)$ using $\hat\theta$ are discussed in the preceding paragraph. Alternatively, consider the estimator $\tilde\mu_{\mathrm{reg},\pi}(\tilde\theta)$ or $\tilde\mu_{\mathrm{lik},\pi}(\tilde\theta)$ using $\tilde\theta$, where $\tilde\theta$ is a minimizer of the empirical counterpart of $v_\pi(\theta)$, that is, $\tilde v(\theta) = \min_b \tilde v(\theta, b)$ or, equivalently, $\{\tilde\theta, \tilde\beta(\tilde\theta)\}$ is jointly a minimizer of $\tilde v(\theta, b)$. For these estimators, it can be shown that under additional regularity conditions, Theorem 1 holds with $X_i$ replaced by $\{\pi_i, 1, \mu(X_i;\theta^\dagger)\}^{\mathrm T}$ in expansion (11) and the asymptotic variance $v_\pi$, where $\theta^\dagger$ is a minimizer of $v_\pi(\theta)$ or, equivalently, $\{\theta^\dagger, \beta_\pi(\theta^\dagger)\}$ is jointly a minimizer of $v_\pi(\theta, b)$. Therefore, $\tilde\mu_{\mathrm{reg},\pi}(\tilde\theta)$ or $\tilde\mu_{\mathrm{lik},\pi}(\tilde\theta)$ attains the minimum asymptotic variance under rejective sampling among estimators of the form
$$\frac{1}{N}\sum_{i=1}^{N}\frac{R_i Y_i}{\pi_i} - \frac{1}{N}\sum_{i=1}^{N}\Big(\frac{R_i}{\pi_i}-1\Big)\{b_1 + b_2\,\mu(X_i;\theta)\},$$
where $(b_1, b_2)$ and $\theta$ are arbitrary constants. This class of estimators is larger than (13) in allowing $\theta$ different from $\theta^*$. If model (12) is correctly specified, then $\hat\theta$ seems natural, but both $\hat\theta$ and $\tilde\theta$ converge in probability to the true $\theta$ so that $\tilde\mu_{\mathrm{reg},\pi}(\hat\theta)$ and $\tilde\mu_{\mathrm{lik},\pi}(\hat\theta)$ are asymptotically as efficient as $\tilde\mu_{\mathrm{reg},\pi}(\tilde\theta)$ and $\tilde\mu_{\mathrm{lik},\pi}(\tilde\theta)$. If model (12) is misspecified, then the latter estimators using $\tilde\theta$ are asymptotically more efficient under rejective sampling than the former estimators using $\hat\theta$.

The problem of minimizing $\tilde v(\theta, b)$ is computationally nontrivial. The function $\mu(X_i;\theta)$ is typically specified as $\Psi(\theta_0 + Z_i^{\mathrm T}\theta_1)$, where $\Psi(\cdot)$ is an inverse link function, $X_i = (1, Z_i^{\mathrm T})^{\mathrm T}$, and $\theta = (\theta_0, \theta_1^{\mathrm T})^{\mathrm T}$. For $\theta_1$ close to 0, $\mu(X_i;\theta)$ is approximately $\Psi(\theta_0)$ and hence linearly dependent with the constant term. Moreover, if $\mu(X_i;\theta) = \exp(\theta_0 + Z_i^{\mathrm T}\theta_1)$ as in the log-normal model in § 5, then $\exp(\theta_0)$ cannot be separately identified from $b_2$. For simplicity, consider the estimator $\tilde\theta$ minimizing the empirical variance under rejective sampling of the generalized difference estimator $\hat\mu_{\mathrm{GDIF}}(\theta)$, with $\hat\theta$ replaced by fixed $\theta$. Equivalently, $\tilde\theta$ together with some $b_0$ minimizes
$$\tilde v\{\theta, (b_0, 0, 1)\} = \frac{1}{N}\sum_{i=1}^{N}\frac{R_i}{\pi_i}(\pi_i^{-1}-1)\{Y_i - b_0\pi_i - \mu(X_i;\theta)\}^2.$$
By similar reasoning as above, $\hat\mu_{\mathrm{GDIF}}(\tilde\theta)$ attains the minimum asymptotic variance under rejective sampling among estimators $\hat\mu_{\mathrm{GDIF}}(\theta)$ for arbitrary $\theta$.
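The simplified efficiency-maximization step can be sketched numerically as follows. Everything here is an assumption made for illustration rather than the procedure used in the paper: the data are a small hypothetical sample, the working model is a one-parameter toy $\mu(z;\theta)=\exp(\theta z)$, the criterion is taken to be $N^{-1}\sum_i R_i\pi_i^{-1}(\pi_i^{-1}-1)\{Y_i-b_0\pi_i-\mu(Z_i;\theta)\}^2$, and a plain grid search stands in for a proper optimizer. For fixed $\theta$ the criterion is quadratic in $b_0$, so $b_0$ is profiled out in closed form by weighted least squares.

```python
import math

# Hypothetical toy data: 12 units, of which the even-indexed ones are sampled.
z = [0.2 * k for k in range(1, 13)]
pi = [0.25 + 0.04 * k for k in range(12)]        # assumed inclusion probabilities
r = [k % 2 == 0 for k in range(12)]              # selection indicators R_i
y = [math.exp(0.5 * zk) + 0.1 * math.sin(k) for k, zk in enumerate(z)]


def mu(zk, theta):
    """Toy one-parameter working model mu(z; theta) = exp(theta * z)."""
    return math.exp(theta * zk)


def v_tilde(theta, b0):
    """Assumed empirical criterion evaluated at b = (b0, 0, 1)."""
    total = 0.0
    for yk, zk, pk, rk in zip(y, z, pi, r):
        if rk:
            weight = (1.0 / pk) * (1.0 / pk - 1.0)
            total += weight * (yk - b0 * pk - mu(zk, theta)) ** 2
    return total / len(y)


def best_b0(theta):
    """Closed-form weighted least-squares minimizer of v_tilde over b0 for fixed theta."""
    num = den = 0.0
    for yk, zk, pk, rk in zip(y, z, pi, r):
        if rk:
            weight = (1.0 / pk) * (1.0 / pk - 1.0)
            num += weight * pk * (yk - mu(zk, theta))
            den += weight * pk * pk
    return num / den


grid = [0.01 * j for j in range(0, 101)]         # candidate theta values in [0, 1]
theta_tilde = min(grid, key=lambda t: v_tilde(t, best_b0(t)))
b0_tilde = best_b0(theta_tilde)
```

Profiling out $b_0$ leaves a one-dimensional search over $\theta$ in this toy setting; with a vector-valued $\theta$ a numerical optimizer would be needed, and the identifiability issues noted above would have to be handled.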
For the calibration estimators $\hat\mu_{\mathrm{WS}}$, $\tilde\mu_{\mathrm{reg},\pi}$, and $\tilde\mu_{\mathrm{lik},\pi}$, using the fitted value $\mu(X_i;\tilde\theta)$ may still be more efficient than using $\mu(X_i;\hat\theta)$ as an auxiliary variable.

5. SIMULATION STUDY

We conduct a simulation study to compare various methods in the presence of a misspecified super-population model. Two finite populations of size $N = 1000$ are generated and then fixed to study, respectively, linear calibration and model calibration. For each finite population, 50,000 Monte Carlo samples of size $n = 100$ are drawn by rejective sampling using the R package sampling (Tillé, 2011). From each sample, various estimators of $\bar Y$ are then computed.

For linear calibration, a finite population is generated as an independent and identically distributed sample from the model:
$$W_i \sim \chi^2_2,\qquad Z_i = W_i + \delta_i, \tag{14}$$
$$Y_i = 1 + Z_i + \{(Z_i-3)_+\}^2 + \varepsilon_i, \tag{15}$$
where $\delta_i\sim N(0,1)$, $\varepsilon_i\sim N(0,1)$, and $a_+ = a$ if $a > 0$ or 0 if $a\le 0$. The joint distribution of $(W_i, Z_i)$ is taken from Kim (2009). The inclusion probabilities $\pi_i$ are set proportional to $W_i$. The 0%, 25%, 50%, 75%, and 100% quantiles of $\pi_i$ are , 0.029, 0.069, 0 4, . The sampling weights $\pi_i^{-1}$ are highly variable. The distribution of $Y_i$ is specified to reflect a nonlinear, but monotone, relationship between $Y_i$ and $Z_i$. We compare the four estimators, $\hat\mu_{\mathrm{HT}}$, $\hat\mu_{\mathrm{GREG}}$ with $q_i = 1$, $\tilde\mu_{\mathrm{reg},\pi}$, and $\tilde\mu_{\mathrm{lik},\pi}$, in two situations, (I) and (II). For situation (I), $\hat\mu_{\mathrm{GREG}}$, $\tilde\mu_{\mathrm{reg},\pi}$, and $\tilde\mu_{\mathrm{lik},\pi}$ are applied with the same auxiliary vector $X_i = (\pi_i, 1, Z_i)^{\mathrm T}$. For situation (II), the three estimators are applied with $X_i = \{\pi_i, 1, Z_i, (Z_i-3)_+\}^{\mathrm T}$.

Table 1. Comparison of estimators for linear and model calibration
[Rows: $\hat\mu_{\mathrm{HT}}$, $\hat\mu_{\mathrm{GDIF}}$, $\hat\mu_{\mathrm{GREG}}$, $\tilde\mu_{\mathrm{reg},\pi}$, $\tilde\mu_{\mathrm{lik},\pi}$. Columns: Bias and RMSE under Situation I and Situation II, for linear calibration (left panel) and model calibration (right panel).]
RMSE, root mean squared error; all figures are multiplied by a common factor.

The left panel in Table 1 shows the biases and root mean squared errors of the four estimators of $\bar Y$. A number of observations can be drawn. First, $\hat\mu_{\mathrm{HT}}$ has a very large variance, due to high variability of the sampling weights. Second, $\tilde\mu_{\mathrm{reg},\pi}$ or $\tilde\mu_{\mathrm{lik},\pi}$ has not only a smaller variance but also a smaller absolute bias than $\hat\mu_{\mathrm{GREG}}$. The ratio of mean squared errors of $\hat\mu_{\mathrm{GREG}}$ versus $\tilde\mu_{\mathrm{reg},\pi}$ is $(409/248)^2 = 2.7$ in situation (I) or $(262/209)^2 = 1.6$ in situation (II). This ratio is greater in situation (I) than in (II), perhaps because a linear model of $Y_i$ given $X_i$ is a more severe departure from the truth in situation (I) than in (II). Third, $\tilde\mu_{\mathrm{reg},\pi}$ and $\tilde\mu_{\mathrm{lik},\pi}$ have similar variances to each other, but $\tilde\mu_{\mathrm{lik},\pi}$ has a much smaller absolute bias than $\tilde\mu_{\mathrm{reg},\pi}$.

In the Supplementary Material, we present additional results for linear calibration in two cases where the joint distribution of $(Z_i, Y_i)$ remains as before but the variability of the sampling weights $\pi_i^{-1}$ decreases. The ratios of mean squared errors of $\hat\mu_{\mathrm{GREG}}$ versus $\tilde\mu_{\mathrm{reg},\pi}$ and $\tilde\mu_{\mathrm{lik},\pi}$ remain noticeably above 1, but become smaller as the sampling weights are more homogeneous. Moreover, we report results on variance estimation and confidence interval coverage, using the Horvitz–Thompson variance estimator depending on second-order inclusion probabilities and using the empirical counterpart of asymptotic variance in Theorem 1.
Both methods perform reasonably well when sampling weights are moderately or less variable, but lead to under-coverage of confidence intervals in the case of highly variable sampling weights.

For model calibration, a finite population is generated as an independent and identically distributed sample from (14) for $(W_i, Z_i)$ and (15) replaced by log Y i = Z i 0 2 (Z i 3) ε i, where ε i N(0, ). The distribution of $Y_i$ given $Z_i$ is log-normal, similarly as in Wu & Sitter (2001). Consider the log-linear model (12) for $Y_i$ given $Z_i$: $\mu(Z_i;\theta) = \exp(\theta_0 + Z_i^{\mathrm T}\theta_1)$ and $\nu(Z_i;\theta) = \mu^2(Z_i;\theta)$, where $\theta = (\theta_0, \theta_1^{\mathrm T})^{\mathrm T}$. We compare the five estimators, $\hat\mu_{\mathrm{HT}}$, $\hat\mu_{\mathrm{GDIF}}$, $\hat\mu_{\mathrm{GREG}}$ with $q_i = 1$, $\tilde\mu_{\mathrm{reg},\pi}$, and $\tilde\mu_{\mathrm{lik},\pi}$, in two situations, (I) and (II). For situation (I), $\hat\mu_{\mathrm{GREG}}$, $\tilde\mu_{\mathrm{reg},\pi}$, and $\tilde\mu_{\mathrm{lik},\pi}$ are applied with $X_i = \{\pi_i, 1, \mu(Z_i;\hat\theta)\}^{\mathrm T}$, where $\hat\theta$ is the maximum weighted quasi-likelihood estimator. For situation (II), the same estimators as in (I), including $\hat\mu_{\mathrm{GDIF}}$, are applied, except that $\mu(Z_i;\hat\theta)$ is replaced by $\mu(Z_i;\tilde\theta)$ and $\tilde\theta$ is obtained by efficiency maximization in § 4.

The right panel of Table 1 shows the biases and root mean squared errors of the five estimators of $\bar Y$. First, the relative performances of $\hat\mu_{\mathrm{GREG}}$, $\tilde\mu_{\mathrm{reg},\pi}$, and $\tilde\mu_{\mathrm{lik},\pi}$ are similar to those for linear calibration. Second, the generalized difference estimator $\hat\mu_{\mathrm{GDIF}}$ has a mean squared error smaller than that of $\hat\mu_{\mathrm{GREG}}$ and similar to that of $\tilde\mu_{\mathrm{reg},\pi}$ or $\tilde\mu_{\mathrm{lik},\pi}$. This comparison confirms that $\hat\mu_{\mathrm{GREG}}$ may not be as efficient as $\hat\mu_{\mathrm{GDIF}}$, whereas $\tilde\mu_{\mathrm{reg},\pi}$ and $\tilde\mu_{\mathrm{lik},\pi}$ are asymptotically no less efficient than $\hat\mu_{\mathrm{GDIF}}$. The fact that $\tilde\mu_{\mathrm{reg},\pi}$ and $\tilde\mu_{\mathrm{lik},\pi}$ are roughly as efficient as $\hat\mu_{\mathrm{GDIF}}$ indicates that, for this example, there is little efficiency to be gained by linearly transforming the fitted values. Third, the mean squared errors of $\hat\mu_{\mathrm{GDIF}}$, $\tilde\mu_{\mathrm{reg},\pi}$, and $\tilde\mu_{\mathrm{lik},\pi}$ in situation (II) are similar to those in situation (I). For these estimators, little improvement is possible here from using different estimators of $\theta$ to construct fitted values. Nevertheless, $\hat\mu_{\mathrm{GREG}}$ has a noticeably smaller mean squared error in situation (II) than in (I).

6. CONCLUSION

We develop efficient estimators for both linear and model calibration under rejective and high-entropy sampling. A general approach is to extend theory and methods for missing-data problems with independent and identically distributed data to the survey setting. Previous discussions on connections between survey sampling and missing-data problems have shown how calibration estimation in survey sampling is useful for handling missing-data problems (e.g., Kang & Schafer, 2007; Lumley et al., 2011). Our work shows, in the opposite direction, how efficient estimators can be extended from missing-data problems to survey sampling.

Our development mainly deals with point estimation. For variance estimation, it is straightforward to employ the Horvitz–Thompson variance estimator and, without using second-order inclusion probabilities, the empirical counterpart of asymptotic variance in Theorem 1. Our simulation study suggests that the performances of these methods may deteriorate when sampling weights are highly variable. Further research is warranted to address this issue.

ACKNOWLEDGEMENT

The author thanks Jae-Kwang Kim and two referees for valuable comments leading to various improvements. This research was supported by the U.S. National Science Foundation.

SUPPLEMENTARY MATERIAL

Supplementary material available at Biometrika online includes proofs of Lemma 1 and Theorems 1–2 and additional simulation results mentioned in § 5.

REFERENCES

ANDERSSON, P. G. & THORBURN, D. (2006). An optimal calibration distance leading to the optimal regression estimator. Surv. Methodol. 31.
BERGER, Y. G. (1998a). Rate of convergence to normal distribution for the Horvitz–Thompson estimator. J. Statist. Plann. Inference 67.
BERGER, Y. G. (1998b). Rate of convergence for asymptotic variance of the Horvitz–Thompson estimator. J. Statist. Plann. Inference 74.
BERGER, Y. G. (2005). Variance estimation with Chao's sampling scheme. J. Statist. Plann. Inference 127.
BREWER, K. R. W. (1979). A class of robust sampling designs for large-scale surveys. J. Am. Statist. Assoc. 74.
CAO, W., TSIATIS, A. A. & DAVIDIAN, M. (2009). Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika 96.
CASSEL, C. M., SÄRNDAL, C. E. & WRETMAN, J. H. (1976). Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika 63.
CHEN, J. & QIN, J. (1993). Empirical likelihood estimation for finite populations and the effective usage of auxiliary information. Biometrika 80.
CHEN, J. & SITTER, R. R. (1999). A pseudo empirical likelihood approach to the effective use of auxiliary information in complex surveys. Stat. Sinica 9.
CHEN, S. X., DEMPSTER, A. P. & LIU, J. S. (1994). Weighted finite population sampling to maximize entropy. Biometrika 81.
COCHRAN, W. G. (1977). Sampling Techniques. New York: Wiley, 3rd ed.
DEVILLE, J. C. & SÄRNDAL, C. E. (1992). Calibration estimators in survey sampling. J. Am. Statist. Assoc. 87.
DEVILLE, J. C., SÄRNDAL, C. E. & SAUTORY, O. (1993). Generalized raking procedures in survey sampling. J. Am. Statist. Assoc. 88.
FULLER, W. A. (2002). Regression estimation for survey samples. Surv. Methodol. 28.
FULLER, W. A. (2009). Some design properties of a rejective sampling procedure. Biometrika 96.
FULLER, W. A. & ISAKI, C. T. (1981). Survey design under superpopulation models. In Current Topics in Survey Sampling, D. Krewski, J. N. K. Rao & R. Platek, eds. New York: Academic Press.
HÁJEK, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann. Math. Stat. 35.
HÁJEK, J. (1981). Sampling from a Finite Population. New York: Marcel Dekker.
HAMMERSLEY, J. M. & HANDSCOMB, D. C. (1964). Monte Carlo Methods. London: Methuen.
HARTLEY, H. O. & RAO, J. N. K. (1968). A new estimation theory for sample surveys. Biometrika 55.
KANG, J. D. Y. & SCHAFER, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data (with discussion). Statist. Sci. 22.
KIM, J.-K. (2009). Calibration estimation using empirical likelihood in survey sampling. Stat. Sinica 19.
KIM, J.-K. & PARK, H. (2006). Imputation using response probability. Can. J. Stat. 34.
KOTT, P. S. (2009). Calibration weighting: Combining probability samples and linear prediction models. In Handbook of Statistics - Sample Surveys: Inference and Analysis, D. Pfeffermann & C. R. Rao, eds. Netherlands: North-Holland.
LUMLEY, T., SHAW, P. A. & DAI, J. Y. (2011). Connections between survey calibration estimators and semiparametric models for incomplete data. Int. Stat. Rev. 79.
MONTANARI, G. E. (1987). Post-sampling efficient QR-prediction in large-scale surveys. Int. Stat. Rev. 55.
MONTANARI, G. E. (1998). On regression estimation of finite population mean. Surv. Methodol. 24.
MONTANARI, G. E. & RANALLI, M. G. (2002). Asymptotically efficient generalised regression estimators. J. Offic. Statist. 18.
OWEN, A. B. (2001). Empirical Likelihood. New York: Chapman & Hall.
RAO, J. N. K. (1994). Estimating totals and distribution functions using auxiliary information at the estimation stage. J. Offic. Statist. 10.
RAO, J. N. K. (1997). Developments in survey sampling theory: An appraisal. Can. J. Stat. 25.
ROBINS, J. M., ROTNITZKY, A. & ZHAO, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. J. Am. Statist. Assoc. 89.
ROSENBAUM, P. R. & RUBIN, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70.
RUBIN, D. B. (1976). Inference and missing data. Biometrika 63.
RUBIN, D. B. & VAN DER LAAN, M. J. (2008). Empirical efficiency maximization: Improved locally efficient covariate adjustment in randomized experiments and survival analysis. Int. J. Biostat. 4, Article 5.
SÄRNDAL, C. E. (1996). Efficient estimators with simple variance in unequal probability sampling. J. Am. Statist. Assoc. 91.
SÄRNDAL, C. E., SWENSSON, B. & WRETMAN, J. H. (1992). Model Assisted Survey Sampling. New York: Springer.
SÄRNDAL, C. E. & WRIGHT, R. L. (1984). Cosmetic form of estimators in survey sampling. Scand. J. Stat. 11.
TAN, Z. (2006). A distributional approach for causal inference using propensity scores. J. Am. Statist. Assoc. 101.
TAN, Z. (2007). Comment: Understanding OR, PS, and DR. Statist. Sci. 22.
TAN, Z. (2008). Comment: Improved local efficiency and double robustness. Int. J. Biostat. 4, Article 10.
TAN, Z. (2010a). Bounded, efficient, and doubly robust estimation with inverse weighting. Biometrika 97.
TAN, Z. (2010b). Nonparametric likelihood and doubly robust estimating equations for marginal and nested structural models. Can. J. Stat. 38.
TAN, Z. (2011). Efficient restricted estimators for conditional mean models with missing data. Biometrika 98.
TILLÉ, Y. (2006). Sampling Algorithms. New York: Springer.
TILLÉ, Y. (2011). sampling: Survey Sampling. R package version 2.4.
WRIGHT, R. L. (1983). Finite population sampling with multivariate auxiliary information. J. Am. Statist. Assoc. 78.
WU, C. (2003). Optimal calibration estimators in survey sampling. Biometrika 90.
WU, C. & RAO, J. N. K. (2006). Pseudo-empirical likelihood ratio confidence intervals for complex surveys. Can. J. Stat. 34.
WU, C. & SITTER, R. R. (2001). A model-calibration approach to using complete auxiliary information from survey data. J. Am. Statist. Assoc. 96.

[Received April 2012. Revised September 2012]
