Generalized Pseudo Empirical Likelihood Inferences for Complex Surveys


The Canadian Journal of Statistics, Vol. ??, No. ?, ????, Pages ???-???
La revue canadienne de statistique

Zhiqiang TAN¹ and Changbao WU²

¹ Department of Statistics, Rutgers University, Piscataway, NJ 08854, USA
² Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada

Key words and phrases: Auxiliary information; calibration techniques; confidence intervals; Kullback-Leibler distance; survey design.

MSC 2000: Primary 62D05; secondary 62G09.

Abstract: We consider generalized pseudo empirical likelihood inferences for complex surveys. The method is based on a weighted version of the Kullback-Leibler (KL) distance for calibration estimation (Deville and Särndal, 1992) and includes the pseudo empirical likelihood estimator (Chen and Sitter, 1999; Wu and Rao, 2006) and the calibrated likelihood estimator (Tan, 2013) as special cases. We show that a suitably formulated empirical likelihood ratio-type statistic asymptotically follows a scaled chi-square distribution, which extends the main result in Wu and Rao (2006) and makes likelihood ratio-type confidence intervals available for calibration estimation using arbitrary choices of the weighting factor in the weighted KL distance. We further show that the scaling factor for the scaled chi-square distribution can be circumvented either through a particular choice of the weighting factor for the KL distance or by using a bootstrap method. The proposed bootstrap procedure is justified for single-stage sampling designs with negligible sampling fractions. Finite sample performances of confidence intervals constructed using our proposed methods are investigated and compared with existing ones through two simulation studies.

1. INTRODUCTION

Calibration is a popular inference tool for the analysis of complex surveys. It originates from the idea of benchmarking, in which known population totals of certain auxiliary variables are used to form benchmark constraints. The method has gained significant popularity since the work of Deville and Särndal (1992). Calibration estimators are closely related to the generalized regression estimators. Some of the important developments for regression estimators, such as variance estimation techniques, can also be used for calibration estimators. Fuller (2002) provides an excellent review of regression estimation and Särndal (2007) contains a thorough review of calibration techniques.

The conventional route for inferences with calibration methods is to first compute the point estimator with an estimated variance, and then use the standard Z-statistic based on normal approximations to construct confidence intervals or conduct statistical tests. Confidence intervals under this approach are forced to be symmetric around the point estimator and are not necessarily confined within the parameter space.

There have been significant developments on empirical likelihood (EL) methods in non-survey statistics (Owen, 2001). Progress has also been made on using the empirical likelihood method for complex surveys. See, for instance, Chen and Sitter (1999), Wu and Rao (2006), Chen and Kim (2014), among others. The most attractive feature of the empirical likelihood approach is the data-driven, range-respecting confidence intervals based on the empirical likelihood ratio statistic. This property is not enjoyed by the conventional calibration method with Wald-type confidence intervals.

Suppose that $U = \{1, 2, \ldots, N\}$ is the set of $N$ units for the finite population, with $(y_i, x_i)$ being the values of the study variable $y$ and the vector of auxiliary variables $x$ attached to unit $i$.
Let $t_y = \sum_{i=1}^{N} y_i$ be the parameter of interest, and $t_x = \sum_{i=1}^{N} x_i$ be the known population totals. Let $\mu_y = N^{-1} t_y$ and $\mu_x = N^{-1} t_x$ be the corresponding population means. Let $S$ be a set of $n$ sampled units and

$\{(y_i, x_i) : i \in S\}$ be the survey data. Let $\pi_i = P(i \in S)$ be the first-order inclusion probabilities and $d_i = \pi_i^{-1}$ be the basic design weights. The calibration estimator of $t_y$ is computed as $\hat t_{CAL} = \sum_{i \in S} \hat w_i y_i$, where the $\hat w_i$ are the calibrated weights obtained by minimizing a distance measure $G(d, w) = \sum_{i \in S} G_i(d_i, w_i)$ between the $w_i$ and the $d_i$ subject to the set of calibration equations (also called benchmark constraints)

$$\sum_{i \in S} w_i x_i = t_x. \tag{1}$$

The most commonly used distance measure is the chi-squared distance specified by $G_i(d_i, w_i) = (w_i - d_i)^2/(q_i d_i)$, where the $q_i$ are pre-specified constants. It is well known that, under the chi-squared distance, the resulting calibration estimator $\hat t_{CAL}$ is algebraically identical to the generalized regression estimator (Särndal et al., 1992).

Under the conventional calibration approach (Deville and Särndal, 1992), confidence intervals on $t_y$ rely on the asymptotic normality of $\hat t_{CAL}$ and are constructed through the standardized Z-statistic $(\hat t_{CAL} - t_y)/\{v(\hat t_{CAL})\}^{1/2}$, where $v(\hat t_{CAL})$ is a consistent estimator of the variance of $\hat t_{CAL}$. The choice of $q_i$ in the distance measure does not affect the consistency of the calibration estimator, but it has an impact on the variance of the estimator. The choice $q_i = \pi_i^{-1} - 1$ can lead to more efficient calibration estimators (Tan 2010, 2013). A closely related research topic is the design-optimal regression estimator; see, for instance, Fuller and Isaki (1981), Montanari (1987), Rao (1994), Berger et al. (2003), Chen and Kim (2014), among others. Fuller (2009) considered model-optimal design-consistent estimators.

For complex survey data, Chen and Sitter (1999) proposed to use a pseudo empirical (log) likelihood

$$l(p) = \sum_{i \in S} d_i \log(p_i), \tag{2}$$

where $p = (p_1, \ldots, p_n)^T$ is the discrete probability measure over the $n$ sampled units. The maximum pseudo-EL estimator of the population mean $\mu_y$ is computed as $\hat\mu_{PEL} = \sum_{i \in S} \hat p_i y_i$, where the $\hat p_i$ maximize the pseudo-EL function $l(p)$ subject to $\sum_{i \in S} p_i = 1$ and the set of constraints

$$\sum_{i \in S} p_i x_i = \mu_x. \tag{3}$$

Chen and Sitter (1999) showed that the estimator $\hat\mu_{PEL}$ is asymptotically equivalent to the calibration estimator $N^{-1}\hat t_{CAL}$ with the choice of $q_i = 1$. Wu and Rao (2006) showed that the pseudo-EL ratio statistic on $\mu_y$, adjusted by a scaling factor involving the design effect, has an asymptotic $\chi^2$ distribution with one degree of freedom. Consequently, confidence intervals on $\mu_y$ based on the pseudo-EL ratio statistic can be constructed.

There are two major gaps between the conventional calibration method and the pseudo empirical likelihood method. First, the pseudo-EL method is designed for the population mean $\mu_y$ and relies on the constraints (3), which use the known population means $\mu_x$. The method cannot be directly applied to scenarios where the population size $N$ is unknown and only the population totals $t_x$ are available. Second, the pseudo-EL approach cannot entertain a general choice of the weight factor $q_i$, which is an important tool for achieving design-optimal estimation as mentioned earlier.

Recently, Tan (2010, 2013) developed a calibrated likelihood method by exploiting a connection between survey calibration and missing data problems. The method can be understood in two steps. Let $R_i$ be the sample inclusion indicator, i.e., $R_i = 1$ if $i \in S$ and $R_i = 0$ otherwise. The first step is to treat $\{(x_i, y_i, R_i) : i \in U\}$ as an iid sample from a joint distribution of $(x, y, R)$ and derive a bona fide empirical likelihood estimator (Qin and Lawless, 1994) for $E(y) = E\{\pi^{-1}(x) R y\}$, subject to the moment constraints $0 = E\{\pi^{-1}(x) R x - x\}$, based on the observed data $\{(x_i, R_i y_i, R_i) : i \in U\}$, where $\pi(x) = P(R = 1 \mid x, y)$ is assumed to be free of $y$.
This empirical likelihood estimator of $E(y)$ takes the usual form $N^{-1}\sum_{i \in S} \hat w_i y_i$ for some weights $\hat w_i$, but the calibration equations are generally violated, i.e., $\sum_{i \in S} \hat w_i x_i \neq t_x$.

For the second step, Tan (2010) proposed a modification such that the calibration equations are satisfied but without affecting the first-order asymptotic variance. As observed in Tan (2013), the calibrated likelihood estimator turns out to be algebraically equivalent to $\hat t_{CAL}$ with the distance measure $G(d, w)$ specified as a weighted Kullback-Leibler distance and the weight factor $q_i$ set to $\pi_i^{-1} - 1$.

The calibrated likelihood estimator (Tan 2010) and, similarly, the calibrated regression estimator (Tan 2006) are shown in Tan (2013) to be asymptotically optimal under rejective or high-entropy sampling designs when $\pi_i$ is included as a calibration variable in $x_i$. These two estimators are simpler than the usual optimal regression estimator involving second-order inclusion probabilities (Fuller and Isaki 1981; Montanari 1987; Rao 1994). See also Chen and Kim (2014) for related results but under negligible sampling fractions.

In this paper, we consider generalized pseudo empirical likelihood inferences for complex surveys. The method is based on a weighted version of the Kullback-Leibler (KL) distance for calibration estimation (Deville and Särndal, 1992) and includes the pseudo empirical likelihood estimator (Chen and Sitter, 1999; Wu and Rao, 2006) and the calibrated likelihood estimator (Tan, 2013) as special cases. We show that a suitably formulated empirical likelihood ratio-type statistic asymptotically follows a scaled chi-square distribution, which extends the main result in Wu and Rao (2006) and makes likelihood ratio-type confidence intervals available for calibration estimation using arbitrary choices of the weighting factor in the weighted KL distance. We further show that the scaling factor for the scaled chi-square distribution can be circumvented either through a particular choice of the weighting factor for the KL distance or by using a bootstrap method. The proposed bootstrap procedure is justified for single-stage sampling designs with negligible sampling fractions.

The rest of the paper is organized as follows. Main results on the generalized pseudo empirical likelihood method are presented in Section 2. The proposed bootstrap procedure is described in Section 3. In Section 4, we report results from two simulation studies, one based on a synthetic finite population and the other using

a Statistics Canada survey data set. Some concluding remarks and discussions are given in Section 5. Proofs of the major results and justification of the bootstrap method are given in the Appendix.

2. GENERALIZED PSEUDO EMPIRICAL LIKELIHOOD METHOD

2.1. Weighted Kullback-Leibler Distance Based Calibration Estimators

The Kullback-Leibler distance is a measure of divergence between two distributions. It was first described by Kullback and Leibler (1951) as a loss function in the context of information theory, and then further discussed by Kullback (1959). For two discrete probability measures $f = (f_1, \ldots, f_n)^T$ and $g = (g_1, \ldots, g_n)^T$, there are two types of Kullback-Leibler distance: $KL(f, g) = \sum_{i=1}^{n} f_i \log(f_i/g_i)$ and $KL(g, f) = \sum_{i=1}^{n} g_i \log(g_i/f_i)$. When discussing confidence intervals for iid data, taking $f$ to be the empirical measure with $f_i = 1/n$ and $g$ to be another probability measure for the data, DiCiccio and Romano (1990) called $KL(f, g)$ the forward Kullback-Leibler distance and $KL(g, f)$ the backward Kullback-Leibler distance.

Unfortunately, neither $KL(f, g)$ nor $KL(g, f)$ can be used directly as a distance measure for calibration estimation, since $G_i(f_i, g_i) = f_i \log(f_i/g_i)$ does not guarantee that $G_i(f_i, g_i) \geq 0$ for all $i$. A simple modification is to use $G_i(f_i, g_i) = f_i \log(f_i/g_i) - f_i + g_i$. In this case $G_i(f_i, g_i) = -f_i\{\log(g_i/f_i) - g_i/f_i + 1\} \geq 0$ for all $i$, since $\log(x) \leq x - 1$ for any $x > 0$.

For the two sets of weights $d = (d_1, \ldots, d_n)^T$ and $w = (w_1, \ldots, w_n)^T$, we consider the modified forward Kullback-Leibler distance between $w$ and $d$, weighted by $(q_1, \ldots, q_n)$:

$$EL(d, w) = -\sum_{i \in S} q_i^{-1}\left\{ d_i \log\left(\frac{w_i}{d_i}\right) - w_i + d_i \right\}.$$

This is also called the minimum entropy distance by Deville and Särndal (1992). The notation $EL(d, w)$ indicates its connection to empirical likelihood.
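As a numerical illustration of the distance just defined, the following minimal sketch evaluates the $q$-weighted modified forward Kullback-Leibler distance, written equivalently as $\sum_{i \in S} q_i^{-1}\{-d_i\log(w_i/d_i) + w_i - d_i\}$. The function name `el_distance` and the toy weights are illustrative assumptions, not part of the paper.

```python
import numpy as np

def el_distance(d, w, q):
    """q-weighted modified forward KL distance between weights w and d.

    Each term (-d_i*log(w_i/d_i) + w_i - d_i) / q_i is nonnegative for
    positive d_i, w_i, q_i because log(x) <= x - 1, and it vanishes
    exactly when w_i == d_i.
    """
    d, w, q = (np.asarray(a, dtype=float) for a in (d, w, q))
    return float(np.sum((-d * np.log(w / d) + w - d) / q))

# Hypothetical toy weights, for illustration only.
d = [2.0, 4.0]
q = [1.0, 1.0]
same = el_distance(d, d, q)            # distance from d to itself
moved = el_distance(d, [3.0, 3.0], q)  # distance after perturbing the weights
```

The distance is zero only when the calibrated weights coincide with the design weights, which is what makes it usable as a calibration objective.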
For independent but not identically distributed data where $d_i = n^{-1}$ and $(w_1, \ldots, w_n)$ are replaced by the probability measure $(p_1, \ldots, p_n)$ over the sample, $EL(d, w)$ was discussed in Wu (2004) as the weighted empirical log-likelihood function, with the q-weights specified

through the variance function. If we let $q_i = 1$ and impose the constraint $\sum_{i=1}^{n} w_i = N$, then $EL(d, w) = -\sum_{i=1}^{n} d_i \log(p_i) + C$, where $p_i = w_i/N$ and $C$ is a constant not involving $p_i$. In this case, minimizing $EL(d, w)$ subject to a set of constraints on $w_i$ is equivalent to maximizing the pseudo-EL function $l(p) = \sum_{i=1}^{n} d_i \log(p_i)$ subject to the same set of constraints on $p_i$. In other words, the pseudo-EL approach of Chen and Sitter (1999) and Wu and Rao (2006) is a special case of inferences based on the modified forward Kullback-Leibler distance $EL(d, w)$.

We use the term generalized pseudo empirical likelihood (GPEL) to denote calibration estimation under the distance $EL(d, w)$. The GPEL estimator of $t_y$ is given by $\hat t_{EL} = \sum_{i \in S} \hat w_i y_i$, where the weights $\hat w_i$ minimize $EL(d, w)$ subject to (1). If $N$ is known, then the constraint $\sum_{i \in S} w_i = N$ should be included. This amounts to including 1 as the first component of $x_i$ and $N$ as the first component of $t_x$ in the calibration equations (1). It can be shown by the standard Lagrange multiplier method that

$$\hat w_i = \frac{d_i}{1 + q_i x_i^T \hat\lambda}, \tag{4}$$

where $\hat\lambda$ is a solution to

$$\sum_{i \in S} \frac{d_i x_i}{1 + q_i x_i^T \hat\lambda} - t_x = 0. \tag{5}$$

It should be noted that the modified backward Kullback-Leibler distance between $w$ and $d$, weighted by the pre-specified q-weights $(q_1, \ldots, q_n)$, is given by

$$ET(d, w) = \sum_{i \in S} q_i^{-1}\left\{ w_i \log\left(\frac{w_i}{d_i}\right) - w_i + d_i \right\}.$$

The notation ET comes from the term exponential tilting, since minimizing $ET(d, w)$ with respect to $w$ subject to constraints (1) results in calibration weights given by $w_i = d_i g_i$, where $g_i = \exp(\lambda^T x_i q_i)$ and $\lambda$ is determined by constraints (1). The distance measure $ET(d, w)$ was first mentioned by Deville and Särndal (1992). Folsom (1991) provides an early example of exponential weight adjustment. Kim (2010) contains further discussions on calibration estimation using exponential tilting.

Now consider the choice $q_i = \pi_i^{-1} - 1$ (Tan 2010, 2013).
The distance $EL(d, w)$ is equal to $-\sum_{i \in S} (1 - \pi_i)^{-1}\{\log(w_i) - \pi_i w_i\}$ up to an additive constant. The resulting

calibration weights are given by $\hat w_i = \{\pi_i + (1 - \pi_i) x_i^T \hat\lambda\}^{-1}$, where $\hat\lambda$ is the solution to $\sum_{i \in S} x_i/\{\pi_i + (1 - \pi_i) x_i^T \hat\lambda\} - t_x = 0$. The resulting calibration estimator of $\mu_y$ is given by

$$\hat\mu_{EL} = \frac{1}{N} \sum_{i \in S} \frac{y_i}{\pi_i + (1 - \pi_i) x_i^T \hat\lambda}, \tag{6}$$

which is exactly the same as the calibrated likelihood estimator of Tan (2010, 2013). The use of $\pi_i^{-1} - 1$ as a weight also appeared previously in Brewer (1999) on cosmetic calibration and Berger et al. (2003) on optimal regression estimation. See further discussions after Corollary 1.

An interesting interpretation of the choice $q_i = \pi_i^{-1} - 1$ is as follows. Let $I_i = 1$ if $i \in S$ and $I_i = 0$ if $i \notin S$; then $E_p(I_i/\pi_i) = 1$ and $V_p(I_i/\pi_i) = \pi_i^{-1} - 1$. Throughout, $E_p(\cdot)$ and $V_p(\cdot)$ refer to expectation and variance under the probability sampling design. In other words, the choice $q_i = \pi_i^{-1} - 1$ reflects the variation of selecting the $i$th unit into the sample under the survey design.

Another benefit of setting $q_i = \pi_i^{-1} - 1$ can be seen from the property that $q_i \approx 0$ if $\pi_i \approx 1$. If the inclusion probability of a unit is close to 1, then this unit is substantially down-weighted (or completely removed if $\pi_i = 1$) in the calibration process. This seems to be sensible from a design perspective, because the uncertainty associated with unit $i$ is very small if $\pi_i \approx 1$, and in this case we should force $w_i \approx d_i \approx 1$. In particular, this property may lead to substantial variance reduction when the linear relationship of $y_i$ given $x_i$ is violated mostly in the region where $\pi_i \approx 1$, as seen in Tan (2013, Section 5).

2.2. Generalized Pseudo Empirical Likelihood Ratio Confidence Intervals

The point estimator $\hat t_{EL}$ falls in the general class of calibration estimators (Deville and Särndal, 1992), with the distance measure $G(d, w)$ specified as the modified Kullback-Leibler distance $EL(d, w)$.
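For reference, the two facts about the choice $q_i = \pi_i^{-1} - 1$ quoted in Section 2.1 can be checked by direct substitution. The short derivation below is a worked sketch that uses only $d_i = \pi_i^{-1}$, the weight formula (4), and the fact that the inclusion indicator $I_i$ is a Bernoulli variable with success probability $\pi_i$ under the design.

```latex
% Calibration weights: substitute d_i = 1/pi_i and q_i = (1 - pi_i)/pi_i into (4).
\hat w_i
  = \frac{d_i}{1 + q_i x_i^T \hat\lambda}
  = \frac{\pi_i^{-1}}{1 + (1 - \pi_i)\,\pi_i^{-1}\, x_i^T \hat\lambda}
  = \frac{1}{\pi_i + (1 - \pi_i)\, x_i^T \hat\lambda}.

% Design variance of I_i / pi_i: V_p(I_i) = pi_i (1 - pi_i) for a Bernoulli indicator.
V_p\!\left(\frac{I_i}{\pi_i}\right)
  = \frac{\pi_i (1 - \pi_i)}{\pi_i^{2}}
  = \pi_i^{-1} - 1 .
```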
We now establish an important new result for constructing confidence intervals based on a GPEL ratio statistic similar to the pseudo empirical likelihood ratio statistic in Wu and Rao (2006). We assume that the finite population and the survey design satisfy the same regularity conditions

C1-C5 described in Wu and Rao (2006). In addition, we assume that

C6. The q-weights satisfy $N^{-1}\sum_{i=1}^{N} q_i^2 = O(1)$.

Under conditions C1-C6, we have that $N^{-1}\sum_{i=1}^{N} q_i x_i x_i^T = O(1)$ and $N^{-1}\sum_{i=1}^{N} q_i x_i y_i = O(1)$. Let $\hat w = (\hat w_1, \ldots, \hat w_n)$, where the $\hat w_i$ are computed by (4) and (5). By standard asymptotic theory of calibration estimation (Deville and Särndal, 1992), we have $\hat\lambda = O_p(n^{-1/2})$ and

$$\hat\lambda = \Big(\sum_{i \in S} d_i q_i x_i x_i^T\Big)^{-1} (\hat t_x - t_x) + o_p(n^{-1/2}), \tag{7}$$

where $\hat t_x = \sum_{i \in S} d_i x_i$. This leads to the following asymptotic expansion: $\hat t_{EL} = \sum_{i \in S} \hat w_i y_i = \hat t_{GREG} + o_p(N n^{-1/2})$, where

$$\hat t_{GREG} = \hat t_y + \hat B^T (t_x - \hat t_x), \tag{8}$$

with $\hat t_y = \sum_{i \in S} d_i y_i$ and

$$\hat B = \Big(\sum_{i \in S} d_i q_i x_i x_i^T\Big)^{-1} \sum_{i \in S} d_i q_i x_i y_i. \tag{9}$$

The estimator (8) is known as the generalized regression estimator for a general choice of $q_i$ (Särndal et al., 1992).

Let $\tilde w(\theta) = (\tilde w_1(\theta), \ldots, \tilde w_n(\theta))$, where the weights $\tilde w_i(\theta)$ minimize $EL(d, w)$ subject to (1) and

$$\sum_{i \in S} w_i y_i = \theta \tag{10}$$

for a given $\theta$. The GPEL ratio statistic for $\theta$ is defined as the increase in the minimized distance caused by the additional constraint (10),

$$r(\theta) = EL(d, \tilde w(\theta)) - EL(d, \hat w),$$

which is nonnegative because $\hat w$ minimizes $EL(d, w)$ under fewer constraints. We have the following result on the asymptotic distribution of $r(\theta)$.

Theorem 1. Under regularity conditions C1-C6, the adjusted GPEL ratio statistic $2r(\theta)/C$ converges in distribution to a $\chi^2$ random variable with one degree of

freedom when $\theta = t_y$. The scaling constant $C$ is given by

$$C = V_p(\hat\eta) \Big/ \Big(\sum_{i=1}^{N} q_i e_i^2\Big), \tag{11}$$

where $\hat\eta = \sum_{i \in S} d_i e_i$, $e_i = y_i - B^T x_i$, and $B = \big(\sum_{i=1}^{N} q_i x_i x_i^T\big)^{-1}\big(\sum_{i=1}^{N} q_i x_i y_i\big)$, which can be consistently estimated by $\hat B$ defined in (9).

In practice, the scaling factor $C$ needs to be estimated by a consistent estimator $\hat C$, which involves variance estimation for $\hat\eta$. This can be handled similarly as in Wu and Rao (2006). A bootstrap procedure described in Section 3 can also be used to circumvent the estimation of $C$ for single-stage sampling designs with negligible sampling fractions. For rejective or high-entropy sampling designs, estimation of $C$ is also not required, as shown in Corollaries 1 and 2 below.

An interesting special case of Theorem 1 is obtained for the calibrated likelihood estimator (6) of Tan (2013) under rejective sampling, using the weight factor $q_i = \pi_i^{-1} - 1$ and including $\pi_i$ as a calibration variable in $x_i$. As defined in Hájek (1964), rejective sampling is Poisson sampling conditional on a fixed sample size. For example, simple random sampling without replacement corresponds to rejective sampling with constant inclusion probabilities. In this case the scaling constant $C$ is asymptotically equal to 1. We assume that $\liminf_{N \to \infty} N^{-1}\sum_{i=1}^{N} (\pi_i^{-1} - 1) e_i^2 > 0$.

Corollary 1. Let $q_i = \pi_i^{-1} - 1$ and assume that $\pi_i$ is included as a component of $x_i$. Under rejective sampling and the regularity conditions stated in Theorem 1 of Tan (2013), we have $\lim_{N \to \infty} C = 1$ and $2r(\theta)$ converges in distribution to a $\chi^2$ random variable with one degree of freedom when $\theta = t_y$.

A heuristic explanation for the simplification of $C$ is as follows; see also Tan (2013). Under Poisson sampling, $V_p(\hat\eta) = \sum_{i=1}^{N} (\pi_i^{-1} - 1) e_i^2$ and hence $C$ is exactly equal to 1. But rejective sampling of size $n$ is defined as Poisson sampling conditional on a fixed sample size $n$, that is, $\sum_{i \in S} d_i \pi_i = n$.
Hence under rejective sampling, $V_p(\hat\eta)$ is asymptotically equal to the residual variance $\sum_{i=1}^{N} (\pi_i^{-1} - 1)(e_i - b\pi_i)^2$, with $b = \big\{\sum_{i=1}^{N} \pi_i(1 - \pi_i)\big\}^{-1}\big\{\sum_{i=1}^{N} (1 - \pi_i) e_i\big\}$, which then reduces to 0 by the

definition of $e_i$ and the fact that $\pi_i$ is included as a component of $x_i$. Such an argument is also implicit in Berger et al. (2003) on optimal regression estimation. In fact, under single-stage rejective sampling, the estimator of Berger et al. reduces to the same estimator, up to some minor difference, as the calibrated regression estimator of Tan (2013), taking the form of $\hat t_{GREG}$ in (8) with $q_i = \pi_i^{-1} - 1$.

Incidentally, when both using $q_i = \pi_i^{-1} - 1$ and including $\pi_i$ in $x_i$, the calibrated regression estimator can also be expressed in the cosmetic form of linear prediction estimators (Särndal and Wright, 1984), i.e.,

$$\hat t_{GREG} = \sum_{i \in S} y_i + \sum_{i \notin S} \hat B^T x_i,$$

as shown in Tan (2013). Fuller (2009) and Park and Kim (2014) also contain discussions on the topic. These choices of $q_i$ and $x_i$ satisfy a general construction of cosmetic calibration estimators in Brewer (1999). In Brewer's notation, $Z_s$ is taken here to be diagonal with diagonal elements $\pi_i$. The condition $Z_s 1_n = X_s \alpha$ holds because the $\pi_i$'s constitute a column of $X_s$. But, in general, Brewer's (1999) proposal does not imply setting $q_i = \pi_i^{-1} - 1$.

Similarly as in Tan (2013), Corollary 1 can be generalized from rejective sampling to other high-entropy sampling methods such that the Kullback-Leibler divergence from rejective sampling tends to 0. In particular, the Rao-Sampford sampling method (Rao, 1965; Sampford, 1967) is an example of high-entropy sampling provided that $\sum_{i=1}^{N} \pi_i(1 - \pi_i) \to \infty$ (Berger, 1998), which is already implied by the regularity conditions in Tan (2013, Theorem 1).

Corollary 2. The same result as in Corollary 1 holds if the rejective sampling procedure is replaced by the Rao-Sampford sampling procedure.

2.3. Computational Procedures for GPEL

The basic computational problem is to minimize $EL(d, w)$ with respect to $w = (w_1, \ldots, w_n)$ subject to constraint (1). The resulting $\hat w_i$ is given by (4) with the Lagrange multiplier $\hat\lambda$ being the solution to (5).
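To make the computation concrete, here is a minimal sketch that solves (5) for $\hat\lambda$ by a plain Newton iteration with step-halving and then returns the weights (4). It is an illustrative stand-in, not the authors' implementation; the function name `gpel_weights`, the arguments `tol` and `max_iter`, and the toy data are all assumptions made for this example.

```python
import numpy as np

def gpel_weights(d, q, X, t_x, tol=1e-8, max_iter=100):
    """Solve sum_i d_i x_i / (1 + q_i x_i' lam) = t_x for lam; return (w, lam)."""
    d, q = np.asarray(d, float), np.asarray(q, float)
    X, t_x = np.asarray(X, float), np.asarray(t_x, float)
    lam = np.zeros(X.shape[1])
    for _ in range(max_iter):
        denom = 1.0 + q * (X @ lam)
        score = X.T @ (d / denom) - t_x            # left-hand side of (5)
        if np.max(np.abs(score)) < tol:
            break
        # Jacobian of the score: -sum_i d_i q_i x_i x_i' / (1 + q_i x_i' lam)^2
        jac = -(X * (d * q / denom**2)[:, None]).T @ X
        step = np.linalg.solve(jac, -score)        # Newton direction
        # Halve the step until every 1 + q_i x_i' lam stays positive.
        while np.any(1.0 + q * (X @ (lam + step)) <= 0.0):
            step = step / 2.0
        lam = lam + step
    else:
        raise RuntimeError("Newton iteration did not converge")
    return d / (1.0 + q * (X @ lam)), lam          # weights from (4)

# Hypothetical toy data: five sampled units, intercept plus one auxiliary variable.
d = np.full(5, 10.0)
q = np.ones(5)
X = np.column_stack([np.ones(5), np.arange(1.0, 6.0)])
t_x = np.array([50.0, 160.0])                      # assumed known totals (N, t_x)
w, lam = gpel_weights(d, q, X, t_x)
```

Starting from $\lambda = 0$ reproduces the design weights, so the iteration begins inside the region where all denominators are positive; the step-halving only guards against leaving it.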
The key to our computational algorithms is that the required constrained minimization with respect to $w$ is a dual

problem of maximizing

$$K(\lambda) = \sum_{i \in S} q_i^{-1} d_i \log(1 + q_i x_i^T \lambda) - t_x^T \lambda$$

with respect to $\lambda$ within the set $\Omega(\lambda) = \{\lambda : 1 + q_i x_i^T \lambda > 0, \ i \in S\}$, since (5) is equivalent to $\Delta_1(\hat\lambda) = 0$, where

$$\Delta_1(\lambda) = \frac{\partial}{\partial \lambda} K(\lambda) = \sum_{i \in S} \frac{d_i x_i}{1 + q_i x_i^T \lambda} - t_x. \tag{12}$$

Note that $K(\lambda)$ is a concave function of $\lambda$, since the matrix

$$\Delta_2(\lambda) = \frac{\partial^2}{\partial \lambda\, \partial \lambda^T} K(\lambda) = -\sum_{i \in S} \frac{d_i q_i x_i x_i^T}{(1 + q_i x_i^T \lambda)^2} \tag{13}$$

is negative definite. This duality property was also observed in Tan (2010) for the calibrated likelihood estimator, where, up to an additive constant,

$$K(\lambda) = \sum_{i \in S} \frac{\log\{\pi_i + (1 - \pi_i) x_i^T \lambda\}}{1 - \pi_i} - t_x^T \lambda.$$

The solution to (5) can be found using the modified Newton-Raphson procedures of Chen et al. (2002) with $\Delta_1(\lambda)$ and $\Delta_2(\lambda)$ defined in (12) and (13).

3. A BOOTSTRAP PROCEDURE

Results from Theorem 1 can be used to construct $1 - \alpha$ level confidence intervals for the population total $\theta = t_y$ in the form of $\{\theta \mid 2r(\theta)/C < \chi^2_1(\alpha)\}$, where $\chi^2_1(\alpha)$ is the upper $100\alpha$th percentile from the $\chi^2_1$ distribution. Under an arbitrary unequal probability sampling design, the scaling constant $C$ needs to be estimated, which involves variance estimation for $\hat\eta$. For single-stage unequal probability sampling designs with negligible sampling fractions, the scaling constant can be circumvented through a bootstrap calibration method. The bootstrap procedure also provides a useful alternative to the chi-square approximation for rejective or high-entropy sampling under Corollaries 1 and 2, where $C$ can be replaced by $\hat C = 1$. The bootstrap procedure introduced here is similar to the with-replacement bootstrap procedure described in Wu and Rao (2010) for the pseudo empirical likelihood method.

The bootstrap calibrated $1 - \alpha$ level confidence interval on $\theta = t_y$ using the

unscaled GPEL ratio statistic is constructed as $\{\theta \mid 2r(\theta) < b_\alpha\}$, where $b_\alpha$ is the upper $100\alpha$th percentile from the sampling distribution of $2r(\theta)$. The bootstrap procedure provides a Monte Carlo approximation to $b_\alpha$.

The most crucial part of the bootstrap method is to treat the survey weights $d_i$ and the q-weights $q_i$ as part of the sample data. Let $\{(d_i, q_i, x_i, y_i), i \in S\}$ be the original survey data set. Let $t_x$ be the known population totals for the x-variables and let $\hat t_{EL} = \sum_{i \in S} \hat w_i y_i$ be the calibration estimator of $t_y$ using the distance measure $EL(d, w)$. Our proposed bootstrap method consists of the following four steps:

[1] Select a bootstrap sample $S^*$ of size $n$ from the original sample $S$ using simple random sampling with replacement; denote the bootstrap sample data by $\{(d_i^*, q_i^*, x_i^*, y_i^*), i \in S^*\}$.

[2] Let the bootstrap version of $EL(d, w)$ be defined as
$$EL^*(d, w) = -\sum_{i \in S^*} (q_i^*)^{-1}\left\{ d_i^* \log\left(\frac{w_i}{d_i^*}\right) - w_i + d_i^* \right\}.$$

[3] Calculate the GPEL ratio statistic $r^*(\theta) = EL^*(d, \tilde w^*(\theta)) - EL^*(d, \hat w^*)$ at $\theta = \hat t_{EL}$, where $\hat w^* = (\hat w_1^*, \ldots, \hat w_n^*)^T$ minimizes $EL^*(d, w)$ subject to $\sum_{i \in S^*} w_i x_i^* = t_x$ and $\tilde w^*(\theta) = (\tilde w_1^*(\theta), \ldots, \tilde w_n^*(\theta))^T$ minimizes $EL^*(d, w)$ subject to $\sum_{i \in S^*} w_i x_i^* = t_x$ and $\sum_{i \in S^*} w_i y_i^* = \hat t_{EL}$.

[4] Repeat Steps [1], [2] and [3] a large number of times, $B$, independently, to obtain the sequence $2r_1^*(\theta), \ldots, 2r_B^*(\theta)$, all at $\theta = \hat t_{EL}$. Let $b_\alpha^*$ be the upper $100\alpha$th sample percentile from this sequence.

The proposed bootstrap method can be formally justified for single-stage unequal probability sampling designs with replacement; see the Appendix for details. The procedure also provides good approximations for single-stage unequal probability sampling designs without replacement if the sampling fraction is small. Treating survey designs with negligible sampling fractions as if the units are selected with replacement is a common practice in survey sampling for the purpose of variance

estimation or other second order analysis. The bootstrap calibrated confidence interval on $t_y$, constructed as $\{\theta \mid 2r(\theta) < b_\alpha^*\}$, has approximately correct asymptotic coverage probability at the $1 - \alpha$ level.

4. SIMULATION STUDIES

We now report results from two simulation studies on the performances of the GPEL based estimators and GPEL ratio confidence intervals on a population total, with comparisons to the generalized regression estimators and the usual normal theory confidence intervals.

Study I. The finite population of size $N = 2000$ used for the simulation was generated from the model

$$y_i = \beta_0 + x_i + 2z_i + 0.5\{x_i I(x_i < 2)\} z_i^{1/2} + \sigma\varepsilon_i,$$

where $x_i \sim \mathrm{lognormal}(0, 1)$, $z_i \sim \chi^2_2$, $\varepsilon_i \sim N(0, 1)$, $I(\cdot)$ is the indicator function, and $\beta_0$ was chosen such that $y_i \geq 0$ for $i = 1, \ldots, N$. Two values of $\sigma$ were used such that the correlation coefficients, $\rho$, between the response variable $y_i$ and the linear predictor $\beta_0 + x_i + 2z_i + 0.5\{x_i I(x_i < 2)\} z_i^{1/2}$ are 0.80 and 0.50, respectively. The finite population, once generated from the above model, was held fixed. Under this setting, the finite population correlation coefficients between $y$ and $x$ are respectively 0.46 and 0.30 for $\rho = 0.80$ and $\rho = 0.50$; the correlation coefficients between $y$ and $z$ are respectively 0.66 and 0.43 for the two corresponding values of $\rho$.

Single-stage unequal probability samples of size $n = 80$ were taken from the finite population, with inclusion probabilities $\pi_i$ proportional to $z_i + c$. Two values of $c$ were considered such that $\pi_{mm} = \pi_{max}/\pi_{min}$ equals 200 and 20, respectively, where $\pi_{max} = \max\{\pi_i, i = 1, \ldots, N\}$ and $\pi_{min} = \min\{\pi_i, i = 1, \ldots, N\}$. The Rao-Sampford unequal probability sampling method (Rao, 1965; Sampford, 1967) was used in selecting the samples. Note that the sampling fraction is 80/2000 = 4%, which is small, and the Rao-Sampford method has high entropy (Berger, 1998). It should also

be noted that the second-order inclusion probabilities $\pi_{ij}$ can be computed exactly for the Rao-Sampford sampling method.

We considered two choices of q-weights: $q_i = 1$ and $q_i = \pi_i^{-1} - 1$. This gives a total of eight different scenarios with respect to the choices of $\rho$, $\pi_{mm}$ and $q_i$. For each scenario, five point estimators of the population total $t_y$ were computed: (1) the basic Horvitz-Thompson estimator (HT); (2) the generalized regression estimator calibrated over $x_i$ (GREG-1); (3) the generalized regression estimator calibrated over $(x_i, \pi_i, 1)$ (GREG-2); (4) the GPEL based estimator calibrated over $x_i$ (GPEL-1); and (5) the GPEL based estimator calibrated over $(x_i, \pi_i, 1)$ (GPEL-2).

Performances of a point estimator $\hat t_y$ of the population total $t_y$ are evaluated in terms of simulated Relative Bias (RB) and Relative Root Mean Square Error (RRMSE) defined as

$$RB = \frac{1}{K}\sum_{k=1}^{K} \{\hat t_y^{(k)} - t_y\}/t_y \quad \text{and} \quad RRMSE = (MSE)^{1/2}/t_y,$$

where $\hat t_y^{(k)}$ is the estimator computed from the $k$th simulated sample, $MSE = K^{-1}\sum_{k=1}^{K} \{\hat t_y^{(k)} - t_y\}^2$, and $K$ is the total number of simulation runs. All five estimators demonstrated negligible biases ($|RB| < 3\%$ for all cases). Details are not included here to save space.

The simulated values of RRMSE are summarized in Table 1. The results for the Horvitz-Thompson estimator are reported from two independent simulations for the two choices of q-weights. It can be seen from the table that (i) the design with less variable weights ($\pi_{mm} = \pi_{max}/\pi_{min} = 20$) provides better results than the design with more variable weights ($\pi_{mm} = 200$); (ii) including the design variable (i.e., the inclusion probabilities $\pi_i$) and the constant 1 in the calibration equations gives significantly more accurate estimation; (iii) the GPEL based calibration estimators are at least as efficient as the generalized regression estimators; and (iv) the two choices of q-weights lead to similar results.
A possible explanation for (iv) is that the mean of $y_i$ given $(x_i, z_i)$ under the simulation model is only moderately nonlinear, depending mainly on $x_i$ instead of $z_i$ or $\pi_i$. Using $q_i = \pi_i^{-1} - 1$ may lead to more noticeable gains of efficiency when

the linear relationship is more seriously misspecified and the nonlinearity occurs in the region where $\pi_i$ is close to 1, as discussed in Section 2.1.

We considered five methods for constructing confidence intervals on the population total $t_y$: (1) the normal theory interval based on the Horvitz-Thompson estimator, denoted as HT(NT); (2) the normal theory interval based on the generalized regression estimator, denoted as GREG(NT); (3) the profile GPEL ratio interval based on the scaled $\chi^2$ distribution described in Theorem 1, denoted as GPEL($\hat C$), where $\hat C$ is the estimated scaling constant; (4) the profile GPEL ratio interval with $C = 1$ when $q_i = \pi_i^{-1} - 1$, denoted as GPEL($C = 1$); and (5) the bootstrap calibrated GPEL ratio interval described in Section 3, denoted as GPEL(Boot), where $B = 1000$ bootstrap samples were used for each simulation run.

Let $(\hat\theta_1^{(k)}, \hat\theta_2^{(k)})$ be a confidence interval on $\theta = t_y$ obtained from the $k$th simulated sample using a particular method. Performances of the interval are measured by the (relative) average length (AL), lower (L) and upper (U) tail error rates, and coverage probability (CP), computed respectively as

$$AL = \frac{1}{K}\sum_{k=1}^{K} \{\hat\theta_2^{(k)} - \hat\theta_1^{(k)}\}/t_y,$$
$$L = \Big\{\frac{1}{K}\sum_{k=1}^{K} I\big(t_y \leq \hat\theta_1^{(k)}\big)\Big\} \times 100,$$
$$U = \Big\{\frac{1}{K}\sum_{k=1}^{K} I\big(t_y \geq \hat\theta_2^{(k)}\big)\Big\} \times 100,$$
$$CP = \Big\{\frac{1}{K}\sum_{k=1}^{K} I\big(\hat\theta_1^{(k)} < t_y < \hat\theta_2^{(k)}\big)\Big\} \times 100.$$

Note that L + CP + U = 100. Table 2 reports results on confidence intervals where only $x_i$ is used in calibration for GREG and GPEL. Table 3 summarizes results with $(x_i, \pi_i, 1)$ used for calibration.

Major observations from Tables 2 and 3 can be summarized as follows: (i) the GREG(NT) and GPEL($\hat C$) intervals are associated with both greater average lengths and lower coverage probabilities in the two tables, under the design with more variable weights ($\pi_{mm} = 200$) than under the design with less variable weights

17 (π mm = 20). This demonstrates the challenges for dealing with highly variable sampling weights. (ii) While the average lengths of GREG(NT) and GPEL(Ĉ) intervals are similar, the coverage probabilities of GPEL(Ĉ) are consistently higher and closer to 95% than those of GREG(NT) in the two tables. (iii) including (π i, 1) in the calibration (i.e., Tables 2 versus 3) significantly reduces average lengths for GREG(NT) and GPEL(Ĉ) intervals under the two designs of π mm, echoing the previous results on RRMSE. But the coverage probabilities for both methods decrease noticeably under the design with π mm = 200, although not so under the design with π mm = 20. iv) the GPEL(C = 1) intervals with q i = π 1 i 1 seem to perform well even if π i is not included in calibration (Table 2). The method becomes almost identical to GPEL(Ĉ) when (π i, 1) is included in calibration (Table 3); (v) the bootstrap intervals GPEL(Boot) perform remarkably well, in terms of coverage probabilities, for all the cases, especially under the design with π mm = 200. The average lengths are slightly inflated as compared to GREG(NT) or GPEL(Ĉ). Study II. In this simulation study we used a real survey data set from the 2000 Statistics Canada Family Expenditure Survey for the province of Ontario. The data set contains N = 2248 observations, with measurements on x i : number of people in the household; z i : annual income; y i : total expenditure. Chen et al. (2002) contains a detailed description of the data set. We treated the data as the finite population, and conducted the same types of simulation as in Study I. Once again, the Rao-Sampford sampling method was used and the sample size was set at n = 80. Both π mm = 200 and π mm = 20 are considered. Results are summarized in Tables 4 and 5. The first column in Table 5 indicates the calibration variables (CV) used in related method (x i only versus (x i, π i, 1)). 
Most of the observations from Study I remain true, except that the GREG(NT) and the GPEL(Ĉ) intervals perform much better. Low coverage probabilities do not seem to be an issue in the current study.
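The interval performance measures used in both studies (AL, L, U and CP) can be computed along the following lines. This is a minimal sketch, not the authors' simulation code: the function name `interval_performance` and the toy normal-theory intervals are hypothetical, and the placement of strict versus non-strict inequalities is an assumption chosen so that L + CP + U = 100 holds exactly.

```python
import numpy as np


def interval_performance(lower, upper, t_y):
    """Summarise K simulated confidence intervals for a known total t_y.

    lower, upper : arrays of length K with the interval end points.
    Returns the relative average length (AL) and, in percent, the lower
    tail error rate (L), upper tail error rate (U) and coverage (CP).
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    al = np.mean(upper - lower) / t_y                    # relative average length
    l_rate = 100.0 * np.mean(t_y <= lower)               # interval lies above t_y
    u_rate = 100.0 * np.mean(t_y >= upper)               # interval lies below t_y
    cp = 100.0 * np.mean((lower < t_y) & (t_y < upper))  # interval covers t_y
    return {"AL": al, "L": l_rate, "U": u_rate, "CP": cp}


# Toy illustration (hypothetical numbers): K normal-theory intervals
# built around unbiased estimates with a known standard error.
rng = np.random.default_rng(0)
K, t_y, se = 2000, 100.0, 5.0
estimates = rng.normal(t_y, se, size=K)
perf = interval_performance(estimates - 1.96 * se, estimates + 1.96 * se, t_y)
```

Reporting L and U separately, rather than CP alone, shows whether non-coverage is balanced between the two tails, which is the kind of comparison the tables are designed to support.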

5. CONCLUSION

Calibration estimation using auxiliary information has been extensively studied in the survey literature. Choices among alternative approaches depend on (i) the flexibility in obtaining efficient point estimators; (ii) the efficiency and reliability of computational procedures; and (iii) the capacity for drawing inferences beyond point estimation, such as constructing confidence intervals or conducting hypothesis tests. The generalized pseudo empirical likelihood approach has shown advantages in all three aspects. In practice, we recommend including $\pi_i$ as a calibration variable and using the choice $q_i = \pi_i^{-1} - 1$, especially when $y_i$ and $x_i$ are suspected to have a strong nonlinear relationship. Confidence intervals can be obtained by using the adjusted GPEL ratio statistic or, when sampling fractions are negligible, by the bootstrap procedure.

While inferences on population totals are the main focus of the current paper, extensions to parameters defined through general estimating equations, including regression and logistic regression coefficients, are a natural topic for further development. Moreover, extensions to multistage sampling designs and to the analysis of imputed survey data are currently under investigation.

6. APPENDIX

Proof of Theorem 1. Note that the $\hat w_i$ are computed by (4) and (5) and $\hat w_i / d_i = 1/(1 + q_i x_i^T \hat\lambda)$. For $u_i = o(1)$, we have $\log(1 + u_i) = u_i - u_i^2/2 + O(u_i^3)$ and $(1 + u_i)^{-1} = 1 - u_i + u_i^2 + O(u_i^3)$. Under conditions C1-C6, we have $\hat\lambda = O_p(n^{-1/2})$ and $\max_{i \in S} \| q_i x_i \| = o_p(n^{1/2})$. It follows that $u_i = q_i x_i^T \hat\lambda = o_p(1)$ uniformly over all $i \in S$. This together with (7) leads to

$$
\begin{aligned}
EL(\hat w, d) &= -\sum_{i \in S} q_i^{-1} d_i \left\{ \log\left(1 + q_i x_i^T \hat\lambda\right) + \left(1 + q_i x_i^T \hat\lambda\right)^{-1} - 1 \right\} \\
&= -\frac{1}{2} \hat\lambda^T \left( \sum_{i \in S} d_i q_i x_i x_i^T \right) \hat\lambda + o_p\left( \frac{N}{n} \right) \\
&= -\frac{1}{2} \left( \hat t_x - t_x \right)^T \left( \sum_{i \in S} d_i q_i x_i x_i^T \right)^{-1} \left( \hat t_x - t_x \right) + o_p\left( \frac{N}{n} \right) \\
&= -\frac{1}{2} \left( \hat t_x - t_x \right)^T \left( \sum_{i=1}^{N} q_i x_i x_i^T \right)^{-1} \left( \hat t_x - t_x \right) + o_p\left( \frac{N}{n} \right).
\end{aligned}
$$

To derive an asymptotic expansion for $EL(\tilde w(\theta), d)$ with the additional constraint (10), we follow the same technique used in the proof of Theorem 2 of Wu and Rao (2006). For $\theta = t_y$, it can be shown that

$$
EL(\tilde w(\theta), d) = -\frac{1}{2} \left( \hat t_z - t_z \right)^T \left( \sum_{i=1}^{N} q_i z_i z_i^T \right)^{-1} \left( \hat t_z - t_z \right) + o_p\left( \frac{N}{n} \right),
$$

where $z_i = (x_i^T, y_i)^T$. The last expression remains valid if $z_i$ is defined by the linear transformation $z_i = (x_i^T, e_i)^T$, where $e_i = y_i - B^T x_i$. Then $\sum_{i=1}^{N} q_i z_i z_i^T$ is block-diagonal, and

$$
r(\theta) = EL(\hat w, d) - EL(\tilde w(\theta), d) = \frac{1}{2} \left( \hat\eta - \eta \right)^2 \Big/ \sum_{i=1}^{N} q_i e_i^2 + o_p\left( \frac{N}{n} \right),
$$

where $\eta = \sum_{i=1}^{N} e_i = t_y - B^T t_x$, $\hat\eta = \sum_{i \in S} d_i e_i$ and

$$
B = \left( \sum_{i=1}^{N} q_i x_i x_i^T \right)^{-1} \sum_{i=1}^{N} q_i x_i y_i.
$$

Therefore, $2 r(\theta)/C$ converges in distribution to a $\chi^2$ random variable with one degree of freedom when $\theta = t_y$. Note that $V_p(\hat\eta) = O(N^2/n)$ and $C = O(N/n)$.

Proof of Corollary 1. The result follows directly from the fact that $V_p\left( \sum_{i \in S} e_i/\pi_i \right) = \sum_{i=1}^{N} (\pi_i^{-1} - 1) e_i^2 + o(n)$ by Tan (2013, Lemma 1) under rejective sampling.

Justification of the Bootstrap Method. The first key result is from Theorem 1, which states that $2 r(\theta)/C \to \chi^2_1$ in distribution when $\theta = t_y$, where the scaling constant is given by $C = V_p(\hat\eta) / \left( \sum_{i=1}^{N} q_i e_i^2 \right)$ with $\hat\eta = \sum_{i \in S} d_i e_i$ and $e_i = y_i - B^T x_i$. Note that $V_p(\cdot)$ denotes the design-based variance. The second key result is the parallel development for the bootstrap version of the GPEL ratio statistic, following the exact steps used in the proof of Theorem 1, which shows that $2 r^*(\theta)/C^* \to \chi^2_1$ in distribution when $\theta = \hat t_{EL}$. The scaling constant is given by

$$
C^* = V^*\left( \hat\eta^* \mid S \right) \Big/ \left( \sum_{i \in S} q_i d_i e_i^{*2} \right),
$$

where $\hat\eta^* = \sum_{i \in S^*} d_i^* e_i^*$, $e_i^* = y_i - \hat B^T x_i$, and $V^*(\cdot \mid S)$ denotes the variance under the bootstrap sampling procedure, conditional on the original survey sample.

Let $z_i = \pi_i/n$ and $z_i^* = \pi_i^*/n$. It follows that $d_i = 1/\pi_i = (1/z_i)/n$ and $\hat\eta = n^{-1} \sum_{i \in S} r_i$, where $r_i = e_i/z_i$. Similarly, we also have $d_i^* = (1/z_i^*)/n$ and $\hat\eta^* = n^{-1} \sum_{i \in S^*} r_i^*$, where $r_i^* = e_i^*/z_i^*$. Under the proposed with-replacement bootstrap procedure, we have $V^*(\hat\eta^* \mid S) = S_r^2/n$, where $S_r^2 = n^{-1} \sum_{i \in S} (r_i - \hat\eta)^2$.

If the original survey sample is selected by a single-stage unequal probability sampling design with replacement, then $\hat\eta = n^{-1} \sum_{i \in S} e_i/z_i$ is the standard Hansen-Hurwitz estimator. The design-based variance $V_p(\hat\eta)$ can be unbiasedly estimated by $n^{-1} \left\{ (n-1)^{-1} \sum_{i \in S} (r_i - \hat\eta)^2 \right\}$. In this case, we have $C^*/C \to 1$ as $n$ gets large, so the bootstrap version $2 r^*(\theta)$ and the original version $2 r(\theta)$ follow asymptotically the same scaled $\chi^2$ distribution. The bootstrap percentile $b_\alpha^*$ is therefore a consistent estimator of the true percentile $b_\alpha$.

ACKNOWLEDGEMENTS

The research of Z. Tan was supported by a grant from the National Science Foundation of the United States. The research of C. Wu was supported by a grant from the Natural Sciences and Engineering Research Council of Canada.

REFERENCES

Y.G. Berger (1998). Rate of convergence to normal distribution for the Horvitz-Thompson estimator. Journal of Statistical Planning and Inference, 67.
Y.G. Berger, E.H.M. Tirari & Y. Tillé (2003). Towards optimal regression estimation in sample surveys. Australian & New Zealand Journal of Statistics, 45.
K.R.W. Brewer (1999). Cosmetic calibration with unequal probability sampling. Survey Methodology, 25.
S. Chen & J.K. Kim (2014). Population empirical likelihood for nonparametric inference in survey sampling. Statistica Sinica, 24.
J. Chen & R.R. Sitter (1999). A pseudo empirical likelihood approach to the effective use of auxiliary information in complex surveys. Statistica Sinica, 12.
J. Chen, R.R. Sitter & C. Wu (2002). Using empirical likelihood methods to obtain range restricted weights in regression estimators for surveys. Biometrika, 89.
J.C. Deville & C.E. Särndal (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87.
T.J. DiCiccio & J.P. Romano (1990). Nonparametric confidence limits by resampling methods and least favorable families. International Statistical Review, 58.
R.E. Folsom (1991). Exponential and logistic weight adjustment for sampling and nonresponse error reduction. In Proceedings of the Section on Social Statistics, American Statistical Association.
W.A. Fuller (2002). Regression estimation for survey samples. Survey Methodology, 28.

W.A. Fuller (2009). Sampling Statistics. Hoboken, New Jersey: John Wiley & Sons, Inc.
W.A. Fuller & C.T. Isaki (1981). Survey design under superpopulation models. In Current Topics in Survey Sampling, eds. D. Krewski, J.N.K. Rao, and R. Platek, New York: Academic Press.
J. Hajek (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. Annals of Mathematical Statistics, 35.
J.K. Kim (2010). Calibration estimation using exponential tilting in sample surveys. Survey Methodology, 36.
S. Kullback (1959). Information Theory and Statistics. New York: Wiley.
S. Kullback & R.A. Leibler (1951). On information and sufficiency. Annals of Mathematical Statistics, 22.
G.E. Montanari (1987). Post-sampling efficient QR-prediction in large-scale surveys. International Statistical Review, 55.
A.B. Owen (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75.
A.B. Owen (2001). Empirical Likelihood. Chapman & Hall/CRC.
S. Park & J.K. Kim (2014). Instrumental-variable calibration estimation in survey sampling. Statistica Sinica, 24.
J. Qin & J. Lawless (1994). Empirical likelihood and general estimating equations. Annals of Statistics, 22.
J.N.K. Rao (1994). Estimating totals and distribution functions using auxiliary information at the estimation stage. Journal of Official Statistics, 10.
J.N.K. Rao (1965). On two simple schemes of unequal probability sampling without replacement. Journal of the Indian Statistical Association, 3.

M.R. Sampford (1967). On sampling without replacement with unequal probabilities of selection. Biometrika, 54.
C.E. Särndal (2007). The calibration approach in survey theory and practice. Survey Methodology, 33.
C.E. Särndal & R.L. Wright (1984). Cosmetic form of estimators in survey sampling. Scandinavian Journal of Statistics, 11.
C.E. Särndal, B. Swensson & J.H. Wretman (1992). Model-assisted Survey Sampling. New York: Springer-Verlag.
Z. Tan (2006). A distributional approach for causal inference using propensity scores. Journal of the American Statistical Association, 101.
Z. Tan (2010). Bounded, efficient and doubly robust estimation with inverse weighting. Biometrika, 97.
Z. Tan (2013). Simple design-efficient calibration estimators for rejective and high-entropy sampling. Biometrika, 100.
C. Wu (2004). Weighted empirical likelihood inference. Statistics & Probability Letters, 66.
C. Wu & J.N.K. Rao (2006). Pseudo-empirical likelihood ratio confidence intervals for complex surveys. The Canadian Journal of Statistics, 34.
C. Wu & J.N.K. Rao (2010). Bootstrap procedures for the pseudo empirical likelihood method in sample surveys. Statistics and Probability Letters, 80.

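Table 1: Relative Root Mean Square Error ($\times 10^3$) of Point Estimators (Study I)
Columns: $q_i$, $\rho$, $\pi_{mm}$, HT, GREG-1, GREG-2, GPEL-1, GPEL. Rows are grouped by the choice of $q_i$, including $q_i = \pi_i^{-1} - 1$.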

Table 2: Coverage Probabilities and Average Length of 95% CI (Study I; Calibrated over $x_i$)
Columns: $q_i$, $\rho$, $\pi_{mm}$, HT(NT), GREG(NT), GPEL(Ĉ), GPEL(C = 1), GPEL(Boot). Within each combination of $q_i$ (including $q_i = \pi_i^{-1} - 1$), $\rho$ and $\pi_{mm}$, the rows report AL, U, CP and L.

Table 3: Coverage Probabilities and Average Length of 95% CI (Study I; Calibrated over $(x_i, \pi_i, 1)$)
Columns: $q_i$, $\rho$, $\pi_{mm}$, HT(NT), GREG(NT), GPEL(Ĉ), GPEL(C = 1), GPEL(Boot). Within each combination of $q_i$ (including $q_i = \pi_i^{-1} - 1$), $\rho$ and $\pi_{mm}$, the rows report AL, U, CP and L.

Table 4: Relative Root Mean Square Error ($\times 10^3$) of Point Estimators (Study II)
Columns: $q_i$, $\pi_{mm}$, HT, GREG-1, GREG-2, GPEL-1, GPEL. Rows are grouped by the choice of $q_i$, including $q_i = \pi_i^{-1} - 1$.

Table 5: Coverage Probabilities and Average Length of 95% CI (Study II)
Columns: CV, $q_i$, $\pi_{mm}$, HT(NT), GREG(NT), GPEL(Ĉ), GPEL(C = 1), GPEL(Boot). The calibration variables (CV) are $x_i$ or $(x_i, \pi_i, 1)$. Within each combination of CV, $q_i$ (including $q_i = \pi_i^{-1} - 1$) and $\pi_{mm}$, the rows report AL, U, CP and L.

Generalized pseudo empirical likelihood inferences for complex surveys

The Canadian Journal of Statistics Vol. 43, No. 1, 2015, Pages 1 17 La revue canadienne de statistique 1 Generalized pseudo empirical likelihood inferences for complex surveys Zhiqiang TAN 1 * and Changbao