Generalized Pseudo Empirical Likelihood Inferences for Complex Surveys


The Canadian Journal of Statistics / La revue canadienne de statistique
Vol. ??, No. ?, ????, Pages ???-???

Zhiqiang TAN¹ and Changbao WU²

¹ Department of Statistics, Rutgers University, Piscataway, NJ 08854, USA
² Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON N2L 3G1, CANADA

Key words and phrases: Auxiliary information; calibration techniques; confidence intervals; Kullback-Leibler distance; survey design.

MSC 2000: Primary 62D05; secondary 62G09.

Abstract: We consider generalized pseudo empirical likelihood inferences for complex surveys. The method is based on a weighted version of the Kullback-Leibler (KL) distance for calibration estimation (Deville and Särndal, 1992) and includes the pseudo empirical likelihood estimator (Chen and Sitter, 1999; Wu and Rao, 2006) and the calibrated likelihood estimator (Tan, 2013) as special cases. We show that a suitably formulated empirical likelihood ratio-type statistic follows asymptotically a scaled chi-square distribution, which extends the main result in Wu and Rao (2006) and makes the likelihood ratio-type confidence intervals available for calibration estimation using arbitrary choices of the weighting factor in the weighted KL distance. We further show that the scaling factor for the scaled chi-square distribution can be circumvented either through a particular choice of the weighting factor for the KL distance or using a bootstrap method. The proposed bootstrap procedure is justified for single-stage sampling designs with negligible sampling fractions. Finite sample performances of confidence intervals constructed using our proposed methods are investigated and compared with existing ones through two simulation studies.

1. INTRODUCTION

Calibration is a popular inference tool for analysis of complex surveys. It originates from the idea of benchmarking when population totals of certain auxiliary variables are known and used to form benchmark constraints. The method has gained significant popularity since the work of Deville and Särndal (1992). Calibration estimators are closely related to the generalized regression estimators. Some of the important developments for regression estimators, such as variance estimation techniques, can also be used for calibration estimators. Fuller (2002) provides an excellent review on regression estimation and Särndal (2007) contains a thorough review on calibration techniques.

The conventional route for inferences with calibration methods is to first compute the point estimator with an estimated variance, and then use the standard Z-statistic based on normal approximations to construct confidence intervals or conduct statistical tests. Confidence intervals under this approach are forced to be symmetric around the point estimator and are not necessarily confined within the parameter space.

There have been significant developments on empirical likelihood (EL) methods in non-survey statistics (Owen, 2001). Progress has also been made on using the empirical likelihood method for complex surveys. See, for instance, Chen and Sitter (1999), Wu and Rao (2006), Chen and Kim (2014), among others. The most attractive feature of the empirical likelihood approach is the data-driven, range-respecting confidence intervals based on the empirical likelihood ratio statistic. This property is not enjoyed by the conventional calibration method with Wald-type confidence intervals.

Suppose that $U = \{1, 2, \ldots, N\}$ is the set of $N$ units for the finite population, with $(y_i, x_i)$ being the values of the study variable $y$ and the vector of auxiliary variables $x$ attached to unit $i$.
Let $t_y = \sum_{i=1}^{N} y_i$ be the parameter of interest, and $t_x = \sum_{i=1}^{N} x_i$ be the known population totals. Let $\mu_y = N^{-1} t_y$ and $\mu_x = N^{-1} t_x$ be the corresponding population means. Let $S$ be a set of $n$ sampled units and $\{(y_i, x_i) : i \in S\}$ be the survey data. Let $\pi_i = P(i \in S)$ be the first order inclusion probabilities and $d_i = \pi_i^{-1}$ be the basic design weights. The calibration estimator of $t_y$ is computed as $\hat t_{CAL} = \sum_{i \in S} \hat w_i y_i$, where the $\hat w_i$ are the calibrated weights obtained by minimizing a distance measure $G(d, w) = \sum_{i \in S} G_i(d_i, w_i)$ between the $w_i$ and the $d_i$ subject to the set of calibration equations (also called benchmark constraints)

$$\sum_{i \in S} w_i x_i = t_x. \tag{1}$$

The most commonly used distance measure is the chi-squared distance specified by $G_i(d_i, w_i) = (w_i - d_i)^2/(q_i d_i)$, where the $q_i$ are pre-specified constants. It is well known that, under the chi-squared distance, the resulting calibration estimator $\hat t_{CAL}$ is algebraically identical to the generalized regression estimator (Särndal et al., 1992).

Under the conventional calibration approach (Deville and Särndal, 1992), confidence intervals on $t_y$ rely on asymptotic normality of $\hat t_{CAL}$ and are constructed through the standardized Z-statistic $(\hat t_{CAL} - t_y)/\{v(\hat t_{CAL})\}^{1/2}$, where $v(\hat t_{CAL})$ is a consistent estimator of the variance of $\hat t_{CAL}$. The choice of $q_i$ in the distance measure does not affect the consistency of the calibration estimator, but it has an impact on the variance of the estimator. The choice $q_i = \pi_i^{-1} - 1$ can lead to more efficient calibration estimators (Tan 2010, 2013). A closely related research topic is the design-optimal regression estimator; see, for instance, Fuller and Isaki (1981), Montanari (1987), Rao (1994), Berger et al. (2003), Chen and Kim (2014), among others. Fuller (2009) considered model-optimal design-consistent estimators.

For complex survey data, Chen and Sitter (1999) proposed to use a pseudo empirical (log) likelihood

$$l(p) = \sum_{i \in S} d_i \log(p_i), \tag{2}$$

where $p = (p_1, \ldots, p_n)^T$ is the discrete probability measure over the $n$ sampled units. The maximum pseudo-EL estimator of the population mean $\mu_y$ is computed as $\hat\mu_{PEL} = \sum_{i \in S} \hat p_i y_i$, where the $\hat p_i$ maximize the pseudo-EL function $l(p)$ subject to $\sum_{i \in S} p_i = 1$ and the set of constraints

$$\sum_{i \in S} p_i x_i = \mu_x. \tag{3}$$

Chen and Sitter (1999) showed that the estimator $\hat\mu_{PEL}$ is asymptotically equivalent to the calibration estimator $N^{-1}\hat t_{CAL}$ with the choice of $q_i = 1$. Wu and Rao (2006) showed that the pseudo-EL ratio statistic on $\mu_y$, adjusted by a scaling factor involving the design effect, has an asymptotic $\chi^2$ distribution with one degree of freedom. Consequently, confidence intervals on $\mu_y$ based on the pseudo-EL ratio statistic can be constructed.

There are two major gaps between the conventional calibration method and the pseudo empirical likelihood method. First, the pseudo-EL method is designed for the population mean $\mu_y$ and relies on the constraints (3), which use the known population means $\mu_x$. The method cannot be directly applied to scenarios where the population size $N$ is unknown and only the population totals $t_x$ are available. Second, the pseudo-EL approach cannot entertain a general choice of the weight factor $q_i$, which is an important tool for achieving design-optimal estimation as mentioned earlier.

Recently, Tan (2010, 2013) developed a calibrated likelihood method by exploiting a connection between survey calibration and missing data problems. The method can be understood in two steps. Let $R_i$ be the sample inclusion indicator, i.e., $R_i = 1$ if $i \in S$ and $R_i = 0$ otherwise. The first step is to treat $\{(x_i, y_i, R_i) : i \in U\}$ as an iid sample from a joint distribution of $(x, y, R)$ and derive a bona fide empirical likelihood estimator (Qin and Lawless, 1994) for $E(y) = E\{\pi^{-1}(x) R y\}$, subject to the moment constraints $0 = E\{\pi^{-1}(x) R x - x\}$, based on the observed data $\{(x_i, R_i y_i, R_i) : i \in U\}$, where $\pi(x) = P(R = 1 \mid x, y)$ is assumed to be free of $y$.
This empirical likelihood estimator of $E(y)$ takes the usual form $N^{-1} \sum_{i \in S} \hat w_i y_i$ for some weights $\hat w_i$, but the calibration equations are generally violated, i.e., $\sum_{i \in S} \hat w_i x_i \neq t_x$.

For the second step, Tan (2010) proposed a modification such that the calibration equations are satisfied but without affecting the first-order asymptotic variance. As observed in Tan (2013), the calibrated likelihood estimator turns out to be algebraically equivalent to $\hat t_{CAL}$ with the distance measure $G(d, w)$ specified as a weighted Kullback-Leibler distance and the weight factor $q_i$ set to $\pi_i^{-1} - 1$.

The calibrated likelihood estimator (Tan 2010) and, similarly, the calibrated regression estimator (Tan 2006) are shown in Tan (2013) to be asymptotically optimal under rejective or high-entropy sampling designs when $\pi_i$ is included as a calibration variable in $x_i$. These two estimators are simpler than the usual optimal regression estimator involving second-order inclusion probabilities (Fuller and Isaki 1981; Montanari 1987; Rao 1994). See also Chen and Kim (2014) for related results but under negligible sampling fractions.

In this paper, we consider generalized pseudo empirical likelihood inferences for complex surveys. The method is based on a weighted version of the Kullback-Leibler (KL) distance for calibration estimation (Deville and Särndal, 1992) and includes the pseudo empirical likelihood estimator (Chen and Sitter, 1999; Wu and Rao, 2006) and the calibrated likelihood estimator (Tan, 2013) as special cases. We show that a suitably formulated empirical likelihood ratio-type statistic follows asymptotically a scaled chi-square distribution, which extends the main result in Wu and Rao (2006) and makes the likelihood ratio-type confidence intervals available for calibration estimation using arbitrary choices of the weighting factor in the weighted KL distance. We further show that the scaling factor for the scaled chi-square distribution can be circumvented either through a particular choice of the weighting factor for the KL distance or using a bootstrap method. The proposed bootstrap procedure is justified for single-stage sampling designs with negligible sampling fractions.
The rest of the paper is organized as follows. Main results on the generalized pseudo empirical likelihood method are presented in Section 2. The proposed bootstrap procedure is described in Section 3. In Section 4, we report results from two simulation studies, one based on a synthetic finite population and the other using a Statistics Canada survey data set. Some concluding remarks and discussions are given in Section 5. Proofs of the major results and justification of the bootstrap method are given in the Appendix.

2. GENERALIZED PSEUDO EMPIRICAL LIKELIHOOD METHOD

2.1. Weighted Kullback-Leibler Distance Based Calibration Estimators

Kullback-Leibler distance is a measure of divergence between two distributions. It was first described by Kullback and Leibler (1951) as a loss function in the context of information theory, and then further discussed by Kullback (1959). For two discrete probability measures $f = (f_1, \ldots, f_n)^T$ and $g = (g_1, \ldots, g_n)^T$, there are two types of Kullback-Leibler distance: $KL(f, g) = \sum_{i=1}^{n} f_i \log(f_i/g_i)$ and $KL(g, f) = \sum_{i=1}^{n} g_i \log(g_i/f_i)$. When discussing confidence intervals for iid data, taking $f$ to be the empirical measure with $f_i = 1/n$ and $g$ to be another probability measure for the data, DiCiccio and Romano (1990) called $KL(f, g)$ the forward Kullback-Leibler distance and $KL(g, f)$ the backward Kullback-Leibler distance.

Unfortunately, neither $KL(f, g)$ nor $KL(g, f)$ can be used directly as a distance measure for calibration estimation, since $G_i(f_i, g_i) = f_i \log(f_i/g_i)$ does not guarantee that $G_i(f_i, g_i) \geq 0$ for all $i$. A simple modification is to use $G_i(f_i, g_i) = f_i \log(f_i/g_i) - f_i + g_i$. In this case $G_i(f_i, g_i) = -f_i\{\log(g_i/f_i) - g_i/f_i + 1\} \geq 0$ for all $i$, since $\log(x) \leq x - 1$ for any $x > 0$.

For the two sets of weights $d = (d_1, \ldots, d_n)^T$ and $w = (w_1, \ldots, w_n)^T$, we consider the modified forward Kullback-Leibler distance between $w$ and $d$, weighted by $(q_1, \ldots, q_n)$:

$$EL(d, w) = -\sum_{i \in S} q_i^{-1} \left\{ d_i \log\left(\frac{w_i}{d_i}\right) - w_i + d_i \right\}.$$

This is also called the minimum entropy distance by Deville and Särndal (1992). The notation $EL(d, w)$ indicates its connection to empirical likelihood.
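The effect of the modification can be checked numerically. The following is a minimal Python sketch, not taken from the paper: the function names and the three-unit weights are hypothetical, and $EL(d, w)$ is assumed to be the non-negative weighted sum of the modified terms $d_i \log(d_i/w_i) - d_i + w_i$.

```python
import numpy as np

def naive_kl_terms(f, g):
    """Unmodified terms f_i * log(f_i / g_i); individual terms can be negative."""
    f, g = np.asarray(f, dtype=float), np.asarray(g, dtype=float)
    return f * np.log(f / g)

def modified_kl_terms(f, g):
    """Modified terms f_i*log(f_i/g_i) - f_i + g_i; each is >= 0 because log(x) <= x - 1."""
    f, g = np.asarray(f, dtype=float), np.asarray(g, dtype=float)
    return f * np.log(f / g) - f + g

def el_distance(d, w, q=None):
    """Weighted modified forward KL distance between candidate weights w and design weights d."""
    d = np.asarray(d, dtype=float)
    q = np.ones_like(d) if q is None else np.asarray(q, dtype=float)
    return float(np.sum(modified_kl_terms(d, w) / q))

# Hypothetical design weights, candidate weights and q-weights (illustration only).
d = np.array([10.0, 20.0, 40.0])
w = np.array([12.0, 18.0, 41.0])
q = np.array([1.0, 2.0, 0.5])
```

With these inputs, `naive_kl_terms(d, w)` has negative entries, while every entry of `modified_kl_terms(d, w)` is non-negative and `el_distance(d, d, q)` is zero, which is the behaviour required of a calibration distance.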
For independent but not identically distributed data where $d_i = n^{-1}$ and $(w_1, \ldots, w_n)$ are replaced by the probability measure $(p_1, \ldots, p_n)$ over the sample, $EL(d, w)$ was discussed in Wu (2004) as the weighted empirical log-likelihood function, with the q-weights specified through the variance function. If we let $q_i = 1$ and impose the constraint $\sum_{i=1}^{n} w_i = N$, then $EL(d, w) = -\sum_{i=1}^{n} d_i \log(p_i) + C$, where $p_i = w_i/N$ and $C$ is a constant not involving $p_i$. In this case, minimizing $EL(d, w)$ subject to a set of constraints on $w_i$ is equivalent to maximizing the pseudo-EL function $l(p) = \sum_{i=1}^{n} d_i \log(p_i)$ subject to the same set of constraints on $p_i$. In other words, the pseudo-EL approach of Chen and Sitter (1999) and Wu and Rao (2006) is a special case of inferences based on the modified forward Kullback-Leibler distance $EL(d, w)$.

We use the term generalized pseudo empirical likelihood (GPEL) to denote calibration estimation under the distance $EL(d, w)$. The GPEL estimator of $t_y$ is given by $\hat t_{EL} = \sum_{i \in S} \hat w_i y_i$, where the weights $\hat w_i$ minimize $EL(d, w)$ subject to (1). If $N$ is known, then the constraint $\sum_{i \in S} w_i = N$ should be included. This amounts to including 1 as the first component of $x_i$ and $N$ as the first component of $t_x$ in the calibration equations (1). It can be shown by the standard Lagrange multiplier method that

$$\hat w_i = \frac{d_i}{1 + q_i x_i^T \hat\lambda}, \tag{4}$$

where $\hat\lambda$ is a solution to

$$\sum_{i \in S} \frac{d_i x_i}{1 + q_i x_i^T \hat\lambda} - t_x = 0. \tag{5}$$

It should be noted that the modified backward Kullback-Leibler distance between $w$ and $d$, weighted by the pre-specified q-weights $(q_1, \ldots, q_n)$, is given by

$$ET(d, w) = \sum_{i \in S} q_i^{-1} \left\{ w_i \log\left(\frac{w_i}{d_i}\right) - w_i + d_i \right\}.$$

The notation $ET$ comes from the term exponential tilting, since minimizing $ET(d, w)$ with respect to $w$ subject to constraints (1) results in calibration weights given by $w_i = d_i g_i$, where $g_i = \exp(\lambda^T x_i q_i)$ and $\lambda$ is determined by constraints (1). The distance measure $ET(d, w)$ was first mentioned by Deville and Särndal (1992). Folsom (1991) provides an early example on exponential weight adjustment. Kim (2010) contains further discussions on calibration estimation using exponential tilting.

Now consider the choice $q_i = \pi_i^{-1} - 1$ (Tan 2010, 2013).
The distance $EL(d, w)$ is equal to $-\sum_{i \in S} (1 - \pi_i)^{-1} \{\log(w_i) - \pi_i w_i\}$ up to an additive constant. The resulting calibration weights are given by $\hat w_i = \{\pi_i + (1 - \pi_i) x_i^T \hat\lambda\}^{-1}$, where $\hat\lambda$ is the solution to $\sum_{i \in S} x_i/\{\pi_i + (1 - \pi_i) x_i^T \hat\lambda\} - t_x = 0$. The resulting calibration estimator of $\mu_y$ is given by

$$\hat\mu_{EL} = \frac{1}{N} \sum_{i \in S} \frac{y_i}{\pi_i + (1 - \pi_i) x_i^T \hat\lambda}, \tag{6}$$

which is exactly the same as the calibrated likelihood estimator of Tan (2010, 2013). The use of $\pi_i^{-1} - 1$ as a weight also appeared previously in Brewer (1999) on cosmetic calibration and Berger et al. (2003) on optimal regression estimation. See further discussions after Corollary 1.

An interesting interpretation of the choice $q_i = \pi_i^{-1} - 1$ is as follows. Let $I_i = 1$ if $i \in S$ and $I_i = 0$ if $i \notin S$; then $E_p(I_i/\pi_i) = 1$ and $V_p(I_i/\pi_i) = \pi_i^{-1} - 1$. Throughout, $E_p(\cdot)$ and $V_p(\cdot)$ refer to expectation and variance under the probability sampling design. In other words, the choice $q_i = \pi_i^{-1} - 1$ reflects the variation of selecting the $i$th unit into the sample under the survey design.

Another benefit of setting $q_i = \pi_i^{-1} - 1$ can be seen from the property that $q_i \approx 0$ if $\pi_i \approx 1$. If the inclusion probability of a unit is close to 1, then this unit is substantially down-weighted (or completely removed if $\pi_i = 1$) in the calibration process. This seems to be sensible from a design perspective, because the uncertainty associated with unit $i$ is very small if $\pi_i \approx 1$, and in this case we should force $w_i \approx d_i \approx 1$. In particular, this property may lead to substantial variance reduction when the linear relationship of $y_i$ given $x_i$ is violated mostly in the region where $\pi_i \approx 1$, as seen in Tan (2013, Section 5).

2.2. Generalized Pseudo Empirical Likelihood Ratio Confidence Intervals

The point estimator $\hat t_{EL}$ falls in the general class of calibration estimators (Deville and Särndal, 1992), with the distance measure $G(d, w)$ specified as the modified Kullback-Leibler distance $EL(d, w)$.
We now establish an important new result for constructing confidence intervals based on a GPEL ratio statistic similar to the pseudo empirical likelihood ratio statistic in Wu and Rao (2006). We assume that the finite population and the survey design satisfy the same regularity conditions C1-C5 described in Wu and Rao (2006). In addition, we assume that

C6. The q-weights satisfy $N^{-1} \sum_{i=1}^{N} q_i^2 = O(1)$.

Under conditions C1-C6, we have that $N^{-1} \sum_{i=1}^{N} q_i x_i x_i^T = O(1)$ and $N^{-1} \sum_{i=1}^{N} q_i x_i y_i = O(1)$. Let $\hat w = (\hat w_1, \ldots, \hat w_n)$, where the $\hat w_i$ are computed by (4) and (5). By standard asymptotic theory of calibration estimation (Deville and Särndal, 1992), we have $\hat\lambda = O_p(n^{-1/2})$, where

$$\hat\lambda = \left( \sum_{i \in S} d_i q_i x_i x_i^T \right)^{-1} (\hat t_x - t_x) + o_p(n^{-1/2}), \tag{7}$$

and $\hat t_x = \sum_{i \in S} d_i x_i$. This leads to the following asymptotic expansion: $\hat t_{EL} = \sum_{i \in S} \hat w_i y_i = \hat t_{GREG} + o_p(N n^{-1/2})$, where

$$\hat t_{GREG} = \hat t_y + \hat B^T (t_x - \hat t_x), \tag{8}$$

with $\hat t_y = \sum_{i \in S} d_i y_i$ and

$$\hat B = \left( \sum_{i \in S} d_i q_i x_i x_i^T \right)^{-1} \sum_{i \in S} d_i q_i x_i y_i. \tag{9}$$

The estimator (8) is known as the generalized regression estimator for a general choice of $q_i$ (Särndal et al., 1992).

Let $\tilde w(\theta) = (\tilde w_1(\theta), \ldots, \tilde w_n(\theta))$, where the weights $\tilde w_i(\theta)$ minimize $EL(d, w)$ subject to (1) and

$$\sum_{i \in S} w_i y_i = \theta \tag{10}$$

for a given $\theta$. The GPEL ratio statistic for $\theta$ is defined as

$$r(\theta) = EL(d, \hat w) - EL(d, \tilde w(\theta)).$$

We have the following result on the asymptotic distribution of $r(\theta)$.

Theorem 1. Under regularity conditions C1-C6, the adjusted GPEL ratio statistic $-2r(\theta)/C$ converges in distribution to a $\chi^2$ random variable with one degree of

freedom when $\theta = t_y$. The scaling constant $C$ is given by

$$C = V_p(\hat\eta) \Big/ \left( \sum_{i=1}^{N} q_i e_i^2 \right), \tag{11}$$

where $\hat\eta = \sum_{i \in S} d_i e_i$, $e_i = y_i - B^T x_i$, and $B = \left( \sum_{i=1}^{N} q_i x_i x_i^T \right)^{-1} \left( \sum_{i=1}^{N} q_i x_i y_i \right)$, which can be consistently estimated by $\hat B$ defined in (9).

In practice, the scaling factor $C$ needs to be estimated by a consistent estimator $\hat C$, which involves variance estimation for $\hat\eta$. This can be handled in the same way as in Wu and Rao (2006). A bootstrap procedure described in Section 3 can also be used to circumvent the estimation of $C$ for single-stage sampling designs with negligible sampling fractions. For rejective or high-entropy sampling designs, estimation of $C$ is also not required, as shown in Corollaries 1 and 2 below.

An interesting special case of Theorem 1 is obtained for the calibrated likelihood estimator (6) of Tan (2013) under rejective sampling, using the weight factor $q_i = \pi_i^{-1} - 1$ and including $\pi_i$ as a calibration variable in $x_i$. As defined in Hájek (1964), rejective sampling is Poisson sampling conditional on a fixed sample size. For example, simple random sampling without replacement corresponds to rejective sampling with constant inclusion probabilities. In this case the scaling constant $C$ is asymptotically equal to 1. We assume that $\liminf_{N \to \infty} N^{-1} \sum_{i=1}^{N} (\pi_i^{-1} - 1) e_i^2 > 0$.

Corollary 1. Let $q_i = \pi_i^{-1} - 1$ and assume that $\pi_i$ is included as a component of $x_i$. Under rejective sampling and the regularity conditions stated in Theorem 1 of Tan (2013), we have $\lim_{N \to \infty} C = 1$ and $-2r(\theta)$ converges in distribution to a $\chi^2$ random variable with one degree of freedom when $\theta = t_y$.

A heuristic explanation for the simplification of $C$ is as follows; see also Tan (2013). Under Poisson sampling, $V_p(\hat\eta) = \sum_{i=1}^{N} (\pi_i^{-1} - 1) e_i^2$ and hence $C$ is exactly equal to 1. But rejective sampling of size $n$ is defined as Poisson sampling conditional on a fixed sample size $n$, that is, $\sum_{i \in S} d_i \pi_i = n$.
Hence under rejective sampling, $V_p(\hat\eta)$ is asymptotically equal to the residual variance $\sum_{i=1}^{N} (\pi_i^{-1} - 1)(e_i - b\pi_i)^2$, with $b = \left\{ \sum_{i=1}^{N} \pi_i (1 - \pi_i) \right\}^{-1} \left\{ \sum_{i=1}^{N} (1 - \pi_i) e_i \right\}$, which then reduces to 0 by the definition of $e_i$ and the fact that $\pi_i$ is included as a component of $x_i$.

Such an argument is also implicit in Berger et al. (2003) on optimal regression estimation. In fact, under single-stage rejective sampling, the estimator of Berger et al. reduces to the same estimator, up to some minor difference, as the calibrated regression estimator of Tan (2013), taking the form of $\hat t_{GREG}$ in (8) with $q_i = \pi_i^{-1} - 1$. Incidentally, when both using $q_i = \pi_i^{-1} - 1$ and including $\pi_i$ in $x_i$, the calibrated regression estimator can also be expressed in the cosmetic form of linear prediction estimators (Särndal and Wright, 1984), i.e., $\hat t_{GREG} = \sum_{i \in S} y_i + \sum_{i \notin S} \hat B^T x_i$, as shown in Tan (2013). Fuller (2009) and Park and Kim (2014) also contain discussions on the topic. These choices of $q_i$ and $x_i$ satisfy a general construction of cosmetic calibration estimators in Brewer (1999). In Brewer's notation, $Z_s$ is taken here to be diagonal with diagonal elements $\pi_i$. The condition $Z_s 1_n = X_s \alpha$ holds because the $\pi_i$'s constitute a column of $X_s$. But, in general, Brewer's (1999) proposal does not imply setting $q_i = \pi_i^{-1} - 1$.

As in Tan (2013), Corollary 1 can be generalized from rejective sampling to other high-entropy sampling methods such that the Kullback-Leibler divergence from rejective sampling tends to 0. In particular, the Rao-Sampford sampling method (Rao, 1965; Sampford, 1967) is an example of high-entropy sampling provided that $\sum_{i=1}^{n} \pi_i (1 - \pi_i) \to \infty$ (Berger, 1998), which is already implied by the regularity conditions in Tan (2013, Theorem 1).

Corollary 2. The same result as in Corollary 1 holds if the rejective sampling procedure is replaced by the Rao-Sampford sampling procedure.

2.3. Computational Procedures for GPEL

The basic computational problem is to minimize $EL(d, w)$ with respect to $w = (w_1, \ldots, w_n)$ subject to constraint (1). The resulting $\hat w_i$ is given by (4) with the Lagrange multiplier $\hat\lambda$ being the solution to (5).
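As an illustration of this computation, the following is a minimal Python sketch that solves (5) by a damped Newton iteration and then evaluates the ratio statistic $r(\theta)$ of Section 2.2. It is not the authors' implementation: the function names, the step-halving rule and the simulated data are all assumptions made for the example, all $q_i$ are assumed strictly positive, and $EL(d, w)$ is taken to be the non-negative distance so that $r(\theta) \leq 0$.

```python
import numpy as np

def gpel_weights(d, q, X, t_x, tol=1e-8, max_iter=200):
    """Solve (5) for lambda by damped Newton steps; return the weights in (4) and lambda."""
    d, q, X, t_x = (np.asarray(a, dtype=float) for a in (d, q, X, t_x))

    def objective(lam):
        # Concave function whose gradient is the left-hand side of (5).
        u = 1.0 + q * (X @ lam)
        if np.any(u <= 0.0):
            return -np.inf
        return np.sum(d / q * np.log(u)) - t_x @ lam

    lam = np.zeros(X.shape[1])
    scale = max(1.0, np.max(np.abs(t_x)))
    for _ in range(max_iter):
        u = 1.0 + q * (X @ lam)
        grad = X.T @ (d / u) - t_x                        # left-hand side of (5)
        if np.max(np.abs(grad)) <= tol * scale:
            return d / u, lam                             # weights from (4)
        hess = -(X * (d * q / u**2)[:, None]).T @ X       # negative definite
        step = np.linalg.solve(hess, -grad)               # Newton direction
        s, f_old = 1.0, objective(lam)
        # Halve the step until it is feasible and does not decrease the objective.
        while s > 1e-10 and objective(lam + s * step) < f_old - 1e-12 * (1.0 + abs(f_old)):
            s *= 0.5
        lam = lam + s * step
    raise RuntimeError("Newton iterations did not converge")

def el_distance(d, w, q):
    """Non-negative weighted modified forward KL distance (assumed sign convention)."""
    return float(np.sum((d * np.log(d / w) - d + w) / q))

def gpel_ratio(theta, d, q, X, y, t_x):
    """r(theta) = EL(d, w_hat) - EL(d, w_tilde(theta)); w_tilde adds constraint (10)."""
    w_hat, _ = gpel_weights(d, q, X, t_x)
    Z = np.column_stack([X, y])                           # calibrate on (x, y) jointly
    w_til, _ = gpel_weights(d, q, Z, np.append(t_x, theta))
    return el_distance(d, w_hat, q) - el_distance(d, w_til, q)

# Hypothetical sample (illustration only); the totals are feasible by construction.
rng = np.random.default_rng(0)
n = 60
x = rng.lognormal(size=n)
y = 5.0 + 2.0 * x + rng.normal(size=n)
pi = np.clip(0.02 + 0.03 * x, 0.01, 0.9)                  # made-up inclusion probabilities
d, q = 1.0 / pi, 1.0 / pi - 1.0
X = np.column_stack([np.ones(n), x])
t_x = X.T @ (d * (1.0 + 0.05 * np.sin(np.arange(n))))     # made-up "known" totals

w_hat, lam_hat = gpel_weights(d, q, X, t_x)
t_el = w_hat @ y                                          # GPEL point estimate
r_at_estimate = gpel_ratio(t_el, d, q, X, y, t_x)
r_shifted = gpel_ratio(1.02 * t_el, d, q, X, y, t_x)
```

In this sketch the calibrated weights reproduce `t_x`, the ratio statistic is numerically zero at the point estimate, and it becomes negative as `theta` moves away from it, so that $-2r(\theta)$ can be profiled over a grid of `theta` values to trace out a confidence interval.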
The key to our computational algorithms is that the required constrained minimization with respect to $w$ is a dual problem of maximizing

$$K(\lambda) = \sum_{i \in S} q_i^{-1} d_i \log\left( 1 + q_i x_i^T \lambda \right) - t_x^T \lambda$$

with respect to $\lambda$ within the set $\Omega(\lambda) = \{\lambda : 1 + q_i x_i^T \lambda > 0, \ i \in S\}$, since (5) is equivalent to

$$\Delta_1(\lambda) = \frac{\partial}{\partial \lambda} K(\lambda) = \sum_{i \in S} \frac{d_i x_i}{1 + q_i x_i^T \lambda} - t_x = 0. \tag{12}$$

Note that $K(\lambda)$ is a concave function of $\lambda$, since the matrix

$$\Delta_2(\lambda) = \frac{\partial^2}{\partial \lambda \, \partial \lambda^T} K(\lambda) = -\sum_{i \in S} \frac{d_i q_i x_i x_i^T}{(1 + q_i x_i^T \lambda)^2} \tag{13}$$

is negative definite. This duality property was also observed in Tan (2010) for the calibrated likelihood estimator, where, up to an additive constant,

$$K(\lambda) = \sum_{i \in S} \frac{\log\{\pi_i + (1 - \pi_i) x_i^T \lambda\}}{1 - \pi_i} - t_x^T \lambda.$$

The solution to (5) can be found using the modified Newton-Raphson procedures of Chen et al. (2002) with $\Delta_1(\lambda)$ and $\Delta_2(\lambda)$ defined in (12) and (13).

3. A BOOTSTRAP PROCEDURE

Results from Theorem 1 can be used to construct $1 - \alpha$ level confidence intervals for the population total $\theta = t_y$ in the form of $\{\theta \mid -2r(\theta)/C < \chi^2_1(\alpha)\}$, where $\chi^2_1(\alpha)$ is the upper $100\alpha$th percentile from the $\chi^2_1$ distribution. Under an arbitrary unequal probability sampling design, the scaling constant $C$ needs to be estimated, which involves variance estimation for $\hat\eta$. For single-stage unequal probability sampling designs with negligible sampling fractions, the scaling constant can be circumvented through a bootstrap calibration method. The bootstrap procedure also provides a useful alternative to the chi-square approximation for rejective or high-entropy sampling under Corollaries 1 and 2 where $C$ can be replaced by $\hat C = 1$.

The bootstrap procedure introduced here is similar to the with-replacement bootstrap procedure described in Wu and Rao (2010) for the pseudo empirical likelihood method. The bootstrap calibrated $1 - \alpha$ level confidence intervals on $\theta = t_y$ using the

unscaled GPEL ratio statistic are constructed as $\{\theta \mid -2r(\theta) < b_\alpha\}$, where $b_\alpha$ is the upper $100\alpha$th percentile from the sampling distribution of $-2r(\theta)$. The bootstrap procedure provides a Monte Carlo approximation to $b_\alpha$.

The most crucial part of the bootstrap method is to treat the survey weights $d_i$ and the q-weights $q_i$ as part of the sample data. Let $\{(d_i, q_i, x_i, y_i), i \in S\}$ be the original survey data set. Let $t_x$ be the known population totals for the x-variables and let $\hat t_{EL} = \sum_{i \in S} \hat w_i y_i$ be the calibration estimator of $t_y$ using the distance measure $EL(d, w)$. Our proposed bootstrap method consists of the following four steps:

[1] Select a bootstrap sample $S^*$ of size $n$ from the original sample $S$ using simple random sampling with replacement; denote the bootstrap sample data by $\{(d_i^*, q_i^*, x_i^*, y_i^*), i \in S^*\}$.

[2] Let the bootstrap version of $EL(d, w)$ be defined as
$$EL^*(d, w) = -\sum_{i \in S^*} (q_i^*)^{-1} \left\{ d_i^* \log\left(\frac{w_i}{d_i^*}\right) - w_i + d_i^* \right\}.$$

[3] Calculate the GPEL ratio statistic $r^*(\theta) = EL^*(d, \hat w^*) - EL^*(d, \tilde w^*(\theta))$ at $\theta = \hat t_{EL}$, where $\hat w^* = (\hat w_1^*, \ldots, \hat w_n^*)^T$ minimizes $EL^*(d, w)$ subject to $\sum_{i \in S^*} w_i x_i^* = t_x$, and $\tilde w^*(\theta) = (\tilde w_1^*(\theta), \ldots, \tilde w_n^*(\theta))^T$ minimizes $EL^*(d, w)$ subject to $\sum_{i \in S^*} w_i x_i^* = t_x$ and $\sum_{i \in S^*} w_i y_i^* = \hat t_{EL}$.

[4] Repeat Steps [1], [2] and [3] a large number of times, $B$, independently, to obtain the sequence $-2r_1^*(\theta), \ldots, -2r_B^*(\theta)$, all at $\theta = \hat t_{EL}$. Let $b_\alpha^*$ be the upper $100\alpha$th sample percentile from this sequence.

The proposed bootstrap method can be formally justified for single-stage unequal probability sampling designs with replacement; see the Appendix for details. The procedure also provides good approximations for single-stage unequal probability sampling designs without replacement if the sampling fraction is small. Treating survey designs with negligible sampling fractions as if the units are selected with replacement is a common practice in survey sampling for the purpose of variance

estimation or other second order analysis. The bootstrap calibrated confidence interval on $t_y$, constructed as $\{\theta \mid -2r(\theta) < b_\alpha^*\}$, has approximately correct asymptotic coverage probability at the $1 - \alpha$ level.

4. SIMULATION STUDIES

We now report results from two simulation studies on the performances of the GPEL based estimators and GPEL ratio confidence intervals on a population total, with comparisons to the generalized regression estimators and the usual normal theory confidence intervals.

Study I. The finite population of size $N = 2000$ used for the simulation was generated from the model

$$y_i = \beta x_i + 2z_i + 0.5\{x_i I(x_i < 2)\} z_i^{1/2} + \sigma\varepsilon_i,$$

where $x_i \sim \text{lognormal}(0, 1)$, $z_i \sim \chi^2_2$, $\varepsilon_i \sim N(0, 1)$, $I(\cdot)$ is the indicator function, and $\beta_0$ was chosen such that $y_i \geq 0$ for $i = 1, \ldots, N$. Two values of $\sigma$ were used such that the correlation coefficients, $\rho$, between the response variable $y_i$ and the linear predictor $\beta x_i + 2z_i + 0.5\{x_i I(x_i < 2)\} z_i^{1/2}$ are 0.80 and 0.50, respectively. The finite population, once generated from the above model, was held fixed. Under this setting, the finite population correlation coefficients between $y$ and $x$ are respectively 0.46 and 0.30 for $\rho = 0.80$ and $\rho = 0.50$; the correlation coefficients between $y$ and $z$ are respectively 0.66 and 0.43 for the two corresponding values of $\rho$.

Single-stage unequal probability samples of size $n = 80$ were taken from the finite population, with inclusion probabilities $\pi_i$ proportional to $z_i + c$. Two values of $c$ were considered such that $\pi_{mm} = \pi_{max}/\pi_{min}$ equals 200 and 20, respectively, where $\pi_{max} = \max\{\pi_i, i = 1, \ldots, N\}$ and $\pi_{min} = \min\{\pi_i, i = 1, \ldots, N\}$. The Rao-Sampford unequal probability sampling method (Rao, 1965; Sampford, 1967) was used in selecting the samples. Note that the sampling fraction is 80/2000 = 4%, which is small, and the Rao-Sampford method has high entropy (Berger, 1998). It should also

be noted that the second-order inclusion probabilities $\pi_{ij}$ can be computed exactly for the Rao-Sampford sampling method.

We considered two choices of q-weights: $q_i = 1$ and $q_i = \pi_i^{-1} - 1$. This gives a total of eight different scenarios with respect to the choices on $\rho$, $\pi_{mm}$ and $q_i$. For each scenario, five point estimators of the population total $t_y$ were computed: (1) the basic Horvitz-Thompson estimator (HT); (2) the generalized regression estimator calibrated over $x_i$ (GREG-1); (3) the generalized regression estimator calibrated over $(x_i, \pi_i, 1)$ (GREG-2); (4) the GPEL based estimator calibrated over $x_i$ (GPEL-1); and (5) the GPEL based estimator calibrated over $(x_i, \pi_i, 1)$ (GPEL-2). Performances of a point estimator $\hat t_y$ of the population total $t_y$ are evaluated in terms of simulated Relative Bias (RB) and Relative Root Mean Square Error (RRMSE) defined as

$$RB = K^{-1} \sum_{k=1}^{K} \left\{ \hat t_y^{(k)} - t_y \right\} / t_y \quad \text{and} \quad RRMSE = (MSE)^{1/2} / t_y,$$

where $\hat t_y^{(k)}$ is the estimator computed from the $k$th simulated sample, $MSE = K^{-1} \sum_{k=1}^{K} \{\hat t_y^{(k)} - t_y\}^2$, and $K$ is the total number of simulation runs.

All five estimators demonstrated negligible biases ($|RB| < 3\%$ for all cases). Details are not included here to save space. The simulated values of RRMSE are summarized in Table 1. The results for the Horvitz-Thompson estimator are reported from two independent simulations for the two choices of q-weights. It can be seen from the table that (i) the design with less variable weights ($\pi_{mm} = \pi_{max}/\pi_{min} = 20$) provides better results than the design with more variable weights ($\pi_{mm} = 200$); (ii) including the design variable (i.e., the inclusion probabilities $\pi_i$) and the constant 1 in the calibration equations gives significantly more accurate estimation; (iii) the GPEL based calibration estimators are at least as efficient as the generalized regression estimators; and (iv) the two choices of q-weights lead to similar results.
A possible explanation for (iv) is that the mean of $y_i$ given $(x_i, z_i)$ under the simulation model is only moderately nonlinear, depending mainly on $x_i$ instead of $z_i$ or $\pi_i$. Using $q_i = \pi_i^{-1} - 1$ may lead to more noticeable gains of efficiency when the linear relationship is more seriously misspecified and the nonlinearity occurs in the region where $\pi_i$ is close to 1, as discussed in Section 2.1.

We considered five methods for constructing confidence intervals on the population total $t_y$: (1) the normal theory interval based on the Horvitz-Thompson estimator, denoted as HT(NT); (2) the normal theory interval based on the generalized regression estimator, denoted as GREG(NT); (3) the profile GPEL ratio interval based on the scaled $\chi^2$ distribution described in Theorem 1, denoted as GPEL($\hat C$), where $\hat C$ is the estimated scaling constant; (4) the profile GPEL ratio interval with $C = 1$ when $q_i = \pi_i^{-1} - 1$, denoted as GPEL($C = 1$); and (5) the bootstrap calibrated GPEL ratio interval described in Section 3, denoted as GPEL(Boot), where $B = 1000$ bootstrap samples were used for each simulation run.

Let $(\hat\theta_1^{(k)}, \hat\theta_2^{(k)})$ be a confidence interval on $\theta = t_y$ obtained from the $k$th simulated sample using a particular method. Performances of the interval are measured by the (relative) average length (AL), lower (L) and upper (U) tail error rates, and coverage probability (CP), computed respectively as

$$AL = \frac{1}{K} \sum_{k=1}^{K} \left\{ \hat\theta_2^{(k)} - \hat\theta_1^{(k)} \right\} / t_y,$$
$$L = \left\{ \frac{1}{K} \sum_{k=1}^{K} I\left( t_y \leq \hat\theta_1^{(k)} \right) \right\} \times 100,$$
$$U = \left\{ \frac{1}{K} \sum_{k=1}^{K} I\left( t_y \geq \hat\theta_2^{(k)} \right) \right\} \times 100,$$
$$CP = \left\{ \frac{1}{K} \sum_{k=1}^{K} I\left( \hat\theta_1^{(k)} < t_y < \hat\theta_2^{(k)} \right) \right\} \times 100.$$

Note that $L + CP + U = 100$. Table 2 reports results on confidence intervals where only $x_i$ is used in calibration for GREG and GPEL. Table 3 summarizes results with $(x_i, \pi_i, 1)$ used for calibration. Major observations from Tables 2 and 3 can be summarized as follows: (i) the GREG(NT) and GPEL($\hat C$) intervals are associated with both greater average lengths and lower coverage probabilities in the two tables, under the design with more variable weights ($\pi_{mm} = 200$) than under the design with less variable weights

($\pi_{mm} = 20$). This demonstrates the challenges of dealing with highly variable sampling weights. (ii) While the average lengths of GREG(NT) and GPEL($\hat C$) intervals are similar, the coverage probabilities of GPEL($\hat C$) are consistently higher and closer to 95% than those of GREG(NT) in the two tables. (iii) Including $(\pi_i, 1)$ in the calibration (i.e., Table 2 versus Table 3) significantly reduces average lengths for GREG(NT) and GPEL($\hat C$) intervals under the two designs of $\pi_{mm}$, echoing the previous results on RRMSE. But the coverage probabilities for both methods decrease noticeably under the design with $\pi_{mm} = 200$, although not so under the design with $\pi_{mm} = 20$. (iv) The GPEL($C = 1$) intervals with $q_i = \pi_i^{-1} - 1$ seem to perform well even if $\pi_i$ is not included in calibration (Table 2). The method becomes almost identical to GPEL($\hat C$) when $(\pi_i, 1)$ is included in calibration (Table 3). (v) The bootstrap intervals GPEL(Boot) perform remarkably well, in terms of coverage probabilities, for all the cases, especially under the design with $\pi_{mm} = 200$. The average lengths are slightly inflated as compared to GREG(NT) or GPEL($\hat C$).

Study II. In this simulation study we used a real survey data set from the 2000 Statistics Canada Family Expenditure Survey for the province of Ontario. The data set contains $N = 2248$ observations, with measurements on $x_i$: number of people in the household; $z_i$: annual income; $y_i$: total expenditure. Chen et al. (2002) contains a detailed description of the data set. We treated the data as the finite population, and conducted the same types of simulation as in Study I. Once again, the Rao-Sampford sampling method was used and the sample size was set at $n = 80$. Both $\pi_{mm} = 200$ and $\pi_{mm} = 20$ are considered. Results are summarized in Tables 4 and 5. The first column in Table 5 indicates the calibration variables (CV) used in the related method ($x_i$ only versus $(x_i, \pi_i, 1)$).
Most of the observations from Study I remain true, except that the GREG(NT) and the GPEL(Ĉ) intervals perform much better. Low coverage probabilities do not seem to be an issue in the current study.
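The four interval performance measures used in both studies (AL, L, U and CP) are simple to compute once the interval endpoints from the simulated samples are stored. The following is a minimal sketch, not the authors' code: the function name `interval_performance` and its arguments are hypothetical, and the tail events are taken as $t_y \le \hat\theta_1(k)$ and $t_y \ge \hat\theta_2(k)$ so that L + CP + U = 100 holds exactly.

```python
from typing import Dict, Sequence


def interval_performance(
    lower: Sequence[float], upper: Sequence[float], t_y: float
) -> Dict[str, float]:
    """Summarise K simulated confidence intervals for a known total t_y.

    lower[k], upper[k] are the endpoints from the k-th simulated sample.
    Returns the relative average length (AL) and the lower tail error (L),
    upper tail error (U) and coverage probability (CP), the last three in
    percent. Names and signature are illustrative assumptions.
    """
    K = len(lower)
    if K == 0 or K != len(upper):
        raise ValueError("lower and upper must be non-empty and of equal length")
    if any(lo >= up for lo, up in zip(lower, upper)):
        raise ValueError("each interval must satisfy lower < upper")

    # AL: average interval length relative to the true total.
    avg_len = sum(up - lo for lo, up in zip(lower, upper)) / (K * t_y)
    # L: share of intervals lying entirely above t_y (t_y <= lower limit).
    low_err = 100.0 * sum(t_y <= lo for lo in lower) / K
    # U: share of intervals lying entirely below t_y (t_y >= upper limit).
    upp_err = 100.0 * sum(t_y >= up for up in upper) / K
    # CP: share of intervals strictly containing t_y.
    cover = 100.0 * sum(lo < t_y < up for lo, up in zip(lower, upper)) / K
    return {"AL": avg_len, "L": low_err, "U": upp_err, "CP": cover}


if __name__ == "__main__":
    # Hypothetical endpoints for K = 4 simulated samples and t_y = 100.
    summary = interval_performance(
        [90.0, 101.0, 80.0, 95.0], [110.0, 120.0, 99.0, 105.0], 100.0
    )
    print(summary)
```

Because the three events partition the possible positions of $t_y$ relative to a proper interval, the identity L + CP + U = 100 serves as a quick sanity check on any stored simulation output.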

5. CONCLUSION

Calibration estimation using auxiliary information has been extensively studied in the survey literature. Choices among alternative approaches depend on (i) the flexibility in obtaining efficient point estimators; (ii) the efficiency and reliability of computational procedures; and (iii) the capacity for drawing inferences beyond point estimation, such as constructing confidence intervals or conducting hypothesis tests. The generalized pseudo empirical likelihood approach has shown advantages in all three aspects. In practice, we recommend including $\pi_i$ as a calibration variable and using the choice $q_i = \pi_i^{-1} - 1$, especially when $y_i$ and $x_i$ are suspected to have a strong nonlinear relationship. Confidence intervals can be obtained by using the adjusted GPEL ratio statistic or, when sampling fractions are negligible, by the bootstrap procedure.

While inferences on population totals are the main focus of the current paper, extensions to parameters defined through general estimating equations, including regression and logistic regression coefficients, are natural topics for further development. Moreover, extensions to multistage sampling designs and to the analysis of imputed survey data are currently under investigation.

6. APPENDIX

Proof of Theorem 1. Note that the $\hat{w}_i$ are computed by (4) and (5) and satisfy $\hat{w}_i / d_i = 1 / (1 + q_i x_i^T \hat\lambda)$. For $u_i = o(1)$, we have $\log(1 + u_i) = u_i - u_i^2/2 + O(u_i^3)$ and $(1 + u_i)^{-1} = 1 - u_i + u_i^2 + O(u_i^3)$. Under conditions C1-C6, we have $\hat\lambda = O_p(n^{-1/2})$ and $\max_{i \in S} \| q_i x_i \| = o_p(n^{1/2})$. It follows that $u_i = q_i x_i^T \hat\lambda = o_p(1)$ uniformly over all $i \in S$. This together with (7) leads to

$$\mathrm{EL}(\hat{w}, d) = \sum_{i \in S} q_i^{-1} d_i \Big\{ \log\big(1 + q_i x_i^T \hat\lambda\big) + \big(1 + q_i x_i^T \hat\lambda\big)^{-1} - 1 \Big\}$$

$$= \frac{1}{2} \hat\lambda^T \Big( \sum_{i \in S} d_i q_i x_i x_i^T \Big) \hat\lambda + o_p\Big(\frac{N}{n}\Big)$$

$$= \frac{1}{2} (\hat{t}_x - t_x)^T \Big( \sum_{i \in S} d_i q_i x_i x_i^T \Big)^{-1} (\hat{t}_x - t_x) + o_p\Big(\frac{N}{n}\Big)$$

$$= \frac{1}{2} (\hat{t}_x - t_x)^T \Big( \sum_{i=1}^{N} q_i x_i x_i^T \Big)^{-1} (\hat{t}_x - t_x) + o_p\Big(\frac{N}{n}\Big).$$

To derive an asymptotic expansion for $\mathrm{EL}(\tilde{w}(\theta), d)$ with the additional constraint (10), we follow the same technique used in the proof of Theorem 2 of Wu and Rao (2006). For $\theta = t_y$, it can be shown that

$$\mathrm{EL}(\tilde{w}(\theta), d) = \frac{1}{2} (\hat{t}_z - t_z)^T \Big( \sum_{i=1}^{N} q_i z_i z_i^T \Big)^{-1} (\hat{t}_z - t_z) + o_p\Big(\frac{N}{n}\Big),$$

where $z_i = (x_i^T, y_i)^T$. The last expression remains valid if $z_i$ is defined by the linear transformation $z_i = (x_i^T, e_i)^T$, where $e_i = y_i - B^T x_i$. Then $\sum_{i=1}^{N} q_i z_i z_i^T$ is block-diagonal, and

$$r(\theta) = \mathrm{EL}(\hat{w}, d) - \mathrm{EL}(\tilde{w}(\theta), d) = -\frac{1}{2} (\hat\eta - \eta)^2 \Big/ \sum_{i=1}^{N} q_i e_i^2 + o_p\Big(\frac{N}{n}\Big),$$

where $\eta = \sum_{i=1}^{N} e_i = t_y - B^T t_x$, $\hat\eta = \sum_{i \in S} d_i e_i$ and

$$B = \Big( \sum_{i=1}^{N} q_i x_i x_i^T \Big)^{-1} \sum_{i=1}^{N} q_i x_i y_i.$$

Therefore, with $C = V_p(\hat\eta) / \sum_{i=1}^{N} q_i e_i^2$, the ratio $-2r(\theta)/C$ converges in distribution to a $\chi^2$ random variable with one degree of freedom when $\theta = t_y$. Note that $V_p(\hat\eta) = O(N^2/n)$ and $C = O(N/n)$.

Proof of Corollary 1. The result follows directly from the fact that $V_p\big( \sum_{i \in S} e_i / \pi_i \big) = \sum_{i=1}^{N} (\pi_i^{-1} - 1) e_i^2 + o(n)$ by Tan (2013, Lemma 1) under rejective sampling.

Justification of the Bootstrap Method. The first key result is from Theorem 1, which states that $-2r(\theta)/C \to \chi^2_1$ in distribution when $\theta = t_y$, where the scaling constant is given by $C = V_p(\hat\eta) / \big( \sum_{i=1}^{N} q_i e_i^2 \big)$ with $\hat\eta = \sum_{i \in S} d_i e_i$ and $e_i = y_i - B^T x_i$. Note that $V_p(\cdot)$ denotes the design-based variance. The second key result is the parallel development for the bootstrap version of the GPEL ratio statistic, following the exact steps used in the proof of Theorem 1, which shows that $-2r^*(\theta)/C^* \to \chi^2_1$ in distribution when $\theta = \hat{t}_{EL}$. The scaling constant is given by

$$C^* = V^*(\hat\eta^* \mid S) \Big/ \Big( \sum_{i \in S} q_i d_i e_i^2 \Big),$$

where $\hat\eta^* = \sum_{i \in S^*} d_i^* e_i^*$, $e_i^* = y_i^* - \hat{B}^T x_i^*$, and $V^*(\cdot \mid S)$ denotes the variance under the bootstrap sampling procedure, conditional on the original survey sample.

Let $z_i = \pi_i / n$ and $z_i^* = \pi_i^* / n$. It follows that $d_i = 1/\pi_i = (1/z_i)/n$ and $\hat\eta = n^{-1} \sum_{i \in S} r_i$, where $r_i = e_i / z_i$. Similarly, we also have $d_i^* = (1/z_i^*)/n$ and $\hat\eta^* = n^{-1} \sum_{i \in S^*} r_i^*$, where $r_i^* = e_i^* / z_i^*$. Under the proposed with-replacement bootstrap procedure, we have $V^*(\hat\eta^* \mid S) = S_r^2 / n$, where $S_r^2 = n^{-1} \sum_{i \in S} (r_i - \hat\eta)^2$.

If the original survey sample is selected by a single-stage unequal probability sampling design with replacement, then $\hat\eta = n^{-1} \sum_{i \in S} e_i / z_i$ is the standard Hansen-Hurwitz estimator. The design-based variance $V_p(\hat\eta)$ can be unbiasedly estimated by $n^{-1} \big\{ (n-1)^{-1} \sum_{i \in S} (r_i - \hat\eta)^2 \big\}$. In this case, we have $C^*/C \to 1$ as $n$ gets large, and the bootstrap version $-2r^*(\theta)$ and the original version $-2r(\theta)$ follow asymptotically the same scaled $\chi^2$ distribution. The bootstrap percentile $b^*_\alpha$ is therefore a consistent estimator of the true percentile $b_\alpha$.

ACKNOWLEDGEMENTS

The research of Z. Tan was supported by a grant from the National Science Foundation of the United States. The research of C. Wu was supported by a grant from the Natural Sciences and Engineering Research Council of Canada.

REFERENCES

Y.G. Berger (1998). Rate of convergence to normal distribution for the Horvitz-Thompson estimator. Journal of Statistical Planning and Inference, 67.

Y.G. Berger, E.H.M. Tirari & Y. Tillé (2003). Towards optimal regression estimation in sample surveys. Australian & New Zealand Journal of Statistics, 45.

K.R.W. Brewer (1999). Cosmetic calibration with unequal probability sampling. Survey Methodology, 25.

S. Chen & J.K. Kim (2014). Population empirical likelihood for nonparametric inference in survey sampling. Statistica Sinica, 24.

J. Chen & R.R. Sitter (1999). A pseudo empirical likelihood approach to the effective use of auxiliary information in complex surveys. Statistica Sinica, 9.

J. Chen, R.R. Sitter & C. Wu (2002). Using empirical likelihood methods to obtain range restricted weights in regression estimators for surveys. Biometrika, 89.

J.C. Deville & C.E. Särndal (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87.

T.J. DiCiccio & J.P. Romano (1990). Nonparametric confidence limits by resampling methods and least favorable families. International Statistical Review, 58.

R.E. Folsom (1991). Exponential and logistic weight adjustment for sampling and nonresponse error reduction. In Proceedings of the Section on Social Statistics, American Statistical Association.

W.A. Fuller (2002). Regression estimation for survey samples. Survey Methodology, 28.

W.A. Fuller (2009). Sampling Statistics. Hoboken, New Jersey: John Wiley & Sons, Inc.

W.A. Fuller & C.T. Isaki (1981). Survey design under superpopulation models. In Current Topics in Survey Sampling, eds. D. Krewski, J.N.K. Rao, and R. Platek, New York: Academic Press.

J. Hajek (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. Annals of Mathematical Statistics, 35.

J.K. Kim (2010). Calibration estimation using exponential tilting in sample surveys. Survey Methodology, 36.

S. Kullback (1959). Information Theory and Statistics. Wiley, New York.

S. Kullback & R.A. Leibler (1951). On information and sufficiency. Annals of Mathematical Statistics, 22.

G.E. Montanari (1987). Post-sampling efficient QR-prediction in large-scale surveys. International Statistical Review, 55.

A.B. Owen (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75.

A.B. Owen (2001). Empirical Likelihood. Chapman & Hall/CRC.

S. Park & J.K. Kim (2014). Instrumental-variable calibration estimation in survey sampling. Statistica Sinica, 24.

J. Qin & J. Lawless (1994). Empirical likelihood and general estimating equations. Annals of Statistics, 22.

J.N.K. Rao (1965). On two simple schemes of unequal probability sampling without replacement. Journal of the Indian Statistical Association, 3.

J.N.K. Rao (1994). Estimating totals and distribution functions using auxiliary information at the estimation stage. Journal of Official Statistics, 10.

M.R. Sampford (1967). On sampling without replacement with unequal probabilities of selection. Biometrika, 54.

C.E. Särndal (2007). The calibration approach in survey theory and practice. Survey Methodology, 33.

C.E. Särndal & R.L. Wright (1984). Cosmetic form of estimators in survey sampling. Scandinavian Journal of Statistics, 11.

C.E. Särndal, B. Swensson & J.H. Wretman (1992). Model-assisted Survey Sampling. New York: Springer-Verlag.

Z. Tan (2006). A distributional approach for causal inference using propensity scores. Journal of the American Statistical Association, 101.

Z. Tan (2010). Bounded, efficient and doubly robust estimation with inverse weighting. Biometrika, 97.

Z. Tan (2013). Simple design-efficient calibration estimators for rejective and high-entropy sampling. Biometrika, 100.

C. Wu (2004). Weighted empirical likelihood inference. Statistics & Probability Letters, 66.

C. Wu & J.N.K. Rao (2006). Pseudo-empirical likelihood ratio confidence intervals for complex surveys. The Canadian Journal of Statistics, 34.

C. Wu & J.N.K. Rao (2010). Bootstrap procedures for the pseudo empirical likelihood method in sample surveys. Statistics & Probability Letters, 80.

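Table 1: Relative Root Mean Square Error of Point Estimators (Study I)
Columns: $q_i$, $\rho$, $\pi_{mm}$, HT, GREG-1, GREG-2, GPEL-1, GPEL-2. Rows are grouped by the choice of $q_i$, including $q_i = \pi_i^{-1} - 1$. (Numerical entries are not available in this transcription.)

Table 2: Coverage Probabilities and Average Length of 95% CI (Study I; Calibrated over $x_i$)
Columns: $q_i$, $\rho$, $\pi_{mm}$, HT(NT), GREG(NT), GPEL(Ĉ), GPEL(C = 1), GPEL(Boot). Rows report AL, U, CP and L for each setting, grouped by the choice of $q_i$, including $q_i = \pi_i^{-1} - 1$. (Numerical entries are not available in this transcription.)

Table 3: Coverage Probabilities and Average Length of 95% CI (Study I; Calibrated over $(x_i, \pi_i, 1)$)
Columns: $q_i$, $\rho$, $\pi_{mm}$, HT(NT), GREG(NT), GPEL(Ĉ), GPEL(C = 1), GPEL(Boot). Rows report AL, U, CP and L for each setting, grouped by the choice of $q_i$, including $q_i = \pi_i^{-1} - 1$. (Numerical entries are not available in this transcription.)

Table 4: Relative Root Mean Square Error of Point Estimators (Study II)
Columns: $q_i$, $\pi_{mm}$, HT, GREG-1, GREG-2, GPEL-1, GPEL-2. Rows are grouped by the choice of $q_i$, including $q_i = \pi_i^{-1} - 1$. (Numerical entries are not available in this transcription.)

Table 5: Coverage Probabilities and Average Length of 95% CI (Study II)
Columns: CV, $q_i$, $\pi_{mm}$, HT(NT), GREG(NT), GPEL(Ĉ), GPEL(C = 1), GPEL(Boot). Rows report AL, U, CP and L for each setting, grouped first by calibration variables ($x_i$ only, then $(x_i, \pi_i, 1)$) and then by the choice of $q_i$, including $q_i = \pi_i^{-1} - 1$. (Numerical entries are not available in this transcription.)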
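The bootstrap justification in the Appendix rests on two variance expressions built from the terms $r_i = e_i / z_i$: the conditional bootstrap variance $S_r^2 / n$ and the unbiased with-replacement estimator $n^{-1} (n-1)^{-1} \sum_{i \in S} (r_i - \hat\eta)^2$. The sketch below checks the first identity by exhaustive enumeration for a tiny sample and shows that the two expressions differ only by the factor $(n-1)/n$. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the residuals $e_i$ are treated as given numbers, and the bootstrap is assumed to draw $n$ units from $S$ with equal probability and with replacement, which is the scheme the identity $V^*(\hat\eta^* \mid S) = S_r^2 / n$ corresponds to.

```python
from itertools import product
from statistics import fmean
from typing import List, Sequence


def hansen_hurwitz_terms(e: Sequence[float], pi: Sequence[float]) -> List[float]:
    """Return r_i = e_i / z_i with z_i = pi_i / n, so that eta_hat = mean(r)."""
    n = len(e)
    return [e_i / (pi_i / n) for e_i, pi_i in zip(e, pi)]


def bootstrap_variance_closed_form(r: Sequence[float]) -> float:
    """S_r^2 / n, where S_r^2 = n^{-1} * sum_i (r_i - eta_hat)^2."""
    n = len(r)
    eta_hat = fmean(r)
    s2_r = sum((r_i - eta_hat) ** 2 for r_i in r) / n
    return s2_r / n


def bootstrap_variance_by_enumeration(r: Sequence[float]) -> float:
    """Exact variance of the resample mean over all n**n equally likely
    with-replacement resamples of size n (feasible for very small n only)."""
    n = len(r)
    means = [fmean(draw) for draw in product(r, repeat=n)]
    centre = fmean(means)
    return sum((m - centre) ** 2 for m in means) / len(means)


def design_variance_estimate(r: Sequence[float]) -> float:
    """With-replacement estimator n^{-1} (n-1)^{-1} * sum_i (r_i - eta_hat)^2."""
    n = len(r)
    eta_hat = fmean(r)
    return sum((r_i - eta_hat) ** 2 for r_i in r) / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical residuals and inclusion probabilities for n = 4 units.
    e = [2.0, -1.0, 0.5, 1.5]
    pi = [0.1, 0.2, 0.05, 0.25]
    r = hansen_hurwitz_terms(e, pi)
    print(bootstrap_variance_closed_form(r))
    print(bootstrap_variance_by_enumeration(r))
    print(design_variance_estimate(r))
```

The ratio of the closed-form bootstrap variance to the design-based estimator is exactly $(n-1)/n$, which tends to 1 as $n$ grows; this is the numerical counterpart of the statement that the bootstrap and original scaling constants agree asymptotically.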

Generalized pseudo empirical likelihood inferences for complex surveys. The Canadian Journal of Statistics, Vol. 43, No. 1, 2015, Pages 1-17.


More information

A new resampling method for sampling designs without replacement: the doubled half bootstrap

A new resampling method for sampling designs without replacement: the doubled half bootstrap 1 Published in Computational Statistics 29, issue 5, 1345-1363, 2014 which should be used for any reference to this work A new resampling method for sampling designs without replacement: the doubled half

More information

Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?

Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful? Journal of Modern Applied Statistical Methods Volume 10 Issue Article 13 11-1-011 Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?

More information

Combining Non-probability and Probability Survey Samples Through Mass Imputation

Combining Non-probability and Probability Survey Samples Through Mass Imputation Combining Non-probability and Probability Survey Samples Through Mass Imputation Jae-Kwang Kim 1 Iowa State University & KAIST October 27, 2018 1 Joint work with Seho Park, Yilin Chen, and Changbao Wu

More information

The Effective Use of Complete Auxiliary Information From Survey Data

The Effective Use of Complete Auxiliary Information From Survey Data The Effective Use of Complete Auxiliary Information From Survey Data by Changbao Wu B.S., Anhui Laodong University, China, 1982 M.S. Diploma, East China Normal University, 1986 a thesis submitted in partial

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Bootstrap and Parametric Inference: Successes and Challenges

Bootstrap and Parametric Inference: Successes and Challenges Bootstrap and Parametric Inference: Successes and Challenges G. Alastair Young Department of Mathematics Imperial College London Newton Institute, January 2008 Overview Overview Review key aspects of frequentist

More information

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY J.D. Opsomer, W.A. Fuller and X. Li Iowa State University, Ames, IA 50011, USA 1. Introduction Replication methods are often used in

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples Bayesian inference for sample surveys Roderick Little Module : Bayesian models for simple random samples Superpopulation Modeling: Estimating parameters Various principles: least squares, method of moments,

More information

Model Assisted Survey Sampling

Model Assisted Survey Sampling Carl-Erik Sarndal Jan Wretman Bengt Swensson Model Assisted Survey Sampling Springer Preface v PARTI Principles of Estimation for Finite Populations and Important Sampling Designs CHAPTER 1 Survey Sampling

More information

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Statistica Sinica 19 (2009), 71-81 SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Song Xi Chen 1,2 and Chiu Min Wong 3 1 Iowa State University, 2 Peking University and

More information

BIAS-ROBUSTNESS AND EFFICIENCY OF MODEL-BASED INFERENCE IN SURVEY SAMPLING

BIAS-ROBUSTNESS AND EFFICIENCY OF MODEL-BASED INFERENCE IN SURVEY SAMPLING Statistica Sinica 22 (2012), 777-794 doi:http://dx.doi.org/10.5705/ss.2010.238 BIAS-ROBUSTNESS AND EFFICIENCY OF MODEL-BASED INFERENCE IN SURVEY SAMPLING Desislava Nedyalova and Yves Tillé University of

More information

A JACKKNIFE VARIANCE ESTIMATOR FOR SELF-WEIGHTED TWO-STAGE SAMPLES

A JACKKNIFE VARIANCE ESTIMATOR FOR SELF-WEIGHTED TWO-STAGE SAMPLES Statistica Sinica 23 (2013), 595-613 doi:http://dx.doi.org/10.5705/ss.2011.263 A JACKKNFE VARANCE ESTMATOR FOR SELF-WEGHTED TWO-STAGE SAMPLES Emilio L. Escobar and Yves G. Berger TAM and University of

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Binary choice 3.3 Maximum likelihood estimation

Binary choice 3.3 Maximum likelihood estimation Binary choice 3.3 Maximum likelihood estimation Michel Bierlaire Output of the estimation We explain here the various outputs from the maximum likelihood estimation procedure. Solution of the maximum likelihood

More information

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai Weighting Methods Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Weighting Methods Stat186/Gov2002 Fall 2018 1 / 13 Motivation Matching methods for improving

More information

COMPUTATION OF THE EMPIRICAL LIKELIHOOD RATIO FROM CENSORED DATA. Kun Chen and Mai Zhou 1 Bayer Pharmaceuticals and University of Kentucky

COMPUTATION OF THE EMPIRICAL LIKELIHOOD RATIO FROM CENSORED DATA. Kun Chen and Mai Zhou 1 Bayer Pharmaceuticals and University of Kentucky COMPUTATION OF THE EMPIRICAL LIKELIHOOD RATIO FROM CENSORED DATA Kun Chen and Mai Zhou 1 Bayer Pharmaceuticals and University of Kentucky Summary The empirical likelihood ratio method is a general nonparametric

More information

Lawrence D. Brown* and Daniel McCarthy*

Lawrence D. Brown* and Daniel McCarthy* Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

More information

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA

More information

COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS. Abstract

COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS. Abstract Far East J. Theo. Stat. 0() (006), 179-196 COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS Department of Statistics University of Manitoba Winnipeg, Manitoba, Canada R3T

More information

Two-phase sampling approach to fractional hot deck imputation

Two-phase sampling approach to fractional hot deck imputation Two-phase sampling approach to fractional hot deck imputation Jongho Im 1, Jae-Kwang Kim 1 and Wayne A. Fuller 1 Abstract Hot deck imputation is popular for handling item nonresponse in survey sampling.

More information

A Note on the Asymptotic Equivalence of Jackknife and Linearization Variance Estimation for the Gini Coefficient

A Note on the Asymptotic Equivalence of Jackknife and Linearization Variance Estimation for the Gini Coefficient Journal of Official Statistics, Vol. 4, No. 4, 008, pp. 541 555 A Note on the Asymptotic Equivalence of Jackknife and Linearization Variance Estimation for the Gini Coefficient Yves G. Berger 1 The Gini

More information

Asymptotic properties of the likelihood ratio test statistics with the possible triangle constraint in Affected-Sib-Pair analysis

Asymptotic properties of the likelihood ratio test statistics with the possible triangle constraint in Affected-Sib-Pair analysis The Canadian Journal of Statistics Vol.?, No.?, 2006, Pages???-??? La revue canadienne de statistique Asymptotic properties of the likelihood ratio test statistics with the possible triangle constraint

More information

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Statistica Sinica 24 2014, 395-414 doi:ttp://dx.doi.org/10.5705/ss.2012.064 EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Jun Sao 1,2 and Seng Wang 3 1 East Cina Normal University,

More information

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

More information

Recent Advances in the analysis of missing data with non-ignorable missingness

Recent Advances in the analysis of missing data with non-ignorable missingness Recent Advances in the analysis of missing data with non-ignorable missingness Jae-Kwang Kim Department of Statistics, Iowa State University July 4th, 2014 1 Introduction 2 Full likelihood-based ML estimation

More information

Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore

Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore AN EMPIRICAL LIKELIHOOD BASED ESTIMATOR FOR RESPONDENT DRIVEN SAMPLED DATA Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore Mark Handcock, Department

More information

F. Jay Breidt Colorado State University

F. Jay Breidt Colorado State University Model-assisted survey regression estimation with the lasso 1 F. Jay Breidt Colorado State University Opening Workshop on Computational Methods in Social Sciences SAMSI August 2013 This research was supported

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring A. Ganguly, S. Mitra, D. Samanta, D. Kundu,2 Abstract Epstein [9] introduced the Type-I hybrid censoring scheme

More information

Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources

Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources Yi-Hau Chen Institute of Statistical Science, Academia Sinica Joint with Nilanjan

More information

Design and Estimation for Split Questionnaire Surveys

Design and Estimation for Split Questionnaire Surveys University of Wollongong Research Online Centre for Statistical & Survey Methodology Working Paper Series Faculty of Engineering and Information Sciences 2008 Design and Estimation for Split Questionnaire

More information

A New Two Sample Type-II Progressive Censoring Scheme

A New Two Sample Type-II Progressive Censoring Scheme A New Two Sample Type-II Progressive Censoring Scheme arxiv:609.05805v [stat.me] 9 Sep 206 Shuvashree Mondal, Debasis Kundu Abstract Progressive censoring scheme has received considerable attention in

More information

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing Primal-dual Covariate Balance and Minimal Double Robustness via (Joint work with Daniel Percival) Department of Statistics, Stanford University JSM, August 9, 2015 Outline 1 2 3 1/18 Setting Rubin s causal

More information

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods Chapter 4 Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods 4.1 Introduction It is now explicable that ridge regression estimator (here we take ordinary ridge estimator (ORE)

More information