Generalized pseudo empirical likelihood inferences for complex surveys


The Canadian Journal of Statistics / La revue canadienne de statistique, Vol. 43, No. 1, 2015, Pages 1–17

Zhiqiang TAN (1)* and Changbao WU (2)*
(1) Department of Statistics, Rutgers University, Piscataway, NJ 08854, U.S.A.
(2) Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1

Key words and phrases: Auxiliary information; calibration techniques; confidence intervals; Kullback–Leibler distance; survey design.

MSC 2010: Primary 62D05; secondary 62G09

Abstract: We consider generalized pseudo empirical likelihood inferences for complex surveys. The method is based on a weighted version of the Kullback–Leibler (KL) distance for calibration estimation (Deville & Särndal, 1992) and includes the pseudo empirical likelihood estimator (Chen & Sitter, 1999; Wu & Rao, 2006) and the calibrated likelihood estimator (Tan, 2013) as special cases. We show that a suitably formulated empirical likelihood ratio-type statistic follows asymptotically a scaled chi-square distribution, which extends the main result in Wu & Rao (2006) and makes the likelihood ratio-type confidence intervals available for calibration estimation using arbitrary choices of the weighting factor in the weighted KL distance. We further show that the scaling factor for the scaled chi-square distribution can be circumvented either through a particular choice of the weighting factor for the KL distance or using a bootstrap method. The proposed bootstrap procedure is justified for single-stage sampling designs with negligible sampling fractions. Finite sample performances of confidence intervals constructed using our proposed methods are investigated and compared with existing ones through two simulation studies.
The Canadian Journal of Statistics 43: 1–17; 2015 Statistical Society of Canada

Résumé: The authors consider the use of generalized pseudo empirical likelihood for inference in complex surveys. Their method is based on calibration estimation under a weighted version of the Kullback–Leibler (KL) divergence (Deville & Särndal, 1992). The pseudo empirical likelihood estimator (Chen & Sitter, 1999; Wu & Rao, 2006) and the calibrated likelihood estimator (Tan, 2013) are special cases. The authors show that a suitably formulated likelihood ratio statistic follows a chi-square distribution up to a multiplicative factor, which generalizes the main result of Wu & Rao (2006) and allows likelihood ratio confidence intervals to be computed for calibration estimators with an arbitrary choice of weights in the weighted KL divergence. They also show that the multiplicative factor in the asymptotic distribution can be avoided through an appropriate choice of weights or through a bootstrap procedure, which is justified for single-stage designs with negligible sampling fractions. The authors assess the performance of the confidence intervals from their method by comparing them with those from existing methods in two simulation studies. La revue canadienne de statistique 43: 1–17; 2015 Société statistique du Canada

* Author to whom correspondence may be addressed. E-mail: cbwu@uwaterloo.ca or ztan@stat.rutgers.edu

1. INTRODUCTION

Calibration is a popular inference tool for analysis of complex surveys. It originates from the idea of benchmarking, where known population totals of certain auxiliary variables are used to form benchmark constraints. The method has gained significant popularity since the work of Deville & Särndal (1992). Calibration estimators are closely related to the generalized regression estimators. Some of the important developments for regression estimators, such as variance estimation techniques, can also be used for calibration estimators. Fuller (2002) provides an excellent review of regression estimation and Särndal (2007) contains a thorough review of calibration techniques.

The conventional route for inferences with calibration methods is to first compute the point estimator with an estimated variance, and then use the standard Z-statistic based on normal approximations to construct confidence intervals or conduct statistical tests. Confidence intervals under this approach are forced to be symmetric around the point estimator and are not necessarily confined within the parameter space.

There have been significant developments on empirical likelihood (EL) methods in non-survey statistics (Owen, 1988). Progress has also been made on using the empirical likelihood method for complex surveys. See, for instance, Chen & Sitter (1999), Wu & Rao (2006), and Chen & Kim (2014), among others. The most attractive feature of the empirical likelihood approach is the data-driven, range-respecting confidence intervals based on the empirical likelihood ratio statistic. This property is not enjoyed by the conventional calibration method with Wald-type confidence intervals.

Suppose that $U = \{1, 2, \ldots, N\}$ is the set of $N$ units for the finite population, with $(y_i, x_i)$ being the values of the study variable $y$ and the vector of auxiliary variables $x$ attached to unit $i$. Let $t_y = \sum_{i=1}^N y_i$ be the parameter of interest, and $t_x = \sum_{i=1}^N x_i$ be the known population totals. Let $\mu_y = N^{-1} t_y$ and $\mu_x = N^{-1} t_x$ be the corresponding population means. Let $S$ be a set of $n$ sampled units and $\{(y_i, x_i) : i \in S\}$ be the survey data. Let $\pi_i = P(i \in S)$ be the first-order inclusion probabilities and $d_i = \pi_i^{-1}$ be the basic design weights.
The calibration estimator of $t_y$ is computed as $\hat t_{CAL} = \sum_{i \in S} \hat w_i y_i$, where the $\hat w_i$ are the calibrated weights obtained by minimizing a distance measure $G(d, w) = \sum_{i \in S} G_i(d_i, w_i)$ between the $w_i$ and the $d_i$ subject to the set of calibration equations (also called benchmark constraints)

$$\sum_{i \in S} w_i x_i = t_x. \qquad (1)$$

The most commonly used distance measure is the chi-squared distance specified by $G_i(d_i, w_i) = (w_i - d_i)^2/(q_i d_i)$, where the $q_i$ are pre-specified constants. It is well known that, under the chi-squared distance, the resulting calibration estimator $\hat t_{CAL}$ is algebraically identical to the generalized regression estimator (Särndal, Swensson, & Wretman). Under the conventional calibration approach (Deville & Särndal, 1992), confidence intervals on $t_y$ rely on asymptotic normality of $\hat t_{CAL}$ and are constructed through the standardized Z-statistic $(\hat t_{CAL} - t_y)/\{v(\hat t_{CAL})\}^{1/2}$, where $v(\hat t_{CAL})$ is a consistent estimator of the variance of $\hat t_{CAL}$.

The choice of $q_i$ in the distance measure does not affect the consistency of the calibration estimator, but it has an impact on the variance of the estimator. The choice $q_i = \pi_i^{-1} - 1$ can lead to more efficient calibration estimators (Tan, 2010, 2013). A closely related research topic is the design-optimal regression estimator; see, for instance, Fuller & Isaki (1981), Montanari (1987), Rao (1994), Berger, Tirari, & Tillé (2003), and Chen & Kim (2014), among others. Fuller (2009) considered model-optimal design-consistent estimators.
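The algebraic identity between chi-squared calibration and generalized regression can be checked numerically. The following is a minimal sketch, not the authors' code: the function name `chi2_calibration_weights`, the simulated sample, and the benchmark totals are all hypothetical. Under the chi-squared distance the Lagrange multiplier solves a linear system, so the weights are available in closed form as $w_i = d_i(1 + q_i x_i^T \lambda)$.

```python
import numpy as np

def chi2_calibration_weights(d, q, X, t_x):
    """Weights minimizing sum (w_i - d_i)^2 / (q_i d_i) subject to X^T w = t_x."""
    d, q, X, t_x = (np.asarray(a, dtype=float) for a in (d, q, X, t_x))
    M = X.T @ (X * (d * q)[:, None])         # sum_i d_i q_i x_i x_i^T
    lam = np.linalg.solve(M, t_x - X.T @ d)  # multiplier from the linear constraint
    return d * (1.0 + q * (X @ lam))

# Hypothetical sample: intercept plus one auxiliary variable.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.lognormal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)
d = rng.uniform(10.0, 40.0, size=n)          # design weights (made up)
q = np.ones(n)
t_x = (X.T @ d) * np.array([1.02, 0.97])     # made-up benchmark totals

w = chi2_calibration_weights(d, q, X, t_x)
t_cal = w @ y

# Generalized regression estimator with the same q-weights.
B_hat = np.linalg.solve(X.T @ (X * (d * q)[:, None]), X.T @ (d * q * y))
t_greg = d @ y + B_hat @ (t_x - X.T @ d)
```

In this sketch `t_cal` and `t_greg` agree up to floating-point error, and `X.T @ w` reproduces `t_x`. Note that chi-squared calibration weights are not guaranteed to be positive.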

For complex survey data, Chen & Sitter (1999) proposed to use a pseudo empirical (log) likelihood

$$l(p) = \sum_{i \in S} d_i \log(p_i), \qquad (2)$$

where $p = (p_1, \ldots, p_n)^T$ is the discrete probability measure over the $n$ sampled units. The maximum pseudo-EL estimator of the population mean $\mu_y$ is computed as $\hat\mu_{PEL} = \sum_{i \in S} \hat p_i y_i$, where the $\hat p_i$ maximize the pseudo-EL function $l(p)$ subject to $\sum_{i \in S} p_i = 1$ and the set of constraints

$$\sum_{i \in S} p_i x_i = \mu_x. \qquad (3)$$

Chen & Sitter (1999) showed that the estimator $\hat\mu_{PEL}$ is asymptotically equivalent to the calibration estimator $N^{-1}\hat t_{CAL}$ with the choice of $q_i = 1$. Wu & Rao (2006) showed that the pseudo-EL ratio statistic on $\mu_y$, adjusted by a scaling factor involving the design effect, has an asymptotic $\chi^2$ distribution with one degree of freedom. Consequently, confidence intervals on $\mu_y$ based on the pseudo-EL ratio statistic can be constructed.

There are two major gaps between the conventional calibration method and the pseudo empirical likelihood method. First, the pseudo-EL method is designed for the population mean $\mu_y$ and relies on the constraints (3), which use the known population means $\mu_x$. The method cannot be directly applied to scenarios where the population size $N$ is unknown and only the population totals $t_x$ are available. Second, the pseudo-EL approach cannot accommodate a general choice of the weight factor $q_i$, which is an important tool for achieving design-optimal estimation as mentioned earlier.

Recently, Tan (2010, 2013) developed a calibrated likelihood method by exploiting a connection between survey calibration and missing data problems. The method can be understood in two steps. Let $R_i$ be the sample inclusion indicator, i.e., $R_i = 1$ if $i \in S$ and $R_i = 0$ otherwise.
The first step is to treat $\{(x_i, y_i, R_i) : i \in U\}$ as an iid sample from a joint distribution of $(x, y, R)$ and derive a bona fide empirical likelihood estimator (Qin & Lawless, 1994) for $E(y) = E\{\pi^{-1}(x) R y\}$, subject to the moment constraints $0 = E\{\pi^{-1}(x) R x - x\}$, based on the observed data $\{(x_i, R_i y_i, R_i) : i \in U\}$, where $\pi(x) = P(R = 1 \mid x, y)$ is assumed to be free of $y$. This empirical likelihood estimator of $E(y)$ takes the usual form $N^{-1} \sum_{i \in S} \hat w_i y_i$ for some weights $\hat w_i$, but the calibration equations are generally violated, i.e., $\sum_{i \in S} \hat w_i x_i \neq t_x$. For the second step, Tan (2010) proposed a modification such that the calibration equations are satisfied without affecting the first-order asymptotic variance. As observed in Tan (2013), the calibrated likelihood estimator turns out to be algebraically equivalent to $\hat t_{CAL}$ with the distance measure $G(d, w)$ specified as a weighted Kullback–Leibler distance and the weight factor $q_i$ set to $\pi_i^{-1} - 1$.

The calibrated likelihood estimator (Tan, 2010) and, similarly, the calibrated regression estimator (Tan, 2006) are shown in Tan (2013) to be asymptotically optimal under rejective or high-entropy sampling designs when $\pi_i$ is included as a calibration variable in $x_i$. These two estimators are simpler than the usual optimal regression estimator involving second-order inclusion probabilities (Fuller & Isaki, 1981; Montanari, 1987; Rao, 1994). See also Chen & Kim (2014) for related results under negligible sampling fractions.

In this paper, we consider generalized pseudo empirical likelihood inferences for complex surveys. The method is based on a weighted version of the Kullback–Leibler (KL) distance for calibration estimation (Deville & Särndal, 1992) and includes the pseudo empirical likelihood estimator (Chen & Sitter, 1999; Wu & Rao, 2006) and the calibrated likelihood estimator (Tan, 2013) as special cases. We show that a suitably formulated empirical likelihood ratio-type statistic follows asymptotically a scaled chi-square distribution, which extends the main result in Wu & Rao (2006) and makes the likelihood ratio-type confidence intervals available for calibration estimation using arbitrary choices of the weighting factor in the weighted KL distance. We further show that the scaling factor for the scaled chi-square distribution can be circumvented either through a particular choice of the weighting factor for the KL distance or using a bootstrap method. The proposed bootstrap procedure is justified for single-stage sampling designs with negligible sampling fractions.

The rest of the paper is organized as follows. Main results on the generalized pseudo empirical likelihood method are presented in Section 2. The proposed bootstrap procedure is described in Section 3. In Section 4, we report results from two simulation studies, one based on a synthetic finite population and the other using a Statistics Canada survey data set. Some concluding remarks and discussions are given in Section 5. Proofs of the major results and justification of the bootstrap method are given in the Appendix.

2. GENERALIZED PSEUDO EMPIRICAL LIKELIHOOD METHOD

2.1. Weighted Kullback–Leibler Distance Based Calibration Estimators

The Kullback–Leibler distance is a measure of divergence between two distributions. It was first described by Kullback & Leibler (1951) as a loss function in the context of information theory, and then further discussed by Kullback (1959). For two discrete probability measures $f = (f_1, \ldots, f_n)^T$ and $g = (g_1, \ldots, g_n)^T$, there are two types of Kullback–Leibler distance: $KL(f, g) = \sum_{i=1}^n f_i \log(f_i/g_i)$ and $KL(g, f) = \sum_{i=1}^n g_i \log(g_i/f_i)$. When discussing confidence intervals for iid data, taking $f$ to be the empirical measure with $f_i = 1/n$ and $g$ to be another probability measure for the data, DiCiccio & Romano (1990) called $KL(f, g)$ the forward Kullback–Leibler distance and $KL(g, f)$ the backward Kullback–Leibler distance.
Unfortunately, neither $KL(f, g)$ nor $KL(g, f)$ can be used directly as a distance measure for calibration estimation, since $G_i(f_i, g_i) = f_i \log(f_i/g_i)$ does not guarantee that $G_i(f_i, g_i) \ge 0$ for all $i$. A simple modification is to use $G_i(f_i, g_i) = f_i \log(f_i/g_i) - f_i + g_i$. In this case $G_i(f_i, g_i) = -f_i\{\log(g_i/f_i) - g_i/f_i + 1\} \ge 0$ for all $i$, since $\log(x) \le x - 1$ for any $x > 0$. For the two sets of weights $d = (d_1, \ldots, d_n)^T$ and $w = (w_1, \ldots, w_n)^T$, we consider the modified forward Kullback–Leibler distance between $w$ and $d$, weighted by $(q_1, \ldots, q_n)$:

$$EL(d, w) = -\sum_{i \in S} q_i^{-1} \left\{ d_i \log\left(\frac{w_i}{d_i}\right) - w_i + d_i \right\}.$$

This is also called the minimum entropy distance by Deville & Särndal (1992). The notation $EL(d, w)$ indicates its connection to empirical likelihood. For independent but not identically distributed data where $d_i = n^{-1}$ and $(w_1, \ldots, w_n)$ are replaced by the probability measure $(p_1, \ldots, p_n)$ over the sample, $EL(d, w)$ was discussed in Wu (2004) as the weighted empirical log-likelihood function, with the q-weights specified through the variance function.

If we let $q_i = 1$ and impose the constraint $\sum_{i=1}^n w_i = N$, then $EL(d, w) = -\sum_{i=1}^n d_i \log(p_i) + C$, where $p_i = w_i/N$ and $C$ is a constant not involving $p_i$. In this case, minimizing $EL(d, w)$ subject to a set of constraints on $w_i$ is equivalent to maximizing the pseudo-EL function $l(p) = \sum_{i=1}^n d_i \log(p_i)$ subject to the same set of constraints on $p_i$. In other words, the pseudo-EL approach of Chen & Sitter (1999) and Wu & Rao (2006) is a special case of inferences based on the modified forward Kullback–Leibler distance $EL(d, w)$.

We use the term generalized pseudo empirical likelihood (GPEL) to denote calibration estimation under the distance $EL(d, w)$. The GPEL estimator of $t_y$ is given by $\hat t_{EL} = \sum_{i \in S} \hat w_i y_i$, where the weights $\hat w_i$ minimize $EL(d, w)$ subject to (1). If $N$ is known, then the constraint $\sum_{i \in S} w_i = N$ should be included. This amounts to including 1 as the first component of $x_i$ and $N$ as the first component of $t_x$ in the calibration equations (1).
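The reduction to the pseudo-EL function for $q_i = 1$ can be written out explicitly. The short derivation below is a worked check, assuming the sign convention $EL(d, w) = -\sum_{i \in S} q_i^{-1}\{d_i \log(w_i/d_i) - w_i + d_i\}$; it also gives the constant $C$, which the text leaves implicit.

```latex
\begin{align*}
EL(d, w)
  &= -\sum_{i \in S} \Bigl\{ d_i \log\frac{w_i}{d_i} - w_i + d_i \Bigr\}
     && (q_i = 1) \\
  &= -\sum_{i \in S} d_i \log w_i + \sum_{i \in S} d_i \log d_i
     + \sum_{i \in S} w_i - \sum_{i \in S} d_i \\
  &= -\sum_{i \in S} d_i \log p_i
     \underbrace{{} - (\log N) \sum_{i \in S} d_i + \sum_{i \in S} d_i \log d_i
     + N - \sum_{i \in S} d_i}_{C}
     && \Bigl(w_i = N p_i,\ \sum_{i \in S} w_i = N\Bigr).
\end{align*}
```

Since $C$ does not depend on $p$, minimizing $EL(d, w)$ over $w$ is the same as maximizing $l(p) = \sum_{i \in S} d_i \log p_i$ over $p$ under the same constraints.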
It can be shown by the standard Lagrange multiplier method that

$$\hat w_i = \frac{d_i}{1 + q_i x_i^T \hat\lambda}, \qquad (4)$$

where $\hat\lambda$ is a solution to

$$\sum_{i \in S} \frac{d_i x_i}{1 + q_i x_i^T \hat\lambda} - t_x = 0. \qquad (5)$$

It should be noted that the modified backward Kullback–Leibler distance between $w$ and $d$, weighted by the pre-specified q-weights $(q_1, \ldots, q_n)$, is given by

$$ET(d, w) = \sum_{i \in S} q_i^{-1} \left\{ w_i \log\left(\frac{w_i}{d_i}\right) - w_i + d_i \right\}.$$

The notation ET comes from the term exponential tilting, since minimizing $ET(d, w)$ with respect to $w$ subject to constraints (1) results in calibration weights given by $w_i = d_i g_i$, where $g_i = \exp(\lambda^T x_i q_i)$ and $\lambda$ is determined by constraints (1). The distance measure $ET(d, w)$ was first mentioned by Deville & Särndal (1992). Folsom (1991) provides an early example of exponential weight adjustment. Kim (2010) contains further discussions on calibration estimation using exponential tilting.

Now consider the choice $q_i = \pi_i^{-1} - 1$ (Tan, 2010, 2013). The distance $EL(d, w)$ is equal to $-\sum_{i \in S} (1 - \pi_i)^{-1}\{\log(w_i) - \pi_i w_i\}$ up to an additive constant. The resulting calibration weights are given by $\hat w_i = \{\pi_i + (1 - \pi_i) x_i^T \hat\lambda\}^{-1}$, where $\hat\lambda$ is the solution to $\sum_{i \in S} x_i/\{\pi_i + (1 - \pi_i) x_i^T \hat\lambda\} - t_x = 0$. The resulting calibration estimator of $\mu_y$ is given by

$$\hat\mu_{EL} = \frac{1}{N} \sum_{i \in S} \frac{y_i}{\pi_i + (1 - \pi_i) x_i^T \hat\lambda}, \qquad (6)$$

which is exactly the same as the calibrated likelihood estimator of Tan (2010, 2013). The use of $\pi_i^{-1} - 1$ as a weight also appeared previously in Brewer (1999) on cosmetic calibration and Berger, Tirari, & Tillé (2003) on optimal regression estimation. See further discussions after Corollary 1.

An interesting interpretation of the choice $q_i = \pi_i^{-1} - 1$ is as follows. Let $I_i = 1$ if $i \in S$ and $I_i = 0$ if $i \notin S$; then $E_p(I_i/\pi_i) = 1$ and $V_p(I_i/\pi_i) = \pi_i^{-1} - 1$. Throughout, $E_p(\cdot)$ and $V_p(\cdot)$ refer to expectation and variance under the probability sampling design. In other words, the choice $q_i = \pi_i^{-1} - 1$ reflects the variation of selecting the $i$th unit into the sample under the survey design.
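To make equations (4) and (5) concrete, the sketch below solves (5) by a damped Newton iteration on the concave function $K(\lambda) = \sum_{i \in S} q_i^{-1} d_i \log(1 + q_i x_i^T \lambda) - t_x^T \lambda$, whose gradient is the left-hand side of (5). It is an illustrative implementation under stated assumptions, not the authors' algorithm: the name `gpel_weights`, the step-halving rule, the tolerances, and the simulated data are hypothetical, and all $q_i$ are assumed strictly positive.

```python
import numpy as np

def gpel_weights(d, q, X, t_x, tol=1e-10, max_iter=100):
    """Solve (5) for lambda by damped Newton and return the weights in (4)."""
    d, q, X, t_x = (np.asarray(a, dtype=float) for a in (d, q, X, t_x))
    lam = np.zeros(X.shape[1])

    def K(l):
        u = 1.0 + q * (X @ l)
        if np.any(u <= 0):                 # outside the domain of K
            return -np.inf
        return np.sum(d / q * np.log(u)) - t_x @ l

    for _ in range(max_iter):
        u = 1.0 + q * (X @ lam)
        grad = X.T @ (d / u) - t_x         # left-hand side of (5)
        if np.max(np.abs(grad)) <= tol * (1.0 + np.max(np.abs(t_x))):
            break
        hess = -(X.T @ (X * (d * q / u**2)[:, None]))   # negative definite
        step = np.linalg.solve(hess, grad)
        s = 1.0
        # Halve the step until K does not decrease and the domain is respected.
        while s > 1e-8 and K(lam - s * step) < K(lam) - 1e-12 * (1.0 + abs(K(lam))):
            s /= 2.0
        lam = lam - s * step
    return d / (1.0 + q * (X @ lam)), lam

# Hypothetical sample and benchmark totals.
rng = np.random.default_rng(1)
n = 60
X = np.column_stack([np.ones(n), rng.lognormal(sigma=0.5, size=n)])
d = rng.uniform(10.0, 40.0, size=n)
q = np.ones(n)
t_x = (X.T @ d) * np.array([1.02, 0.97])

w, lam_hat = gpel_weights(d, q, X, t_x)
```

Unlike the chi-squared distance, the weights returned here are positive by construction whenever the iteration stays inside the region $1 + q_i x_i^T \lambda > 0$. Setting `q = 1/pi - 1` for inclusion probabilities `pi` strictly less than 1 gives the weights $\{\pi_i + (1 - \pi_i) x_i^T \hat\lambda\}^{-1}$.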
Another benefit of setting $q_i = \pi_i^{-1} - 1$ can be seen from the property that $q_i \to 0$ if $\pi_i \to 1$. If the inclusion probability of a unit is close to 1, then this unit is substantially downweighted (or completely removed if $\pi_i = 1$) in the calibration process. This seems sensible from a design perspective, because the uncertainty associated with unit $i$ is very small if $\pi_i \approx 1$, and in this case we should force $w_i \approx d_i \approx 1$. In particular, this property may lead to substantial variance reduction when the linear relationship of $y_i$ given $x_i$ is violated mostly in the region where $\pi_i \approx 1$, as seen in Tan (2013).

2.2. Generalized Pseudo Empirical Likelihood Ratio Confidence Intervals

The point estimator $\hat t_{EL}$ falls in the general class of calibration estimators (Deville & Särndal, 1992), with the distance measure $G(d, w)$ specified as the modified Kullback–Leibler distance $EL(d, w)$. We now establish an important new result for constructing confidence intervals based on a GPEL ratio statistic similar to the pseudo empirical likelihood ratio statistic in Wu & Rao (2006). We assume that the finite population and the survey design satisfy the same regularity conditions C1–C5 described in Wu & Rao (2006). In addition, we assume that

C6. The q-weights satisfy $N^{-1} \sum_{i=1}^N q_i^2 = O(1)$.

Under conditions C1–C6, we have that $N^{-1} \sum_{i=1}^N q_i x_i x_i^T = O(1)$ and $N^{-1} \sum_{i=1}^N q_i x_i y_i = O(1)$. Let $\hat w = (\hat w_1, \ldots, \hat w_n)$, where the $\hat w_i$ are computed by (4) and (5). By standard asymptotic theory of calibration estimation (Deville & Särndal, 1992), we have $\hat\lambda = O_p(n^{-1/2})$ and

$$\hat\lambda = \Bigl( \sum_{i \in S} d_i q_i x_i x_i^T \Bigr)^{-1} (\hat t_x - t_x) + o_p(n^{-1/2}), \qquad (7)$$

where $\hat t_x = \sum_{i \in S} d_i x_i$. This leads to the following asymptotic expansion: $\hat t_{EL} = \sum_{i \in S} \hat w_i y_i = \hat t_{GREG} + o_p(N n^{-1/2})$, where

$$\hat t_{GREG} = \hat t_y + \hat B^T (t_x - \hat t_x) \qquad (8)$$

with $\hat t_y = \sum_{i \in S} d_i y_i$ and

$$\hat B = \Bigl( \sum_{i \in S} d_i q_i x_i x_i^T \Bigr)^{-1} \sum_{i \in S} d_i q_i x_i y_i. \qquad (9)$$

The estimator (8) is known as the generalized regression estimator for a general choice of $q_i$ (Särndal, Swensson, & Wretman).

Let $\tilde w(\theta) = (\tilde w_1(\theta), \ldots, \tilde w_n(\theta))$, where the weights $\tilde w_i(\theta)$ minimize $EL(d, w)$ subject to (1) and

$$\sum_{i \in S} w_i y_i = \theta \qquad (10)$$

for a given $\theta$. The GPEL ratio statistic for $\theta$ is defined as $r(\theta) = EL(d, \hat w) - EL(d, \tilde w(\theta))$. Because $\hat w$ minimizes $EL(d, w)$ under fewer constraints, $r(\theta) \le 0$. We have the following result on the asymptotic distribution of $r(\theta)$.

Theorem 1. Under regularity conditions C1–C6, the adjusted GPEL ratio statistic $-2r(\theta)/C$ converges in distribution to a $\chi^2$ random variable with one degree of freedom when $\theta = t_y$. The scaling constant $C$ is given by

$$C = V_p(\hat\eta) \Big/ \Bigl( \sum_{i=1}^N q_i e_i^2 \Bigr), \qquad (11)$$

where $\hat\eta = \sum_{i \in S} d_i e_i$, $e_i = y_i - B^T x_i$, and $B = (\sum_{i=1}^N q_i x_i x_i^T)^{-1} (\sum_{i=1}^N q_i x_i y_i)$, which can be consistently estimated by $\hat B$ defined in (9).

In practice, the scaling factor $C$ needs to be estimated by a consistent estimator $\hat C$, which involves variance estimation for $\hat\eta$. This can be handled similarly as in Wu & Rao (2006). A bootstrap procedure described in Section 3 can also be used to circumvent the estimation of $C$ for single-stage sampling designs with negligible sampling fractions. For rejective or high-entropy sampling designs, estimation of $C$ is also not required, as shown in Corollaries 1 and 2 below.

An interesting special case of Theorem 1 is obtained for the calibrated likelihood estimator (6) of Tan (2013) under rejective sampling, using the weight factor $q_i = \pi_i^{-1} - 1$ and including $\pi_i$ as a calibration variable in $x_i$. As defined in Hájek (1964), rejective sampling is Poisson sampling conditional on a fixed sample size. For example, simple random sampling without replacement corresponds to rejective sampling with constant inclusion probabilities. In this case the scaling constant $C$ is asymptotically equal to 1. We assume that $\liminf_{N \to \infty} N^{-1} \sum_{i=1}^N (\pi_i^{-1} - 1) e_i^2 > 0$.

Corollary 1. Let $q_i = \pi_i^{-1} - 1$ and assume that $\pi_i$ is included as a component of $x_i$. Under rejective sampling and the regularity conditions stated in Theorem 1 of Tan (2013), we have $\lim_{N \to \infty} C = 1$ and $-2r(\theta)$ converges in distribution to a $\chi^2$ random variable with one degree of freedom when $\theta = t_y$.

A heuristic explanation for the simplification of $C$ is as follows; see also Tan (2013). Under Poisson sampling, $V_p(\hat\eta) = \sum_{i=1}^N (\pi_i^{-1} - 1) e_i^2$ and hence $C$ is exactly equal to 1. But rejective sampling of size $n$ is defined as Poisson sampling conditional on a fixed sample size $n$, that is, $\sum_{i \in S} d_i \pi_i = n$. Hence under rejective sampling, $V_p(\hat\eta)$ is asymptotically equal to the "residual" variance $\sum_{i=1}^N (\pi_i^{-1} - 1)(e_i - b\pi_i)^2$, with $b = \{\sum_{i=1}^N \pi_i(1 - \pi_i)\}^{-1} \{\sum_{i=1}^N (1 - \pi_i) e_i\}$, and $b$ reduces to 0 by the definition of $e_i$ and the fact that $\pi_i$ is included as a component of $x_i$. Such an argument is also implicit in Berger, Tirari, & Tillé (2003) on optimal regression estimation. In fact, under single-stage rejective sampling, the estimator of Berger et al. reduces to the same estimator, up to some minor difference, as the calibrated regression estimator of Tan (2013), taking the form of $\hat t_{GREG}$ in (8) with $q_i = \pi_i^{-1} - 1$.

Incidentally, when both using $q_i = \pi_i^{-1} - 1$ and including $\pi_i$ in $x_i$, the calibrated regression estimator can also be expressed in the cosmetic form of linear prediction estimators (Särndal & Wright, 1984), i.e., $\hat t_{GREG} = \sum_{i \in S} y_i + \sum_{i \notin S} \hat B^T x_i$, as shown in Tan (2013). Fuller (2009) and Park & Kim (2014) also contain discussions on the topic. These choices of $q_i$ and $x_i$ satisfy a general construction of cosmetic calibration estimators in Brewer (1999). In Brewer's notation, $Z_s$ is taken here to be diagonal with diagonal elements $\pi_i$. The condition $Z_s 1_n = X_s \alpha$ holds because the $\pi_i$ constitute a column of $X_s$. But, in general, Brewer's (1999) proposal does not imply setting $q_i = \pi_i^{-1} - 1$.

As in Tan (2013), Corollary 1 can be generalized from rejective sampling to other high-entropy sampling methods such that the Kullback–Leibler divergence from rejective sampling tends to 0. In particular, the Rao-Sampford sampling method (Rao, 1965; Sampford, 1967) is an example of high-entropy sampling provided that $\sum_{i=1}^n \pi_i(1 - \pi_i) \to \infty$ (Berger, 1998), which is already implied by the regularity conditions in Tan (2013, Theorem 1).

Corollary 2. The same result as in Corollary 1 holds if the rejective sampling procedure is replaced by the Rao-Sampford sampling procedure.

2.3. Computational Procedures for GPEL

The basic computational problem is to minimize $EL(d, w)$ with respect to $w = (w_1, \ldots, w_n)$ subject to constraint (1). The resulting $\hat w_i$ is given by (4) with the Lagrange multiplier $\hat\lambda$ being the solution to (5). The key to our computational algorithms is that the required constrained minimization with respect to $w$ is a dual problem of maximizing

$$K(\lambda) = \sum_{i \in S} q_i^{-1} d_i \log(1 + q_i x_i^T \lambda) - t_x^T \lambda$$

with respect to $\lambda$ within the set $\Lambda = \{\lambda : 1 + q_i x_i^T \lambda > 0, i \in S\}$, since (5) is equivalent to

$$\Delta_1(\hat\lambda) = \frac{\partial}{\partial \lambda} K(\lambda) \Big|_{\lambda = \hat\lambda} = \sum_{i \in S} \frac{d_i x_i}{1 + q_i x_i^T \hat\lambda} - t_x = 0. \qquad (12)$$

Note that $K(\lambda)$ is a concave function of $\lambda$, since the matrix

$$\Delta_2(\lambda) = \frac{\partial^2}{\partial \lambda \, \partial \lambda^T} K(\lambda) = -\sum_{i \in S} \frac{d_i q_i x_i x_i^T}{(1 + q_i x_i^T \lambda)^2} \qquad (13)$$

is negative definite. This duality property was also observed in Tan (2010) for the calibrated likelihood estimator, where, up to an additive constant, $K(\lambda) = \sum_{i \in S} \log\{\pi_i + (1 - \pi_i) x_i^T \lambda\}/(1 - \pi_i) - t_x^T \lambda$. The solution to (5) can be found using the modified Newton–Raphson procedures of Chen, Sitter, & Wu (2002) with $\Delta_1(\lambda)$ and $\Delta_2(\lambda)$ defined in (12) and (13).

3. A BOOTSTRAP PROCEDURE

Results from Theorem 1 can be used to construct $1 - \alpha$ level confidence intervals for the population total $\theta = t_y$ in the form of $\{\theta \mid -2r(\theta)/C < \chi_1^2(\alpha)\}$, where $\chi_1^2(\alpha)$ is the upper $100\alpha$th percentile from the $\chi_1^2$ distribution. Under an arbitrary unequal probability sampling design, the scaling constant $C$ needs to be estimated, which involves variance estimation for $\hat\eta$. For single-stage unequal probability sampling designs with negligible sampling fractions, the scaling constant can be circumvented through a bootstrap calibration method. The bootstrap procedure also provides a useful alternative to the chi-square approximation for rejective or high-entropy sampling under Corollaries 1 and 2, where $C$ can be replaced by $\hat C = 1$.

The bootstrap procedure introduced here is similar to the with-replacement bootstrap procedure described in Wu & Rao (2010) for the pseudo empirical likelihood method. The bootstrap calibrated $1 - \alpha$ level confidence interval on $\theta = t_y$ using the unscaled GPEL ratio statistic is constructed as $\{\theta \mid -2r(\theta) < b_\alpha\}$, where $b_\alpha$ is the upper $100\alpha$th percentile from the sampling distribution of $-2r(\theta)$. The bootstrap procedure provides a Monte Carlo approximation to $b_\alpha$.
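Both interval forms above are level sets of a profile statistic, so their endpoints can be located by a one-dimensional search on each side of the point estimate. The sketch below is an illustration only: `profile_interval` and its arguments are hypothetical names, the callable `stat` stands in for $\theta \mapsto -2r(\theta)$ and is assumed to be zero at the point estimate and increasing away from it, and the toy quadratic statistic is made up. The $\chi_1^2$ percentile is obtained from the standard normal quantile, since a $\chi_1^2$ variable is a squared standard normal.

```python
from statistics import NormalDist

def chi2_1_upper(alpha):
    """Upper 100*alpha-th percentile of the chi-square distribution with 1 df."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z

def profile_interval(stat, theta_hat, C=1.0, cutoff=None, alpha=0.05,
                     width=1.0, tol=1e-8):
    """Endpoints of {theta : stat(theta) / C < cutoff} found by bisection.

    stat   : callable returning -2 r(theta) (assumed 0 at theta_hat)
    cutoff : chi-square percentile by default; pass a bootstrap percentile
             (with C = 1) for the bootstrap calibrated interval
    """
    cut = chi2_1_upper(alpha) if cutoff is None else cutoff

    def endpoint(direction):
        lo, hi = 0.0, width
        while stat(theta_hat + direction * hi) / C < cut:   # expand the bracket
            hi *= 2.0
        while hi - lo > tol:                                # bisect
            mid = 0.5 * (lo + hi)
            if stat(theta_hat + direction * mid) / C < cut:
                lo = mid
            else:
                hi = mid
        return theta_hat + direction * lo

    return endpoint(-1.0), endpoint(+1.0)

# Toy statistic for illustration: quadratic around a point estimate of 10.
toy_stat = lambda theta: (theta - 10.0) ** 2 / 4.0
lo, hi = profile_interval(toy_stat, theta_hat=10.0)
```

For a genuinely quadratic statistic the interval is symmetric; with the GPEL ratio statistic the two sides are searched independently, which is what allows the resulting interval to be asymmetric around $\hat t_{EL}$.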
The most crucial part of the bootstrap method is to treat the survey weights $d_i$ and the q-weights $q_i$ as part of the sample data. Let $\{(d_i, q_i, x_i, y_i), i \in S\}$ be the original survey data set. Let $t_x$ be the known population totals for the x-variables and let $\hat t_{EL} = \sum_{i \in S} \hat w_i y_i$ be the calibration estimator of $t_y$ using the distance measure $EL(d, w)$. Our proposed bootstrap method consists of the following four steps:

[1] Select a bootstrap sample $S^*$ of size $n$ from the original sample $S$ using simple random sampling with replacement; denote the bootstrap sample data by $\{(d_i^*, q_i^*, x_i^*, y_i^*), i \in S^*\}$.

[2] Let the bootstrap version of $EL(d, w)$ be defined as

$$EL^*(d^*, w) = -\sum_{i \in S^*} (q_i^*)^{-1} \left\{ d_i^* \log\left(\frac{w_i}{d_i^*}\right) - w_i + d_i^* \right\}.$$

[3] Calculate the GPEL ratio statistic $r^*(\theta) = EL^*(d^*, \hat w^*) - EL^*(d^*, \tilde w^*(\theta))$ at $\theta = \hat t_{EL}$, where $\hat w^* = (\hat w_1^*, \ldots, \hat w_n^*)^T$ minimize $EL^*(d^*, w)$ subject to $\sum_{i \in S^*} w_i x_i^* = t_x$, and $\tilde w^*(\theta) = (\tilde w_1^*(\theta), \ldots, \tilde w_n^*(\theta))^T$ minimize $EL^*(d^*, w)$ subject to $\sum_{i \in S^*} w_i x_i^* = t_x$ and $\sum_{i \in S^*} w_i y_i^* = \hat t_{EL}$.

[4] Repeat Steps [1], [2], and [3] a large number of times, $B$, independently, to obtain the sequence $-2r_1^*(\theta), \ldots, -2r_B^*(\theta)$, all at $\theta = \hat t_{EL}$. Let $b_\alpha^*$ be the upper $100\alpha$th sample percentile from this sequence.

The proposed bootstrap method can be formally justified for single-stage unequal probability sampling designs with replacement; see the Appendix for details. The procedure also provides good approximations for single-stage unequal probability sampling designs without replacement if the sampling fraction is small. Treating survey designs with negligible sampling fractions as if the units are selected with replacement is a common practice in survey sampling for the purpose of variance estimation or other second-order analysis. The bootstrap calibrated confidence interval on $t_y$, constructed as $\{\theta \mid -2r(\theta) < b_\alpha^*\}$, has approximately correct asymptotic coverage probability at the $1 - \alpha$ level.

4. SIMULATION STUDIES

We now report results from two simulation studies on the performances of the GPEL based estimators and GPEL ratio confidence intervals on a population total, with comparisons to the generalized regression estimators and the usual normal theory confidence intervals.

Study I. The finite population of size $N = 2{,}000$ used for the simulation was generated from the model

$$y_i = \beta x_i + 2z_i + 0.5\{x_i I(x_i < 2)\} z_i^{1/2} + \sigma\varepsilon_i,$$

where $x_i \sim \mathrm{lognormal}(0, 1)$, $z_i \sim \chi_2^2$, $\varepsilon_i \sim N(0, 1)$, $I(\cdot)$ is the indicator function, and $\beta_0$ was chosen such that $y_i \ge 0$ for $i = 1, \ldots, N$. Two values of $\sigma$ were used such that the correlation coefficients, $\rho$, between the response variable $y_i$ and the linear predictor $\beta x_i + 2z_i + 0.5\{x_i I(x_i < 2)\} z_i^{1/2}$ are 0.80 and 0.50, respectively.
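As a concrete companion to bootstrap Steps [1] to [4] of Section 3, the following sketch computes a Monte Carlo cutoff for $-2r(\theta)$. It is a simplified illustration under several assumptions: the function names, the damped Newton solver, the simulated sample, and the small number of replicates are all hypothetical; the q-weights are taken to be strictly positive; and the code does not handle replicates for which the calibration constraints are infeasible.

```python
import numpy as np

def gpel_fit(d, q, X, t, tol=1e-9, max_iter=200):
    """Minimize EL(d, w) subject to X^T w = t; return (weights, minimized EL)."""
    lam = np.zeros(X.shape[1])

    def K(l):                                   # concave dual objective
        u = 1.0 + q * (X @ l)
        return -np.inf if np.any(u <= 0) else np.sum(d / q * np.log(u)) - t @ l

    for _ in range(max_iter):
        u = 1.0 + q * (X @ lam)
        grad = X.T @ (d / u) - t
        if np.max(np.abs(grad)) <= tol * (1.0 + np.max(np.abs(t))):
            break
        hess = -(X.T @ (X * (d * q / u**2)[:, None]))
        step = np.linalg.solve(hess, grad)
        s = 1.0
        while s > 1e-8 and K(lam - s * step) < K(lam) - 1e-12 * (1.0 + abs(K(lam))):
            s /= 2.0                            # step halving
        lam = lam - s * step
    w = d / (1.0 + q * (X @ lam))
    return w, -np.sum((d * np.log(w / d) - w + d) / q)

def neg2r(d, q, X, y, t_x, theta):
    """-2 r(theta), with r(theta) = EL(d, w_hat) - EL(d, w_tilde(theta))."""
    _, el_hat = gpel_fit(d, q, X, t_x)
    _, el_theta = gpel_fit(d, q, np.column_stack([X, y]), np.append(t_x, theta))
    return 2.0 * (el_theta - el_hat)

def bootstrap_cutoff(d, q, X, y, t_x, alpha=0.05, B=200, seed=0):
    """Upper 100*alpha-th sample percentile of -2 r*(theta) at theta = t_EL."""
    rng = np.random.default_rng(seed)
    w_hat, _ = gpel_fit(d, q, X, t_x)
    t_el = w_hat @ y                            # GPEL point estimate
    n = len(y)
    stats = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)        # Step [1]: SRS with replacement
        # Steps [2]-[3]: weights d, q travel with the resampled units.
        stats.append(neg2r(d[idx], q[idx], X[idx], y[idx], t_x, t_el))
    return np.quantile(stats, 1.0 - alpha), t_el   # Step [4]

# Hypothetical sample for illustration.
rng = np.random.default_rng(2)
n = 60
x = rng.lognormal(sigma=0.5, size=n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 3.0 * x + rng.normal(size=n)
d = 30.0 * rng.uniform(0.8, 1.2, size=n)
q = np.ones(n)
t_x = (X.T @ d) * np.array([1.0, 1.01])

b_alpha, t_el = bootstrap_cutoff(d, q, X, y, t_x, B=50)
```

The returned `b_alpha` plays the role of $b_\alpha^*$, and the interval $\{\theta \mid -2r(\theta) < b_\alpha^*\}$ can then be traced by evaluating `neg2r` on the original sample over a grid or by bisection. In practice $B$ would be much larger (the simulations in the paper use $B = 1{,}000$).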
The finite population, once generated from the above model, was held fixed. Under this setting, the finite population correlation coefficients between $y$ and $x$ are respectively 0.46 and 0.30 for $\rho = 0.80$ and $\rho = 0.50$; the correlation coefficients between $y$ and $z$ are respectively 0.66 and 0.43 for the two corresponding values of $\rho$.

Single-stage unequal probability samples of size $n = 80$ were taken from the finite population, with inclusion probabilities $\pi_i$ proportional to $z_i + c$. Two values of $c$ were considered such that $\pi_{mm} = \pi_{\max}/\pi_{\min}$ equals 200 and 20, respectively, where $\pi_{\max} = \max\{\pi_i, i = 1, \ldots, N\}$ and $\pi_{\min} = \min\{\pi_i, i = 1, \ldots, N\}$. The Rao-Sampford unequal probability sampling method (Rao, 1965; Sampford, 1967) was used in selecting the samples. Note that the sampling fraction is $80/2{,}000 = 4\%$, which is small, and the Rao-Sampford method has high entropy (Berger, 1998). It should also be noted that the second-order inclusion probabilities $\pi_{ij}$ can be computed exactly for the Rao-Sampford sampling method.

We considered two choices of q-weights: $q_i = 1$ and $q_i = \pi_i^{-1} - 1$. This gives a total of eight different scenarios with respect to the choices of $\rho$, $\pi_{mm}$, and $q_i$. For each scenario, five point estimators of the population total $t_y$ were computed: (1) the basic Horvitz–Thompson estimator (HT); (2) the generalized regression estimator calibrated over $x_i$ (GREG-1); (3) the generalized regression estimator calibrated over $(x_i, \pi_i, 1)$ (GREG-2); (4) the GPEL based estimator calibrated over $x_i$ (GPEL-1); and (5) the GPEL based estimator calibrated over $(x_i, \pi_i, 1)$ (GPEL-2).

Table 1: Relative root mean square error ($\times 10^3$) of point estimators (Study I). Columns: $q_i$ (1 or $\pi_i^{-1} - 1$), $\rho$, $\pi_{mm}$, HT, GREG-1, GREG-2, GPEL-1, GPEL-2. [Table entries are not preserved in this transcription.]

Performances of a point estimator $\hat t_y$ of the population total $t_y$ are evaluated in terms of simulated relative bias (RB) and relative root mean square error (RRMSE), defined as

$$RB = K^{-1} \sum_{k=1}^K \{\hat t_y^{(k)} - t_y\}/t_y \quad \text{and} \quad RRMSE = (MSE)^{1/2}/t_y,$$

where $\hat t_y^{(k)}$ is the estimator computed from the $k$th simulated sample, $MSE = K^{-1} \sum_{k=1}^K \{\hat t_y^{(k)} - t_y\}^2$, and $K$ is the total number of simulation runs. All five estimators demonstrated negligible biases ($|RB| < 3\%$) for all cases. Details are not included here to save space.

The simulated values of RRMSE are summarized in Table 1. The results for the Horvitz–Thompson estimator are reported from two independent simulations for the two choices of q-weights. It can be seen from the table that (i) the design with less variable weights ($\pi_{mm} = \pi_{\max}/\pi_{\min} = 20$) provides better results than the design with more variable weights ($\pi_{mm} = 200$); (ii) including the design variable (i.e., the inclusion probabilities $\pi_i$) and the constant 1 in the calibration equations gives significantly more accurate estimation; (iii) the GPEL based calibration estimators are at least as efficient as the generalized regression estimators; and (iv) the two choices of q-weights lead to similar results. A possible explanation for (iv) is that the mean of $y_i$ given $(x_i, z_i)$ under the simulation model is only moderately nonlinear, depending mainly on $x_i$ instead of $z_i$ or $\pi_i$. Using $q_i = \pi_i^{-1} - 1$ may lead to more noticeable gains of efficiency when the linear relationship is more seriously misspecified and the nonlinearity occurs in the region where $\pi_i$ is close to 1, as discussed in Section 2.1.
We considered five methods for constructing confidence intervals on the population total $t_y$: (1) the normal theory interval based on the Horvitz-Thompson estimator, denoted as HT(NT); (2) the normal theory interval based on the generalized regression estimator, denoted as GREG(NT); (3) the profile GPEL ratio interval based on the scaled $\chi^2$ distribution described in Theorem 1, denoted as GPEL($\hat C$), where $\hat C$ is the estimated scaling constant; (4) the profile GPEL ratio interval with $C = 1$ when $q_i = \pi_i^{-1} - 1$, denoted as GPEL($C = 1$); and (5) the bootstrap calibrated GPEL ratio interval described in Section 3, denoted as GPEL(Boot), where $B = 1{,}000$ bootstrap samples were used for each simulation run.

Let $(\hat\theta_1^{(k)}, \hat\theta_2^{(k)})$ be a confidence interval on $\theta = t_y$ obtained from the $k$th simulated sample using a particular method. Performance of the interval is measured by the (relative) average length (AL), the lower (L) and upper (U) tail error rates, and the coverage probability (CP), computed respectively as

$$\mathrm{AL} = \frac{1}{K}\sum_{k=1}^{K} \frac{\hat\theta_2^{(k)} - \hat\theta_1^{(k)}}{t_y}, \qquad \mathrm{L} = \Big\{\frac{1}{K}\sum_{k=1}^{K} I\big(t_y \le \hat\theta_1^{(k)}\big)\Big\} \times 100,$$

$$\mathrm{U} = \Big\{\frac{1}{K}\sum_{k=1}^{K} I\big(t_y \ge \hat\theta_2^{(k)}\big)\Big\} \times 100, \qquad \mathrm{CP} = \Big\{\frac{1}{K}\sum_{k=1}^{K} I\big(\hat\theta_1^{(k)} < t_y < \hat\theta_2^{(k)}\big)\Big\} \times 100.$$

Note that L + CP + U = 100. Table 2 reports results on confidence intervals where only $x_i$ is used in calibration for GREG and GPEL. Table 3 summarizes results with $(x_i, \pi_i, 1)$ used for calibration.

Table 2: Coverage probabilities and average length of 95% CI (Study I; calibrated over $x_i$). Columns: $q_i$, $\rho$, $\pi_{mm}$, HT(NT), GREG(NT), GPEL($\hat C$), GPEL($C = 1$), GPEL(Boot); rows report AL, L, CP, and U.

Table 3: Coverage probabilities and average length of 95% CI (Study I; calibrated over $(x_i, \pi_i, 1)$). Columns: $q_i$, $\rho$, $\pi_{mm}$, HT(NT), GREG(NT), GPEL($\hat C$), GPEL($C = 1$), GPEL(Boot); rows report AL, L, CP, and U.
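The four interval summaries (AL, L, U, CP) can be computed from the simulated interval endpoints as sketched below. This is an illustration only: the function name `interval_summaries` and the toy endpoints are hypothetical, and the treatment of ties (counting $t_y$ equal to an endpoint as a tail error) is an assumption, since the inequality symbols are not legible in the transcription.

```python
import numpy as np

def interval_summaries(lower, upper, t_y):
    """Average length and tail/coverage rates (in percent) over K intervals.

    lower, upper : length-K sequences of interval endpoints
    t_y          : true population total
    Assumption: a value equal to an endpoint counts as a tail error.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    al = np.mean(upper - lower) / t_y                    # relative average length
    l_rate = 100.0 * np.mean(t_y <= lower)               # lower tail error rate
    u_rate = 100.0 * np.mean(t_y >= upper)               # upper tail error rate
    cp = 100.0 * np.mean((lower < t_y) & (t_y < upper))  # coverage probability
    return al, l_rate, u_rate, cp

# Hypothetical illustration with K = 4 intervals and t_y = 100.
al, l_rate, u_rate, cp = interval_summaries(
    lower=[90.0, 95.0, 101.0, 80.0],
    upper=[110.0, 99.0, 120.0, 105.0],
    t_y=100.0,
)
```

With this convention the three rates always sum to 100 whenever each interval has a lower endpoint below its upper endpoint, matching the identity L + CP + U = 100.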

Major observations from Tables 2 and 3 can be summarized as follows: (i) The GREG(NT) and GPEL($\hat C$) intervals are associated with both greater average lengths and lower coverage probabilities in the two tables under the design with more variable weights ($\pi_{mm} = 200$) than under the design with less variable weights ($\pi_{mm} = 20$). This demonstrates the challenges of dealing with highly variable sampling weights. (ii) While the average lengths of the GREG(NT) and GPEL($\hat C$) intervals are similar, the coverage probabilities of GPEL($\hat C$) are consistently higher and closer to 95% than those of GREG(NT) in the two tables. (iii) Including $(\pi_i, 1)$ in the calibration (compare Tables 2 and 3) significantly reduces average lengths for the GREG(NT) and GPEL($\hat C$) intervals under the two designs of $\pi_{mm}$, echoing the previous results on RRMSE. But the coverage probabilities for both methods decrease noticeably under the design with $\pi_{mm} = 200$, although not so under the design with $\pi_{mm} = 20$. (iv) The GPEL($C = 1$) intervals with $q_i = \pi_i^{-1} - 1$ seem to perform well even if $\pi_i$ is not included in calibration (Table 2). The method becomes almost identical to GPEL($\hat C$) when $(\pi_i, 1)$ is included in calibration (Table 3). (v) The bootstrap intervals GPEL(Boot) perform remarkably well, in terms of coverage probabilities, in all cases, especially under the design with $\pi_{mm} = 200$. The average lengths are slightly inflated compared to GREG(NT) or GPEL($\hat C$).

Study II. In this simulation study we used a real survey data set from the 2000 Statistics Canada Family Expenditure Survey for the province of Ontario. The data set contains $N = 2248$ observations, with measurements on $x_i$: number of people in the household; $z_i$: annual income; $y_i$: total expenditure. Chen, Sitter, & Wu (2002) contains a detailed description of the data set.

Table 4: Relative root mean square error ($\times 10^3$) of point estimators (Study II). Columns: $q_i$, $\pi_{mm}$, HT, GREG-1, GREG-2, GPEL-1, GPEL-2.
We treated the data as the finite population, and conducted the same types of simulation as in Study I. Once again, the Rao-Sampford sampling method was used and the sample size was set at $n = 80$. Both $\pi_{mm} = 200$ and $\pi_{mm} = 20$ are considered. Results are summarized in Tables 4 and 5. The first column in Table 5 indicates the calibration variables (CV) used in the related method ($x_i$ only versus $(x_i, \pi_i, 1)$). Most of the observations from Study I remain true, except that the GREG(NT) and the GPEL($\hat C$) intervals perform much better. Low coverage probabilities do not seem to be an issue in the current study.

Table 5: Coverage probabilities and average length of 95% CI (Study II). Columns: CV, $q_i$, $\pi_{mm}$, HT(NT), GREG(NT), GPEL($\hat C$), GPEL($C = 1$), GPEL(Boot); rows report AL, L, CP, and U for CV $= x_i$ and CV $= (x_i, \pi_i, 1)$.

5. CONCLUSION

Calibration estimation using auxiliary information has been extensively studied in the survey literature. Choices among alternative approaches depend on (i) the flexibility in obtaining efficient point estimators; (ii) the efficiency and reliability of computational procedures; and (iii) the capacity for drawing inferences beyond point estimation, such as constructing confidence intervals or conducting hypothesis tests. The generalized pseudo empirical likelihood approach has shown advantages in all three aspects. In practice, we recommend including $\pi_i$ as a calibration variable, and using the choice $q_i = \pi_i^{-1} - 1$, especially when $y_i$ and $x_i$ are suspected to have a strong nonlinear relationship. Confidence intervals can be obtained by using the adjusted GPEL ratio statistic or, when sampling fractions are negligible, by the bootstrap procedure. While inferences on population totals are the main focus of the current paper, extensions to parameters defined through general estimating equations, including regression and logistic regression coefficients, are a natural topic for further development. Moreover, extensions to multistage sampling designs and to the analysis of imputed survey data are currently under investigation.

APPENDIX

Proof of Theorem 1. Note that $\hat w_i$ are computed by (4) and (5) and $\hat w_i/d_i = 1/(1 + q_i x_i^T \hat\lambda)$. For $u_i = o(1)$, we have $\log(1 + u_i) = u_i - u_i^2/2 + O(u_i^3)$ and $(1 + u_i)^{-1} = 1 - u_i + u_i^2 + O(u_i^3)$. Under conditions C1-C6, we have $\hat\lambda = O_p(n^{-1/2})$ and $\max_{i \in S} \|q_i x_i\| = o_p(n^{1/2})$. It follows that $u_i = q_i x_i^T \hat\lambda = o_p(1)$ uniformly over all $i \in S$. This together with (7) leads to

$$\begin{aligned}
\mathrm{EL}(\hat w, d) &= -\sum_{i \in S} q_i^{-1} d_i \Big\{ \log\big(1 + q_i x_i^T \hat\lambda\big) + \big(1 + q_i x_i^T \hat\lambda\big)^{-1} - 1 \Big\} \\
&= -\frac{1}{2}\, \hat\lambda^T \Big( \sum_{i \in S} d_i q_i x_i x_i^T \Big) \hat\lambda + o_p\Big(\frac{N}{n}\Big) \\
&= -\frac{1}{2}\, (\hat t_x - t_x)^T \Big( \sum_{i \in S} d_i q_i x_i x_i^T \Big)^{-1} (\hat t_x - t_x) + o_p\Big(\frac{N}{n}\Big) \\
&= -\frac{1}{2}\, (\hat t_x - t_x)^T \Big( \sum_{i=1}^{N} q_i x_i x_i^T \Big)^{-1} (\hat t_x - t_x) + o_p\Big(\frac{N}{n}\Big).
\end{aligned}$$

To derive an asymptotic expansion for $\mathrm{EL}(w(\theta), d)$ with the additional constraint (10), we follow the same technique used in the proof of Theorem 2 of Wu & Rao (2006). For $\theta = t_y$, it can be shown that

$$\mathrm{EL}(w(\theta), d) = -\frac{1}{2}\, (\hat t_z - t_z)^T \Big( \sum_{i=1}^{N} q_i z_i z_i^T \Big)^{-1} (\hat t_z - t_z) + o_p\Big(\frac{N}{n}\Big),$$

where $z_i = (x_i^T, y_i)^T$. The last expression remains valid if $z_i$ is defined by the linear transformation $z_i = (x_i^T, e_i)^T$, where $e_i = y_i - B^T x_i$. Then $\sum_{i=1}^{N} q_i z_i z_i^T$ is block-diagonal, and

$$r(\theta) = \mathrm{EL}(\hat w, d) - \mathrm{EL}(w(\theta), d) = \frac{1}{2}\, \frac{(\hat\eta - \eta)^2}{\sum_{i=1}^{N} q_i e_i^2} + o_p\Big(\frac{N}{n}\Big),$$

where $\eta = \sum_{i=1}^{N} e_i = t_y - B^T t_x$, $\hat\eta = \sum_{i \in S} d_i e_i$ and

$$B = \Big( \sum_{i=1}^{N} q_i x_i x_i^T \Big)^{-1} \sum_{i=1}^{N} q_i x_i y_i.$$

Therefore, $2r(\theta)/C$ converges in distribution to a $\chi^2$ random variable with one degree of freedom when $\theta = t_y$. Note that $V_p(\hat\eta) = O(N^2/n)$ and $C = O(N/n)$.

Proof of Corollary 1. The result follows directly from the fact that $V_p\big(\sum_{i \in S} e_i/\pi_i\big) = \sum_{i=1}^{N} (\pi_i^{-1} - 1) e_i^2 + o(n)$ by Tan (2013, Lemma 1) under rejective sampling.
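The block-diagonality step in the proof of Theorem 1 rests on the fact that the q-weighted regression residuals $e_i = y_i - B^T x_i$ satisfy $\sum_{i=1}^{N} q_i x_i e_i = 0$. The sketch below checks this numerically on a synthetic population. Everything in it (the function name `q_weighted_residuals`, the data-generating model, the range of inclusion probabilities) is a hypothetical illustration and is not the paper's simulation setup.

```python
import numpy as np

def q_weighted_residuals(x, y, q):
    """Return B = (sum q_i x_i x_i^T)^{-1} sum q_i x_i y_i and e_i = y_i - B^T x_i."""
    x = np.asarray(x, dtype=float)   # shape (N, p)
    y = np.asarray(y, dtype=float)   # shape (N,)
    q = np.asarray(q, dtype=float)   # shape (N,)
    xq = x * q[:, None]              # rows q_i * x_i
    B = np.linalg.solve(xq.T @ x, xq.T @ y)
    e = y - x @ B
    return B, e

# Synthetic finite population (assumed values, for illustration only).
rng = np.random.default_rng(0)
N = 200
x = np.column_stack([np.ones(N), rng.normal(size=N)])
y = 1.0 + 2.0 * x[:, 1] + rng.normal(size=N)
pi = rng.uniform(0.05, 0.5, size=N)   # hypothetical inclusion probabilities
q = 1.0 / pi - 1.0                    # the choice q_i = 1/pi_i - 1

B, e = q_weighted_residuals(x, y, q)
off_block = (x * q[:, None]).T @ e    # sum_i q_i x_i e_i, should vanish
eta = e.sum()                         # eta = t_y - B^T t_x
```

Because the off-diagonal block vanishes, the quadratic form in $(\hat t_z - t_z)$ separates into the $x$-part, which cancels against $\mathrm{EL}(\hat w, d)$, and the scalar term $(\hat\eta - \eta)^2 / \sum_{i=1}^{N} q_i e_i^2$.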

Justification of the Bootstrap Method. The first key result is from Theorem 1, which states that $2r(\theta)/C \to \chi^2_1$ in distribution when $\theta = t_y$, where the scaling constant is given by $C = V_p(\hat\eta)/\big(\sum_{i=1}^{N} q_i e_i^2\big)$ with $\hat\eta = \sum_{i \in S} d_i e_i$ and $e_i = y_i - B^T x_i$. Note that $V_p(\cdot)$ denotes the design-based variance. The second key result is the parallel development on the bootstrap version of the GPEL ratio statistic, following the exact steps used in the proof of Theorem 1, which shows that $2r^*(\theta)/C^* \to \chi^2_1$ in distribution when $\theta = \hat t_{EL}$. The scaling constant is given by $C^* = V^*(\hat\eta^* \mid S)/\big(\sum_{i \in S} q_i d_i e_i^2\big)$, where $\hat\eta^* = \sum_{i=1}^{n} d_i^* e_i^*$, $e_i^* = y_i^* - \hat B^T x_i^*$, and $V^*(\cdot \mid S)$ denotes the variance under the bootstrap sampling procedure, conditional on the original survey sample.

Let $z_i = \pi_i/n$ and $z_i^* = \pi_i^*/n$. It follows that $d_i^* = 1/\pi_i^* = (1/z_i^*)/n$ and $\hat\eta^* = n^{-1} \sum_{i=1}^{n} r_i^*$, where $r_i^* = e_i^*/z_i^*$. Similarly, we also have $d_i = (1/z_i)/n$ and $\hat\eta = n^{-1} \sum_{i \in S} r_i$, where $r_i = e_i/z_i$. Under the proposed with-replacement bootstrap procedure, we have $V^*(\hat\eta^* \mid S) = S_r^2/n$, where $S_r^2 = n^{-1} \sum_{i \in S} (r_i - \hat\eta)^2$. If the original survey sample is selected by a single-stage unequal probability sampling design with replacement, then $\hat\eta = n^{-1} \sum_{i \in S} e_i/z_i$ is the standard Hansen-Hurwitz estimator. The design-based variance $V_p(\hat\eta)$ can be unbiasedly estimated by $n^{-1}\big\{(n-1)^{-1} \sum_{i \in S} (r_i - \hat\eta)^2\big\}$. In this case, we have $C^*/C \to 1$ as $n$ gets large, and the bootstrap version $2r^*(\theta)$ and the original version $2r(\theta)$ follow asymptotically the same scaled $\chi^2$ distribution. The bootstrap percentile $b_\alpha^*$ is a consistent estimator of the true percentile $b_\alpha$.

ACKNOWLEDGEMENTS

The research of Z. Tan was supported by a grant from the Natural Science Foundation of United States. The research of C. Wu was supported by a grant from the Natural Sciences and Engineering Research Council of Canada.

BIBLIOGRAPHY

Berger, Y. G. (1998). Rate of convergence to normal distribution for the Horvitz-Thompson estimator. Journal of Statistical Planning and Inference, 67.
Berger, Y. G., Tirari, E. H. M., & Tillé, Y. (2003). Towards optimal regression estimation in sample surveys. Australian & New Zealand Journal of Statistics, 45.
Brewer, K. R. W. (1999). Cosmetic calibration with unequal probability sampling. Survey Methodology, 25.
Chen, J. & Sitter, R. R. (1999). A pseudo empirical likelihood approach to the effective use of auxiliary information in complex surveys. Statistica Sinica, 9.
Chen, J., Sitter, R. R., & Wu, C. (2002). Using empirical likelihood methods to obtain range restricted weights in regression estimators for surveys. Biometrika, 89.
Chen, S. & Kim, J. K. (2014). Population empirical likelihood for nonparametric inference in survey sampling. Statistica Sinica, 24.
Deville, J. C. & Särndal, C. E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87.
DiCiccio, T. J. & Romano, J. P. (1990). Nonparametric confidence limits by resampling methods and least favorable families. International Statistical Review, 58.
Folsom, R. E. (1991). Exponential and logistic weight adjustment for sampling and nonresponse error reduction. Proceedings of the Section on Social Statistics, American Statistical Association.
Fuller, W. A. (2002). Regression estimation for survey samples. Survey Methodology, 28.
Fuller, W. A. (2009). Sampling Statistics, John Wiley & Sons, Inc., Hoboken, New Jersey.
Fuller, W. A. & Isaki, C. T. (1981). Survey design under superpopulation models. In Current Topics in Survey Sampling, Krewski, D., Rao, J. N. K., & Platek, R., editors. Academic Press, New York.
Hajek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. Annals of Mathematical Statistics, 35.
Kim, J. K. (2010). Calibration estimation using exponential tilting in sample surveys. Survey Methodology, 36.
Kullback, S. (1959). Information Theory and Statistics, Wiley, New York.
Kullback, S. & Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22.
Montanari, G. E. (1987). Post-sampling efficient QR-prediction in large-scale surveys. International Statistical Review, 55.
Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75.
Owen, A. B. (2001). Empirical Likelihood, Chapman & Hall/CRC, New York.
Park, S. & Kim, J. K. (2014). Instrumental-variable calibration estimation in survey sampling. Statistica Sinica, 24.
Qin, J. & Lawless, J. (1994). Empirical likelihood and general estimating equations. Annals of Statistics, 22.
Rao, J. N. K. (1994). Estimating totals and distribution functions using auxiliary information at the estimation stage. Journal of Official Statistics, 10.
Rao, J. N. K. (1965). On two simple schemes of unequal probability sampling without replacement. Journal of the Indian Statistical Association, 3.
Sampford, M. R. (1967). On sampling without replacement with unequal probabilities of selection. Biometrika, 54.
Särndal, C. E. (2007). The calibration approach in survey theory and practice. Survey Methodology, 33.
Särndal, C. E., Swensson, B., & Wretman, J. H. (1992). Model-Assisted Survey Sampling, Springer-Verlag, New York.
Särndal, C. E. & Wright, R. L. (1984). Cosmetic form of estimators in survey sampling. Scandinavian Journal of Statistics, 11.
Tan, Z. (2006). A distributional approach for causal inference using propensity scores. Journal of the American Statistical Association, 101.
Tan, Z. (2010). Bounded, efficient and doubly robust estimation with inverse weighting. Biometrika, 97.
Tan, Z. (2013). Simple design-efficient calibration estimators for rejective and high-entropy sampling. Biometrika, 100.
Wu, C. (2004). Weighted empirical likelihood inference. Statistics & Probability Letters, 66.
Wu, C. & Rao, J. N. K. (2006). Pseudo-empirical likelihood ratio confidence intervals for complex surveys. The Canadian Journal of Statistics, 34.
Wu, C. & Rao, J. N. K. (2010). Bootstrap procedures for the pseudo empirical likelihood method in sample surveys. Statistics and Probability Letters, 80.

Received 9 October 2013
Accepted 12 October 2014


A new lack-of-fit test for quantile regression models using logistic regression A new lack-of-fit test for quantile regression models using logistic regression Mercedes Conde-Amboage 1 & Valentin Patilea 2 & César Sánchez-Sellero 1 1 Department of Statistics and O.R.,University of

More information

Combining Non-probability and Probability Survey Samples Through Mass Imputation

Combining Non-probability and Probability Survey Samples Through Mass Imputation Combining Non-probability and Probability Survey Samples Through Mass Imputation Jae-Kwang Kim 1 Iowa State University & KAIST October 27, 2018 1 Joint work with Seho Park, Yilin Chen, and Changbao Wu

More information

Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses

Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Ann Inst Stat Math (2009) 61:773 787 DOI 10.1007/s10463-008-0172-6 Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Taisuke Otsu Received: 1 June 2007 / Revised:

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

Admissible Estimation of a Finite Population Total under PPS Sampling

Admissible Estimation of a Finite Population Total under PPS Sampling Research Journal of Mathematical and Statistical Sciences E-ISSN 2320-6047 Admissible Estimation of a Finite Population Total under PPS Sampling Abstract P.A. Patel 1* and Shradha Bhatt 2 1 Department

More information

A new resampling method for sampling designs without replacement: the doubled half bootstrap

A new resampling method for sampling designs without replacement: the doubled half bootstrap 1 Published in Computational Statistics 29, issue 5, 1345-1363, 2014 which should be used for any reference to this work A new resampling method for sampling designs without replacement: the doubled half

More information

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 14 GEE-GMM Throughout the course we have emphasized methods of estimation and inference based on the principle

More information

Test for Discontinuities in Nonparametric Regression

Test for Discontinuities in Nonparametric Regression Communications of the Korean Statistical Society Vol. 15, No. 5, 2008, pp. 709 717 Test for Discontinuities in Nonparametric Regression Dongryeon Park 1) Abstract The difference of two one-sided kernel

More information

High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data

High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data Song Xi CHEN Guanghua School of Management and Center for Statistical Science, Peking University Department

More information

The Effective Use of Complete Auxiliary Information From Survey Data

The Effective Use of Complete Auxiliary Information From Survey Data The Effective Use of Complete Auxiliary Information From Survey Data by Changbao Wu B.S., Anhui Laodong University, China, 1982 M.S. Diploma, East China Normal University, 1986 a thesis submitted in partial

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Noname manuscript No. (will be inserted by the editor) A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Mai Zhou Yifan Yang Received: date / Accepted: date Abstract In this note

More information

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University Statistica Sinica 27 (2017), 000-000 doi:https://doi.org/10.5705/ss.202016.0155 DISCUSSION: DISSECTING MULTIPLE IMPUTATION FROM A MULTI-PHASE INFERENCE PERSPECTIVE: WHAT HAPPENS WHEN GOD S, IMPUTER S AND

More information

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Statistica Sinica 24 2014, 395-414 doi:ttp://dx.doi.org/10.5705/ss.2012.064 EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Jun Sao 1,2 and Seng Wang 3 1 East Cina Normal University,

More information

Binary choice 3.3 Maximum likelihood estimation

Binary choice 3.3 Maximum likelihood estimation Binary choice 3.3 Maximum likelihood estimation Michel Bierlaire Output of the estimation We explain here the various outputs from the maximum likelihood estimation procedure. Solution of the maximum likelihood

More information

Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore

Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore AN EMPIRICAL LIKELIHOOD BASED ESTIMATOR FOR RESPONDENT DRIVEN SAMPLED DATA Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore Mark Handcock, Department

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

More information

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods Chapter 4 Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods 4.1 Introduction It is now explicable that ridge regression estimator (here we take ordinary ridge estimator (ORE)

More information

Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?

Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful? Journal of Modern Applied Statistical Methods Volume 10 Issue Article 13 11-1-011 Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?

More information

Propensity score adjusted method for missing data

Propensity score adjusted method for missing data Graduate Theses and Dissertations Graduate College 2013 Propensity score adjusted method for missing data Minsun Kim Riddles Iowa State University Follow this and additional works at: http://lib.dr.iastate.edu/etd

More information

Recent Advances in the analysis of missing data with non-ignorable missingness

Recent Advances in the analysis of missing data with non-ignorable missingness Recent Advances in the analysis of missing data with non-ignorable missingness Jae-Kwang Kim Department of Statistics, Iowa State University July 4th, 2014 1 Introduction 2 Full likelihood-based ML estimation

More information

An Information Criteria for Order-restricted Inference

An Information Criteria for Order-restricted Inference An Information Criteria for Order-restricted Inference Nan Lin a, Tianqing Liu 2,b, and Baoxue Zhang,2,b a Department of Mathematics, Washington University in Saint Louis, Saint Louis, MO 633, U.S.A. b

More information

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai Weighting Methods Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Weighting Methods Stat186/Gov2002 Fall 2018 1 / 13 Motivation Matching methods for improving

More information

Model Assisted Survey Sampling

Model Assisted Survey Sampling Carl-Erik Sarndal Jan Wretman Bengt Swensson Model Assisted Survey Sampling Springer Preface v PARTI Principles of Estimation for Finite Populations and Important Sampling Designs CHAPTER 1 Survey Sampling

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring A. Ganguly, S. Mitra, D. Samanta, D. Kundu,2 Abstract Epstein [9] introduced the Type-I hybrid censoring scheme

More information

Weighting in survey analysis under informative sampling

Weighting in survey analysis under informative sampling Jae Kwang Kim and Chris J. Skinner Weighting in survey analysis under informative sampling Article (Accepted version) (Refereed) Original citation: Kim, Jae Kwang and Skinner, Chris J. (2013) Weighting

More information

Bootstrap and Parametric Inference: Successes and Challenges

Bootstrap and Parametric Inference: Successes and Challenges Bootstrap and Parametric Inference: Successes and Challenges G. Alastair Young Department of Mathematics Imperial College London Newton Institute, January 2008 Overview Overview Review key aspects of frequentist

More information

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA

More information

Non-Parametric Bootstrap Mean. Squared Error Estimation For M- Quantile Estimators Of Small Area. Averages, Quantiles And Poverty

Non-Parametric Bootstrap Mean. Squared Error Estimation For M- Quantile Estimators Of Small Area. Averages, Quantiles And Poverty Working Paper M11/02 Methodology Non-Parametric Bootstrap Mean Squared Error Estimation For M- Quantile Estimators Of Small Area Averages, Quantiles And Poverty Indicators Stefano Marchetti, Nikos Tzavidis,

More information

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Statistica Sinica 19 (2009), 71-81 SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Song Xi Chen 1,2 and Chiu Min Wong 3 1 Iowa State University, 2 Peking University and

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY J.D. Opsomer, W.A. Fuller and X. Li Iowa State University, Ames, IA 50011, USA 1. Introduction Replication methods are often used in

More information

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples Bayesian inference for sample surveys Roderick Little Module : Bayesian models for simple random samples Superpopulation Modeling: Estimating parameters Various principles: least squares, method of moments,

More information

Lawrence D. Brown* and Daniel McCarthy*

Lawrence D. Brown* and Daniel McCarthy* Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

More information

Introduction to Survey Data Integration

Introduction to Survey Data Integration Introduction to Survey Data Integration Jae-Kwang Kim Iowa State University May 20, 2014 Outline 1 Introduction 2 Survey Integration Examples 3 Basic Theory for Survey Integration 4 NASS application 5

More information

Jackknife Empirical Likelihood for the Variance in the Linear Regression Model

Jackknife Empirical Likelihood for the Variance in the Linear Regression Model Georgia State University ScholarWorks @ Georgia State University Mathematics Theses Department of Mathematics and Statistics Summer 7-25-2013 Jackknife Empirical Likelihood for the Variance in the Linear

More information

Nonparametric Tests for Multi-parameter M-estimators

Nonparametric Tests for Multi-parameter M-estimators Nonparametric Tests for Multi-parameter M-estimators John Robinson School of Mathematics and Statistics University of Sydney The talk is based on joint work with John Kolassa. It follows from work over

More information

Minimax design criterion for fractional factorial designs

Minimax design criterion for fractional factorial designs Ann Inst Stat Math 205 67:673 685 DOI 0.007/s0463-04-0470-0 Minimax design criterion for fractional factorial designs Yue Yin Julie Zhou Received: 2 November 203 / Revised: 5 March 204 / Published online:

More information

COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS. Abstract

COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS. Abstract Far East J. Theo. Stat. 0() (006), 179-196 COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS Department of Statistics University of Manitoba Winnipeg, Manitoba, Canada R3T

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

F. Jay Breidt Colorado State University

F. Jay Breidt Colorado State University Model-assisted survey regression estimation with the lasso 1 F. Jay Breidt Colorado State University Opening Workshop on Computational Methods in Social Sciences SAMSI August 2013 This research was supported

More information

Fractional Imputation in Survey Sampling: A Comparative Review

Fractional Imputation in Survey Sampling: A Comparative Review Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical

More information

Design and Estimation for Split Questionnaire Surveys

Design and Estimation for Split Questionnaire Surveys University of Wollongong Research Online Centre for Statistical & Survey Methodology Working Paper Series Faculty of Engineering and Information Sciences 2008 Design and Estimation for Split Questionnaire

More information