Optimal Calibration Estimators Under Two-Phase Sampling

Size: px
Start display at page:

Download "Optimal Calibration Estimators Under Two-Phase Sampling"

Transcription

1 Journal of Of cial Statistics, Vol. 19, No. 2, 2003, pp. 119±131 Optimal Calibration Estimators Under Two-Phase Sampling Changbao Wu 1 and Ying Luan 2 Optimal calibration estimators require in general complete auxiliary information. When such information is not available, the estimation procedure can be combined with two-phase sampling where a large, less costly rst-phase sample measured over the auxiliary variables is used to get estimates for certain population quantities related to the covariates. In this article we propose optimal calibration estimators for the population mean, the distribution function, the population variance and other second-order nite population quantities under two-phase sampling. The proposed optimal calibration estimators for a second-order nite population quantity such as the population variance can ideally be used to obtain more ef cient variance estimators for a rst-order nite population quantity such as the total or the distribution function. The estimation strategy can be applied to various measurement error and nonresponse problems. The design-based nite sample performances of proposed estimators are investigated through a simulation study using real survey data from the 1996 Statistics Canada Family Expenditure FAME) Survey. Key words: Auxiliary information; measurement error; model-assisted approach; nonresponse; variance estimation. 1. Introduction Many highly ef cient estimation techniques in survey sampling require strong information about auxiliary variables, x. For example, the calibration estimator of Deville and SaÈrndal 1992) requires, the population totals. When such information is not available, a twophase sampling scheme can be used where a large, less costly rst-phase sample measured over the x variable is used to obtain a good estimate of. A smaller and more costly second-phase sample can then be taken and the study variable y is observed. The major advantage of using two-phase sampling is the gain in high precision without substantial increase in cost. SaÈrndal et al. 1992, Chapter 9) provide an excellent account of two-phase sampling. Recently, Wu and Sitter 2001) proposed a model-calibration approach to using auxiliary information from surveys. The proposed model-calibration estimator is not only highly ef cient but also optimal among a class of calibration estimators Wu 1 Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada. cbwu@uwaterloo.ca 2 Department of Statistics, University of California, Riverside, CA 92521, USA. Acknowledgments: This research was supported by a grant fromthe Natural Sciences and Engineering Research Council of Canada. The authors thank Professor Randy R. Sitter for helpful comments. Thanks are also due to the Associate Editor and three anonymous referees for constructive suggestions and comments that greatly improved the presentation. q Statistics Sweden

2 120 Journal of Of cial Statistics 2002). To compute the model-calibration estimator, however, one generally requires the values of the x variables to be known for the entire nite population. In practice this complete auxiliary information is often unavailable. Two-phase sampling provides an ideal solution for the use of optimal calibration estimators under such situations. Suppose U ˆf1; 2;...; Ng is the set of labels for the nite population and s is the set of labels for the sampled units. Let y i and x i be the values of the response variable y and the vector of covariates x associated with the i th unit. Let p i be the rst-order inclusion probabilities. Assuming the population totals ˆ P N i ˆ 1 x i are known, the conventional calibration estimator Deville and SaÈrndal 1992) for the nite population total Y ˆ P N i ˆ 1 y i is constructed as Y à C ˆ P w i y i, where the weights w i minimize a distance measure F s between the w i 's and the basic design weights d i ˆ 1=p i subject to the so-called benchmark constraints w i x i ˆ 1 The x i used in 1) are also referred to as calibration variables. There are two basic components in the construction of a calibration estimator: a distance measure F s and a set of constraints such as 1). The chi-squared distance measure F s ˆ P w i d i 2 = d i q i is commonly used, where the weighting factors q i are unrelated to d i. Other distance measures can also be considered but it has been shown by Deville and SaÈrndal 1992) that the resulting calibration estimator is asymptotically equivalent to the one using a chi-squared distance measure with a certain choice of q i. The benchmark constraints 1) which calibrate directly over the individual x variables are indeed ad hoc. There is no compelling reason to exclude the use of other types of constraints. Optimal calibration estimators have been studied by Wu 2002) under a model-assisted framework. The following semi-parametric superpopulation model was used to motivate the optimality considerations, E y y i x i ˆm x i ; v ; V y y i x i ˆ v x i Š 2 j 2 2 where m ; and v are known functions, v and j 2 are unknown model parameters, and E y, V y denote the expectation and the variance under the model, y. It was also assumed that y 1 ; y 2 ;...; y N are conditionally independent given the x i 's. Wu 2002) considered the class of calibration estimators for the population total Y obtained by using a) any chi-squared distance measure with the weighting factors satisfying q i > q for some constant q > 0andN 1 P N i ˆ 1 qi 2 ˆ O 1 ; b) any constraint through a dimension-reduction variable u i ˆ u x i ; v, i.e., w i u x i ; v ˆN u x i ; v 3 i ˆ 1 where the functional form of u ; can be arbitrary. Some of the major results can be summarized as follows: i) For any consistent estimator of v such that v à ˆ v o p 1, replacing v by v à in the constraint 3) will not change the resulting calibration estimator asymptotically. ii) Let v à P ˆ d i q i x i x T i 1 P d i q i x i y i. If we use u i ˆ xi T Ãv as calibration

3 Wu and Luan: Optimal Calibration Estimators Under Two-Phase Sampling 121 variable in 3), where T denotes transpose, the resulting calibration estimator is identical to the conventional calibration estimator à Y C using 1). Hence, the class of calibration estimators considered by Wu 2002) is very general and includes the conventional calibration estimator as a special member. iii) The model-calibration estimator à ÅY MC of Wu and Sitter 2001) obtained by using u i ˆ E y y i x i ˆm x i ; v in 3) is optimal under the criterion of minimum model expectation of the asymptotic design-based variance, E y fav p à Y g. Here AV p refers to the asymptotic variance under the sampling design, p, and à Y is any calibration estimator from the class. iv) For the estimation of the population distribution function N 1 F Y t ˆN I y i # t i ˆ 1 where I denotes the indicator function, the optimal calibration variable is given by g x i ; t ˆE y I y i # t jx i ŠˆP y i # t x i, which is dependent on the particular value of t. v) To estimate a second-order nite population quantity such as the population variance or more generally Q ˆ P N P Nj i ˆ 1 ˆ i 1 f y i ; y j, the optimal calibration variable is u ij ˆ E y f y i ; y j x i ; x j Š. For proofs, the required regularity conditions and more discussion we refer to Wu 2002) which is available online at It is clear that optimal calibration estimation requires in general the complete auxiliary information x 1 ; x 2 ;...; x N. In this article we focus on the use of optimal calibration estimators under two-phase sampling assuming that the complete auxiliary information is not available but a large rst-phase sample on the x variables can be easily collected with affordable cost. Estimation of the population mean or total using a regression-type estimator under twophase sampling has been discussed in the literature. Less attention, however, has been given to the estimation of the distribution function, the variance or other second-order nite population quantities under the context of two-phase sampling. In Section 2, we present optimal calibration estimators for various rst- and second-order nite population quantities under two-phase sampling under a uni ed framework. In Section 3, the optimal calibration estimators for second-order nite population quantities are naturally used to construct more ef cient variance estimators for the regression estimator for the population mean. Variance estimation for the distribution function is addressed in Section 4. Some empirical results regarding the nite sample design-based performances of the proposed estimators are reported in Section 5 using real survey data of the 1996 Statistics Canada Family Expenditure FAME) Survey for the province of Ontario. Applications to measurement error and nonresponse problems are brie y discussed in Section 6. Some concluding remarks are given in Section Optimal Calibration Estimators Under Two-Phase Sampling For simplicity of presentation, we consider cases where a relatively large rst-phase sample s 0 of xed size n 0 is selected using simple random sampling without replacement,

4 122 Journal of Of cial Statistics and x i is measured for all 0. A second-phase sample s of size n is selected using a general sampling design p s 0 with rst- and second-order inclusion probabilities p i s 0 and p ij s 0, and the response variable y i is observed for. The estimators presented below, however, can be extended to cases where the rst-phase sampling design is also arbitrary. Let d i s 0 ˆ 1=p i s 0 and d i ˆ N=n 0 d i s Estimating the population total Under Model 2), the model-calibration estimator for the population total Y is de ned as ÃY MC ˆ P w i y i, where the calibrated weights w i minimize F s ˆ w i d i 2 = d i q i subject to w i m x i ; v ˆ N=n à 0 m x i ; v à 0 Note that the right-hand side of 4) is an estimate for P N i ˆ 1 m x i ; v à based on the rstphase sample s 0. It is straightforward to show that " ÃY MC ˆ N n 0 d i s 0 y i m x i ; v à # ) d i s 0 m x i ; v à ÃB Y 0 where B à Y ˆ P d i s 0 q i Ãm i y i = P d i s 0 q i Ãm i 2 and Ãm i ˆ m x i ; v. à It can be shown that, under some mild regularity conditions, YMC à is asymptotically equivalent to " YMC ˆ N n 0 d i j s 0 y i m x i ; v N # ) d i j s 0 m x i ; v N B Y 0 where B Y ˆ P N i ˆ 1 q i m i y i = P N i ˆ 1 q i mi 2, m i ˆ m x i ; v N,andv N are the nite population parameters estimated by v. à See Wu and Sitter 2001) for a detailed discussion on the estimation of model parameters. The required regularity conditions were detailed in an unpublished master's essay at the University of Waterloo Luan 2001). It follows that AV p Y à MC ˆV p YMC ˆN 2 1 n 0 1 N S 2 Y N n 0 " # 2 E 1 V 2 d i j s 0 y i B Y m i where E 1 and V 2 denote the expectation and the variance under the rst-phase and the second-phase sampling design, respectively, and S 2 Y is the nite population variance for the y variable. Since the value of S 2 Y is unaffected by different choices of the calibration variable u i and E y E 1 V 2 ŠˆE 1 fe y V 2 g Š, following the same argument as in Theorem 1 of Wu 2002), we can show that à YMC minimizes E y AV p à Y Š among the class of calibration estimators à Y similarly de ned as in Section 1 with 3) modi ed for two-phase sampling in the form of 4) Estimating the distribution function The model-calibrated pseudo-empirical maximum likelihood estimator ME), which is asymptotically equivalent to the optimal calibration estimator Wu and Sitter 2001), is 4

5 Wu and Luan: Optimal Calibration Estimators Under Two-Phase Sampling 123 particularly appealing for the estimation of the distribution function. Under two-phase sampling, the ME estimator for F Y t is de ned as F à ME t ˆP Ãp i I y i # t where the weights Ãp i maximize the pseudo-empirical log-likelihood function Ãl p ˆN n 0 d i j s 0 log p i 5 subject to p i ˆ 1 0 < p i < 1 and p i g x i ; t ˆ 1 n 0 g x i ; t 6 0 where g x i ; t ˆE y I y i # t jx i ŠˆP y i # tjx i. Under a regression working model y i ˆ m x i ; v v x i «i ; i ˆ 1; 2;...; N 7 where the «i 's are independent and identically distributed iid) as N 0; j 2,wehave g x i ; t ˆG fy m x i ; v g=fv x i jgš The G is the cumulative distribution function of N 0; 1. In applications the unknown model parameters v and j 2 will have to be replaced by sample-based estimates. Note that if a prespeci ed t ˆ t 0 is used in 6) while the resulting weights Ãp i are used in à F ME t for an arbitrary t, à FME t will itself be a genuine distribution function. The optimality of ÃF ME t at t ˆ t 0 can be established along the lines of Theorem 2 in Wu 2002). When the regression model 7) is not desirable, a slightly less ef cient but more robust logistic regression model can be used to obtain g x i ; t. See Chen and Wu 2002) or Wu 2002) for details. A simple algorithm for computing the pseudo-empirical likelihood weights Ãp i can be found in Chen et al. 2002) Estimating second-order nite population quantities The population variance, the variance of a linear estimator or more generally a second-order nite population quantity of the form Q ˆ P N P Nj i ˆ 1 ˆ i 1 f y i ; y j can also be ef ciently estimated through optimal calibration. Under two-phase sampling, the optimal model-calibration estimator for Q is given by QMC à ˆ P P j > i w ij f y i ; y j, where the weights w ij minimize the modi ed distance measure F s ˆ P P j > i w ij d ij 2 = d ij q ij subject to 1 w ij E y f y i ; y j x i ; x j ŠˆN N n 0 n 0 E 1 y f y i ; y j x i ; x j Š 8 j > i Here d ij ˆ d ij s 0 N N 1 Š= n 0 n 0 1 Š and d ij s 0 ˆ 1=p ij s 0. The weighting factors q ij in the distance measure are prespeci ed and a common choice would be q ij ˆ 1. This optimal model-calibration estimator QMC à can be explicitly expressed as a regressiontype estimator, as detailed below for the population variance SY 2. Consider the nite population variance SY 2 ˆ N 1 1 P N i ˆ 1 y i Y 2 which can be alternatively expressed as SY 2 ˆ N N 1 Š 1 P N P Nj i ˆ 1 ˆ i 1 y i y j 2. Under a linear regression working model y i ˆ x T i v v x i «i ; i ˆ 1; 2;...; N 9 0 j > i

6 124 Journal of Of cial Statistics where the «i 's are iid with mean zero and variance j 2,wehave E y y i y j 2 x i ; x j Šˆv T x i x j x i x j T v j 2 v 2 x i v 2 x j Š Let u ij ˆ Ãv T x i x j x i x j T v à Ãj 2 v 2 x i v 2 x j Š and v ij ˆ y i y j 2.Itcanbe shown that the resulting optimal model-calibration estimator for SY 2 is given by " ÃS MC 2 1 ˆ n 0 n 0 d 1 ij s 0 v ij u ij ) # d ij s 0 u ij ÃB S j > i 0 j > i j > i where B à S ˆ P P j > i d ij s 0 q ij u ij v ij = P P j > i d ij s 0 q ij uij 2. For a general second-order nite population quantity such as Q, one needs to use v ij ˆ f y i ; y j, u ij ˆ E y f y i ; y j x i ; x j Š, and replace 1= n 0 n 0 1 Š by N N 1 = n 0 n 0 1 Š in the above formulation. If simple random sampling without replacement SRSWOR) is also used at the second phase, S à 2 MC reduces to "!# v 2 x i 1 v 2 x n i ÃB S 0 ÃS 2 MC ˆ s 2 Y à v T s 0 2 s 2 à v Ãj 2 1 n 0 where s 2 Y is the usual sample variance based on y i,, ands 0 2 and s 2 are the sample variance-covariance matrices for the x variable based on s 0 and s, respectively. If further the regression model 9) has a homogeneous variance structure, i.e., v x i ; 1, we have ÃS 2 MC ˆ s 2 Y à v T s 0 2 s 2 à v à B S More Ef cient Variance Estimators for the Regression Estimator Under the commonly used regression model 9), it can be shown that B à Y ˆ 1 and the optimal model-calibration estimator for the population mean ÅY ˆ N 1 P N i ˆ 1 y i is identical to the conventional regression estimator under two-phase sampling Wu and Sitter 2001): ÃÅY MC ˆ 1 n 0 " d i j s 0 y i x i! T # d i s 0 x i v à 0 where v à are the estimated regression coef cients. If SRSWOR is used at the second phase, the approximate design-based variance of à YMC Å is given by V p à YMC ŠLj 1 n 0 1 SY 2 1 N n 1 n 0 SE 2 where SY 2 is de ned as before, SE 2 is the nite population variance de ned over the residual variable e i ˆ y i xi T v N,andv N is the nite population regression coef cients. The conventional variance estimator for à YMC Å is obtained if one replaces SY 2 by sy 2 and SE 2 by se, 2 the sample variances based on the second-phase sample s, and using Ãe i ˆ y i xi T Ãv. See for example Cochran 1977, p. 343). We denote this basic variance estimator by v 0 à ÅY MC. An improved variance estimator which makes more complete use of the large rstphase sample measured over the covariates x can be easily developed as follows. The SY 2 can be more ef ciently estimated by S à MC 2 given by 10). As for SE 2, the optimal model-calibration estimator can be similarly constructed by noting that E y e i ˆ0and

7 Wu and Luan: Optimal Calibration Estimators Under Two-Phase Sampling 125 E y e i e j 2 x i ; x j Š Lj j 2 v 2 x i v 2 x j Š. The resulting estimator for SE 2 is given by " # ÃS E 2 ˆ se 2 1 n 0 v 2 x i 1 v 2 x n i ÃB E 0 where B à E is similarly de ned as B à S but using v ij ˆ Ãe i Ãe j 2 and u ij ˆ v 2 x i v 2 x j.it is interesting to notice that, if v x i ; 1, S à 2 E ˆ se, 2 the large rst-phase sample on x contains no extra information to improve the estimation of SE. 2 See Wu 2002) for more discussion on this issue. Our proposed variance estimator is as follows: v 1 à YMC Å ˆ 1 n 0 1 sy 2 1 N n 1 n 0 se 2 1 n 0 1 Ãv T 0 s 2 N s 2 v à B à S " # 1 n 0 v 2 x i 1 v 2 1 x n i n 0 1 ÃB N S 1 n 1 ÃB n 0 E Ãj 2 0 If v x i ; 1, the very last term in v 1 à YMC Å vanishes, and in this case v 1 à YMC Å reduces to v 1 Åy lr proposed by Sitter 1997) if we also replace B à S by 1. Note that, under the model 9), E y B à S Lj 1, E y B à E Lj 1, an alternative variance estimator is obtained if we replace both B à S and B à E by 1. This is given by v 2 à YMC Å ˆ 1 n 0 1 sy 2 1 N n 1 n 0 se 2 1 n 0 1 Ãv T 0 s 2 N s 2 v à 1 n 1 " # 1 N n 0 v 2 x i 1 v 2 x n i Ãj 2 0 Both v 1 and v 2 are design-consistent. The design-based nite sample performances of v 1 and v 2 are further investigated in Section 5 through a simulation study. Also included in the simulation study is the delete-1 jackknife variance estimator, v J. The detailed formulation of v J was given by Sitter 1997). 4. Variance Estimation for the Distribution Function 4.1. Analytical variance estimators Analytical variance estimators for F à ME t which make more ef cient use of the rst-phase sample can be developed in the same spirit to v 1 or v 2. Following the arguments of Chen and Wu 2002), it can be shown that ÃF ME t ˆ 1 Ãn 0 d i s 0 I y i # t " # 1 n 0 g x i ; t 1 Ãn 0 d i s 0 g x i ; t B N o p 0 1 p n where Ãn 0 ˆ P d i s 0, B N ˆ P N i ˆ 1 g x i ; t g N ŠI y i # t = P N i ˆ 1 g x i ; t g N Š 2,and g N ˆ N 1 P N i ˆ 1 g x i ; t. Under two-phase sampling with SRSWOR at both phases,! 11

8 126 Journal of Of cial Statistics we have V p F à ME t Š Lj 1 n 0 1 SI 2 1 N n 1 n 0 SD 2 where SI 2 and SD 2 are the nite population variances de ned over the variable I i ˆ I y i # t and D i ˆ I y i # t g x i ; t B N, respectively. The usual substitution estimator for V p F à ME t Š is denoted by v 0 F à ME t Š, withsi 2 replaced by si 2 and SD 2 replaced by sd, 2 the sample variances based on s. A more ef cient variance estimator is readily available if we estimate both SI 2 and SD 2 using the general method described in Section 2.3. The resulting estimators S à I 2 and S à D 2 can both be expressed as " # ÃS 2 ˆ s n 0 n 0 u 1 ij u n n 1 ij ÃB F 0 j > i P j > i u ij v ij = P j > i P j > i u 2 ij and s 2 is the corresponding sample var- where B à F ˆ P iance based on s. ForSI 2, v ij ˆ I i I j 2, u ij ˆ E y I i I j 2 x i ; x j Šˆg i g j 2g i g j, and g i ˆ g x i ; t ; for SD, 2 v ij ˆ D i D j 2, u ij ˆ E y D i D j 2 x i ; x j Š Lj g i 1 g i g j 1 g j. As usual, any unknown model parameters appearing in D i or u ij will be replaced by appropriate sample-based estimates. Let v 1 F à ME t Š be the estimator of V p F à ME t Š obtained by using S à I 2 and S à D 2 in place of SI 2 and SD, 2 respectively. Note that SI 2 ˆ N N 1 1 F Y t 1 F Y t Š, an alternative estimator for SI 2 could simply be N N 1 1 FME à t 1 F à ME t Š. We denote the resulting estimator of V p F à ME t Š as v 2 F à ME t Š, where SD 2 is estimated by S à D,asinv 2 1 F à ME t Š. Similar to the mean case, both v 1 and v 2 are design-consistent estimators of V p F à ME t Š Jackknife variance estimator The pseudo-empirical maximum likelihood estimator F à ME t is a nonlinear estimator and its exact design-based variance does not have a closed form. The conventional delete-1 jackknife variance estimator often provides an attractive alternative in such cases. Under two-phase sampling, the delete-1 estimator F à ME jš needs to be computed differently for j [ s and for j [ s 0 s. Let s j denote the set s with the j th unit deleted. Let FME à jš ˆP j Ãp i I y i # t if j [ s, where the weights Ãp i i Þ j maximize l à p ˆP j d i s 0 log p i subject to P j p i ˆ 1 0 < p i < 1 and P j p i g x i ; t ˆ n P 0 j g x i ; t ; let FME à jš ˆP Ãp i I y i # t if j [ s 0 s, where the weights Ãp i maximize l à p ˆP d i s 0 log p i subject to P p i ˆ 1 0 < p i < 1 and P p i g x i ; t ˆ n P 0 j g x i ; t. The usual jackknife variance estimator is v J F à ME t Š ˆ n 0 1 n 0 ff à ME jš F à ME t g 2 12 j [ s 0 This variance estimator is consistent if SRSWOR is used at both phases and the rst-phase sampling fraction n 0 =N is negligible, since F à ME t is asymptotically equivalent to a regression type estimator given by 11) Sitter 1997). If the rst-phase sampling fraction cannot be ignored, an ad hoc adjustment can be made by multiplying v J by 1 n 0 =N. This has been done in the simulation study.

9 Wu and Luan: Optimal Calibration Estimators Under Two-Phase Sampling 127 The major advantage of using v J is the operational convenience. For parameters in a general form of W ˆ WfF Y t g, the jackknife variance estimator for à W ˆ Wf à F ME t g can be readily formulated by replacing à F ME jš and à F ME t in 12)by à W jš ˆWf à F ME jšg and ÃW ˆ Wf à F ME t g. This is most useful when the method is applied to in nite populations where parameters of interest are often in the form of v ˆ v F. 5. Some Empirical Results In this section we report some simulation results regarding the design-based nite sample performances of the proposed estimators using real survey data from the 1996 Statistics Canada Family Expenditure FAME)Survey for the province of Ontario. The data set contains 2,396 observations over a variety of variables. In the simulation, the variable y, total expenditure, was used as the response, and x 1 number of people in the household) and x 2 total income after taxes)were included as auxiliary variables. It was found that a simple linear regression model y ˆ b 0 b 1 x 1 b 2 x 2 «gives a reasonable t to the data, with the nonhomogeneous variance structure V y «ˆx 2 j 2 tting slightly better than the constant variance model. Treating the data set as a xed nite population, we rst selected a two-phase sample with SRSWOR at both phases, and then computed various estimators presented in Sections 2, 3, and 4. For the distribution function F Y t we included results on ve different values of t corresponding to the 10th, 25th, 50th, 75th, and 90th population quantiles. The process was repeated B ˆ 5,000 times for point estimators à ŠYMC, à S 2 MC and à F ME t,and B ˆ 1,000 times for variance estimators v 0, v 1, v 2,andv J. The performance of an estimator à T for a certain population quantity T was evaluated using the relative percentage bias RB%)and the relative ef ciency RE)de ned as RB% ˆ 1 B B b ˆ 1 ÃT b T T 100 and RE ˆ MSE à T 0 MSE à T where à T b was computed from the b th simulated sample, MSE à T ˆB 1 P B b ˆ 1 à T b T 2, and à T 0 denotes the baseline estimator for comparison. For the estimation of ÅY, S 2 Y and F Y t, à T0 is given by Åy ˆ n 1 P y i, s 2 Y ˆ n 1 1 P y i y 2 and ÃF Y t ˆn 1 P I y i # t, respectively. Table 1 reports the simulated RE's for the optimal-calibration or the model-calibrated pseudo-empirical maximum likelihood estimators for the population mean Y, the population variance S 2 Y and the nite population distribution function F Y t at t ˆ t a, a ˆ 0:10, 0.25, 0.50, 0.75, and The rst-phase sample size was chosen as n 0 ˆ 400. The absolute values of the RB's are all less than 2% and are not reported here to save space. Table 1. Relative ef ciency of estimators for ÅY, S 2 Y and F Y t n 0 n ÃÅY MC à S 2 MC à FME t at t ˆ t a

10 128 Journal of Of cial Statistics Table 2. Relative bias in %) of variance estimators for à ÅY MC and ÃF ME t n 0 n v ÃÅY MC à FME t at t ˆ t a v v v v J v v v v J v v v v J The proposed estimators perform uniformly better and are much superior to the conventional estimator à T 0 in all cases. With xed rst-phase sample size, the gain in ef ciency seems larger when the second-phase sample size n is smaller, although such a trend is not monotonic in terms of sample size. As pointed out by one of the referees, this is due to the fact that V p à T! V p à T 0 as n! n 0 for xed n 0. As for the distribution function F Y t, the improvement is more dramatic when t is not in the tail regions. For variance estimators, the true values i.e., T in the de nitions of RB and RE) of V p à ŠYMC and V p à F ME t were estimated from another 5,000 independent simulation runs. The simulated RB's of v 0, v 1, v 2 and v J are reported in Table 2. All RB's are less than 11%, except for those cases relating to the estimation of the variance of ÃF ME 0:10 and à F ME 0:90 when n ˆ 40. The RB's in these cases all take negative values and are smaller than 20% for v 0, v 1 and v 2. The jackknife estimator v J also signi cantly overestimates the variance in these cases. It seems that none of these estimators, including the naive v 0, provides acceptable estimates when n is small and t is in the tail region. The simulated RE's for the variance estimators of à ŠYMC and à F ME t are presented in Table 3. Note that we excluded the RE 's for the case of n ˆ 40 at t ˆ t 0:10 and t 0:90 where Table 3. Relative ef ciency of variance estimators for à ÅY MC and ÃF ME t n 0 n v ÃÅY MC à FME t at t ˆ t a v Ð Ð v Ð Ð v J 0.85 Ð Ð 80 v v v J v v v J

11 Wu and Luan: Optimal Calibration Estimators Under Two-Phase Sampling 129 the bias is not negligible, since comparison in terms of RE under such cases is probably misleading. The variance estimator for baseline comparison is v 0. Table 3 can be summarized as follows: i) v 1 and v 2 perform similar to each other and are uniformly better than v 0 ; ii) the improvement from using v 1 and v 2 over v 0 is only marginal for estimating V p à ÅY MC ; iii) v 1 and v 2 seem more ef cient for estimating V p à F ME t for t at large quantiles; and iv) the jackknife variance estimator, which does not reuse the auxiliary information available at the rst phase, is less stable and less ef cient in most cases. One explanation for ii) is probably the fact that the linear model y ˆ b 0 b 1 x 1 b 2 x 2 «with the nonhomogeneous variance structure V y «ˆx 2 j 2 only provides a slightly better t than the constant variance model. Under such circumstances there is little room to improve the estimation of S 2 E by using auxiliary information through a nonhomogeneous variance structure. The marginal gain comes from the estimation of S 2 Y using à S 2 MC. When the variance structure is clearly nonhomogeneous, higher ef ciency can be expected from using v 1 and v 2. See the theoretical and empirical results reported in Wu 2002) for the case of known complete auxiliary information. 6. Applications to Measurement Error and Nonresponse The proposed optimal calibration estimators under two-phase sampling are applicable to estimation problems under measurement errors or nonresponse. Let y i be the true value of a characteristic of interest for the i th unit in the nite population. Suppose it is dif cult or very expensive to obtain exact measurement but an inaccurate value i.e., measurement with error), say z i,ofy for the i th unit, can be obtained quite easily. In such situations the two-phase sampling technique is often employed. A large rst-phase sample s 0 is taken and the cheap inaccurate measurements z i are collected for all 0. A smaller secondphase sample s Ì s 0 is drawn and the exact measurements y i are collected using more extensive effort. The goal is to estimate various nite population quantities de ned over y i. The relationship between the y i 's and the z i 's can be described by the so-called regression calibration model Carroll et al. 1995), y i ˆ a bz i «i ; i ˆ 1; 2;...; N 13 where the «i 's are assumed independent and identically distributed random variates with E m «i ˆ0 and V m «i ˆj 2, and the subscript m refers to the measurement error model 13). If we treat z i as an auxiliary variable, the methodology developed in Sections 2, 3, and 4 can be applied directly here for estimation under the assumed measurement error model. Item nonresponse is often handled through imputation. If the response variable y is highly correlated with certain covariate s) x and missing values do not occur for the x variables, a direct analysis based on the observed sample data using our proposed estimation strategy may be more desirable. The original sample with complete information on x can be viewed as a rst-phase sample, and the subsample with observed responses on y is treated as a second-phase sample. The loss of information due to the nonresponse can be retreated from auxiliary information through the optimal estimation strategy. The

12 130 Journal of Of cial Statistics inference is conditionally valid for the given sample. An advantage of this approach is that the resulting optimal calibration estimators are design-consistent even if the superpopulation model 2) is misspeci ed. This is in contrast to estimation based on imputed data where the validity of the results depends on the correctness of the model used for imputation. 7. Concluding Remarks The calibration method has gained much popularity since its emergence in the early 90s, and calibration estimators are routinely computed by many survey organizations. The optimal calibration approach provides a uni ed framework for the ef cient estimation of various nite population quantities when complete auxiliary information is available. It has been demonstrated in this article that the method also provides a exible and ef cient way of using auxiliary information at the estimation stage under a two-phase sampling scheme. The proposed optimal calibration estimators for a second-order nite population quantity such as the population variance can perfectly be used to obtain more ef cient variance estimators for a rst-order nite population quantity such as the total or the distribution function. The gain in ef ciency can be appreciable even for a rst-phase sample with moderate size. Applications to measurement error problems or nonresponse are straightforward. The method has the potential to be applied to estimation problems under in nite populations where a two-phase sample is available. 8. References Carroll,R.J.,Ruppert,D.,and Stefanski,L.A. 1995). Measurement Errors in Non-Linear Models. Chapman & Hall. Chen,J. and Wu,C. 2002). Estimation of Distribution Function and Quantiles Using the Model-calibrated Pseudo Empirical Likelihood Method. Statistica Sinica,12, 1223±1239. Chen,J.,Sitter,R.R.,and Wu,C. 2002). Using Empirical Likelihood Method to Obtain Range Restricted Weights in Regression Estimators for Surveys. Biometrika,89, 230±237. Cochran,W.G. 1977). Sampling Techniques 3rd ed.). Wiley,New York. Deville,J.C. and SaÈrndal,C.E. 1992). Calibration Estimators in Survey Sampling. Journal of the American Statistical Association,87,376±382. Luan,Y. 2001). Model-Calibration Approach under Two-Phase Sampling. Unpublished master's essay,department of Statistics and Actuarial Science,University of Waterloo, Canada. Sitter,R.R. 1997). Variance Estimation for the Regression Estimator in Two-Phase Sampling. Journal of the American Statistical Association,92,780±787. SaÈrndal,C.E.,Swensson,B.,and Wretman,J. 1992). Model Assisted Survey Sampling. Springer,New York. Wu,C. 2002). Optimal Calibration Estimators in Survey Sampling. Working paper Department of Statistics and Actuarial Science,University of Waterloo, Canada.

13 Wu and Luan: Optimal Calibration Estimators Under Two-Phase Sampling 131 Wu, C. and Sitter, R.R. 2001). A Model-calibration Approach to Using Complete Auxiliary Information from Survey Data. Journal of the American Statistical Association, 96, 185±193. Received March 2002 Revised November 2002

Bias Correction in the Balanced-half-sample Method if the Number of Sampled Units in Some Strata Is Odd

Bias Correction in the Balanced-half-sample Method if the Number of Sampled Units in Some Strata Is Odd Journal of Of cial Statistics, Vol. 14, No. 2, 1998, pp. 181±188 Bias Correction in the Balanced-half-sample Method if the Number of Sampled Units in Some Strata Is Odd Ger T. Slootbee 1 The balanced-half-sample

More information

Empirical Likelihood Methods for Sample Survey Data: An Overview

Empirical Likelihood Methods for Sample Survey Data: An Overview AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use

More information

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved

More information

Design and Analysis of Experiments Embedded in Sample Surveys

Design and Analysis of Experiments Embedded in Sample Surveys Journal of Of cial Statistics, Vol. 14, No. 3, 1998, pp. 277±295 Design and Analysis of Experiments Embedded in Sample Surveys Jan van den Brakel and Robbert H. Renssen 1 This article focuses on the design

More information

REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES

REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Statistica Sinica 8(1998), 1153-1164 REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Wayne A. Fuller Iowa State University Abstract: The estimation of the variance of the regression estimator for

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

Survey Estimation for Highly Skewed Populations in the Presence of Zeroes

Survey Estimation for Highly Skewed Populations in the Presence of Zeroes Journal of Of cial Statistics, Vol. 16, No. 3, 2000, pp. 229±241 Survey Estimation for Highly Skewed Populations in the Presence of Zeroes Forough Karlberg 1 Estimation of the population total of a highly

More information

11. Bootstrap Methods

11. Bootstrap Methods 11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods

More information

ESTIMATION OF DISTRIBUTION FUNCTION AND QUANTILES USING THE MODEL-CALIBRATED PSEUDO EMPIRICAL LIKELIHOOD METHOD

ESTIMATION OF DISTRIBUTION FUNCTION AND QUANTILES USING THE MODEL-CALIBRATED PSEUDO EMPIRICAL LIKELIHOOD METHOD Statistica Sinica 12(2002), 1223-1239 ESTIMATION OF DISTRIBUTION FUNCTION AND QUANTILES USING THE MOD-CALIBRATED PSEUDO EMPIRICAL LIKIHOOD METHOD Jiahua Chen and Changbao Wu University of Waterloo Abstract:

More information

Calibration as a Standard Method for Treatment of Nonresponse

Calibration as a Standard Method for Treatment of Nonresponse Journal of Of cial Statistics, Vol. 15, No. 2, 1999, pp. 305±327 Calibration as a Standard Method for Treatment of Nonresponse Sixten LundstroÈm 1 and Carl-Erik SaÈrndal 2 This article suggests a simple

More information

Empirical Likelihood Inference for Two-Sample Problems

Empirical Likelihood Inference for Two-Sample Problems Empirical Likelihood Inference for Two-Sample Problems by Ying Yan A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Statistics

More information

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY J.D. Opsomer, W.A. Fuller and X. Li Iowa State University, Ames, IA 50011, USA 1. Introduction Replication methods are often used in

More information

Empirical Likelihood Methods

Empirical Likelihood Methods Handbook of Statistics, Volume 29 Sample Surveys: Theory, Methods and Inference Empirical Likelihood Methods J.N.K. Rao and Changbao Wu (February 14, 2008, Final Version) 1 Likelihood-based Approaches

More information

Chapter 5: Models used in conjunction with sampling. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70

Chapter 5: Models used in conjunction with sampling. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70 Chapter 5: Models used in conjunction with sampling J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70 Nonresponse Unit Nonresponse: weight adjustment Item Nonresponse:

More information

Estimation of a Local-Aggregate Network Model with. Sampled Networks

Estimation of a Local-Aggregate Network Model with. Sampled Networks Estimation of a Local-Aggregate Network Model with Sampled Networks Xiaodong Liu y Department of Economics, University of Colorado, Boulder, CO 80309, USA August, 2012 Abstract This paper considers the

More information

Strati cation in Multivariate Modeling

Strati cation in Multivariate Modeling Strati cation in Multivariate Modeling Tihomir Asparouhov Muthen & Muthen Mplus Web Notes: No. 9 Version 2, December 16, 2004 1 The author is thankful to Bengt Muthen for his guidance, to Linda Muthen

More information

On Efficiency of Midzuno-Sen Strategy under Two-phase Sampling

On Efficiency of Midzuno-Sen Strategy under Two-phase Sampling International Journal of Statistics and Analysis. ISSN 2248-9959 Volume 7, Number 1 (2017), pp. 19-26 Research India Publications http://www.ripublication.com On Efficiency of Midzuno-Sen Strategy under

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

Asymptotic Normality under Two-Phase Sampling Designs

Asymptotic Normality under Two-Phase Sampling Designs Asymptotic Normality under Two-Phase Sampling Designs Jiahua Chen and J. N. K. Rao University of Waterloo and University of Carleton Abstract Large sample properties of statistical inferences in the context

More information

Finansiell Statistik, GN, 15 hp, VT2008 Lecture 15: Multiple Linear Regression & Correlation

Finansiell Statistik, GN, 15 hp, VT2008 Lecture 15: Multiple Linear Regression & Correlation Finansiell Statistik, GN, 5 hp, VT28 Lecture 5: Multiple Linear Regression & Correlation Gebrenegus Ghilagaber, PhD, ssociate Professor May 5, 28 Introduction In the simple linear regression Y i = + X

More information

Generalized Pseudo Empirical Likelihood Inferences for Complex Surveys

Generalized Pseudo Empirical Likelihood Inferences for Complex Surveys The Canadian Journal of Statistics Vol.??, No.?,????, Pages???-??? La revue canadienne de statistique Generalized Pseudo Empirical Likelihood Inferences for Complex Surveys Zhiqiang TAN 1 and Changbao

More information

A Procedure for Strati cation by an Extended Ekman Rule

A Procedure for Strati cation by an Extended Ekman Rule Journal of Of cial Statistics, Vol. 16, No. 1, 2000, pp. 15±29 A Procedure for Strati cation by an Extended Ekman Rule Dan Hedlin 1 This article deals with the Ekman rule, which is a well-known method

More information

Determining Sample Sizes for Surveys with Data Analyzed by Hierarchical Linear Models

Determining Sample Sizes for Surveys with Data Analyzed by Hierarchical Linear Models Journal of Of cial Statistics, Vol. 14, No. 3, 1998, pp. 267±275 Determining Sample Sizes for Surveys with Data Analyzed by Hierarchical Linear Models Michael P. ohen 1 Behavioral and social data commonly

More information

ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS

ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS Statistica Sinica 17(2007), 1047-1064 ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS Jiahua Chen and J. N. K. Rao University of British Columbia and Carleton University Abstract: Large sample properties

More information

Minimax design criterion for fractional factorial designs

Minimax design criterion for fractional factorial designs Ann Inst Stat Math 205 67:673 685 DOI 0.007/s0463-04-0470-0 Minimax design criterion for fractional factorial designs Yue Yin Julie Zhou Received: 2 November 203 / Revised: 5 March 204 / Published online:

More information

Statistical Practice

Statistical Practice Statistical Practice A Note on Bayesian Inference After Multiple Imputation Xiang ZHOU and Jerome P. REITER This article is aimed at practitioners who plan to use Bayesian inference on multiply-imputed

More information

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria SOLUTION TO FINAL EXAM Friday, April 12, 2013. From 9:00-12:00 (3 hours) INSTRUCTIONS:

More information

A Note on Bayesian Inference After Multiple Imputation

A Note on Bayesian Inference After Multiple Imputation A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in

More information

Measuring robustness

Measuring robustness Measuring robustness 1 Introduction While in the classical approach to statistics one aims at estimates which have desirable properties at an exactly speci ed model, the aim of robust methods is loosely

More information

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods Chapter 4 Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods 4.1 Introduction It is now explicable that ridge regression estimator (here we take ordinary ridge estimator (ORE)

More information

A Design-based Analysis Procedure for Two-treatment Experiments Embedded in Sample Surveys. An Application in the Dutch Labor Force Survey

A Design-based Analysis Procedure for Two-treatment Experiments Embedded in Sample Surveys. An Application in the Dutch Labor Force Survey Journal of Of cial Statistics, Vol. 18, No. 2, 2002, pp. 217±231 A Design-based Analysis Procedure for Two-treatment Experiments Embedded in Sample Surveys. An Application in the Dutch Labor Force Survey

More information

Imputation for Missing Data under PPSWR Sampling

Imputation for Missing Data under PPSWR Sampling July 5, 2010 Beijing Imputation for Missing Data under PPSWR Sampling Guohua Zou Academy of Mathematics and Systems Science Chinese Academy of Sciences 1 23 () Outline () Imputation method under PPSWR

More information

Measurement error as missing data: the case of epidemiologic assays. Roderick J. Little

Measurement error as missing data: the case of epidemiologic assays. Roderick J. Little Measurement error as missing data: the case of epidemiologic assays Roderick J. Little Outline Discuss two related calibration topics where classical methods are deficient (A) Limit of quantification methods

More information

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation Inference about Clustering and Parametric Assumptions in Covariance Matrix Estimation Mikko Packalen y Tony Wirjanto z 26 November 2010 Abstract Selecting an estimator for the variance covariance matrix

More information

Nonresponse weighting adjustment using estimated response probability

Nonresponse weighting adjustment using estimated response probability Nonresponse weighting adjustment using estimated response probability Jae-kwang Kim Yonsei University, Seoul, Korea December 26, 2006 Introduction Nonresponse Unit nonresponse Item nonresponse Basic strategy

More information

Combining data from two independent surveys: model-assisted approach

Combining data from two independent surveys: model-assisted approach Combining data from two independent surveys: model-assisted approach Jae Kwang Kim 1 Iowa State University January 20, 2012 1 Joint work with J.N.K. Rao, Carleton University Reference Kim, J.K. and Rao,

More information

TWO-WAY CONTINGENCY TABLES UNDER CONDITIONAL HOT DECK IMPUTATION

TWO-WAY CONTINGENCY TABLES UNDER CONDITIONAL HOT DECK IMPUTATION Statistica Sinica 13(2003), 613-623 TWO-WAY CONTINGENCY TABLES UNDER CONDITIONAL HOT DECK IMPUTATION Hansheng Wang and Jun Shao Peking University and University of Wisconsin Abstract: We consider the estimation

More information

Multinomial Discrete Choice Models

Multinomial Discrete Choice Models hapter 2 Multinomial Discrete hoice Models 2.1 Introduction We present some discrete choice models that are applied to estimate parameters of demand for products that are purchased in discrete quantities.

More information

A Course on Advanced Econometrics

A Course on Advanced Econometrics A Course on Advanced Econometrics Yongmiao Hong The Ernest S. Liu Professor of Economics & International Studies Cornell University Course Introduction: Modern economies are full of uncertainties and risk.

More information

Mean estimation with calibration techniques in presence of missing data

Mean estimation with calibration techniques in presence of missing data Computational Statistics & Data Analysis 50 2006 3263 3277 www.elsevier.com/locate/csda Mean estimation with calibration techniues in presence of missing data M. Rueda a,, S. Martínez b, H. Martínez c,

More information

Jackknife Euclidean Likelihood-Based Inference for Spearman s Rho

Jackknife Euclidean Likelihood-Based Inference for Spearman s Rho Jackknife Euclidean Likelihood-Based Inference for Spearman s Rho M. de Carvalho and F. J. Marques Abstract We discuss jackknife Euclidean likelihood-based inference methods, with a special focus on the

More information

Modi ed Ratio Estimators in Strati ed Random Sampling

Modi ed Ratio Estimators in Strati ed Random Sampling Original Modi ed Ratio Estimators in Strati ed Random Sampling Prayad Sangngam 1*, Sasiprapa Hiriote 2 Received: 18 February 2013 Accepted: 15 June 2013 Abstract This paper considers two modi ed ratio

More information

An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys

An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys Richard Valliant University of Michigan and Joint Program in Survey Methodology University of Maryland 1 Introduction

More information

ECON 594: Lecture #6

ECON 594: Lecture #6 ECON 594: Lecture #6 Thomas Lemieux Vancouver School of Economics, UBC May 2018 1 Limited dependent variables: introduction Up to now, we have been implicitly assuming that the dependent variable, y, was

More information

A measurement error model approach to small area estimation

A measurement error model approach to small area estimation A measurement error model approach to small area estimation Jae-kwang Kim 1 Spring, 2015 1 Joint work with Seunghwan Park and Seoyoung Kim Ouline Introduction Basic Theory Application to Korean LFS Discussion

More information

Design and Estimation for Split Questionnaire Surveys

Design and Estimation for Split Questionnaire Surveys University of Wollongong Research Online Centre for Statistical & Survey Methodology Working Paper Series Faculty of Engineering and Information Sciences 2008 Design and Estimation for Split Questionnaire

More information

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional

More information

Estimation of Dynamic Nonlinear Random E ects Models with Unbalanced Panels.

Estimation of Dynamic Nonlinear Random E ects Models with Unbalanced Panels. Estimation of Dynamic Nonlinear Random E ects Models with Unbalanced Panels. Pedro Albarran y Raquel Carrasco z Jesus M. Carro x June 2014 Preliminary and Incomplete Abstract This paper presents and evaluates

More information

Binary choice 3.3 Maximum likelihood estimation

Binary choice 3.3 Maximum likelihood estimation Binary choice 3.3 Maximum likelihood estimation Michel Bierlaire Output of the estimation We explain here the various outputs from the maximum likelihood estimation procedure. Solution of the maximum likelihood

More information

Weighting in survey analysis under informative sampling

Weighting in survey analysis under informative sampling Jae Kwang Kim and Chris J. Skinner Weighting in survey analysis under informative sampling Article (Accepted version) (Refereed) Original citation: Kim, Jae Kwang and Skinner, Chris J. (2013) Weighting

More information

Multiple-Objective Optimal Designs for the Hierarchical Linear Model

Multiple-Objective Optimal Designs for the Hierarchical Linear Model Journal of Of cial Statistics, Vol. 18, No. 2, 2002, pp. 291±303 Multiple-Objective Optimal Designs for the Hierarchical Linear Model Mirjam Moerbeek 1 and Weng Kee Wong 2 Optimal designs are usually constructed

More information

ASSESSING A VECTOR PARAMETER

ASSESSING A VECTOR PARAMETER SUMMARY ASSESSING A VECTOR PARAMETER By D.A.S. Fraser and N. Reid Department of Statistics, University of Toronto St. George Street, Toronto, Canada M5S 3G3 dfraser@utstat.toronto.edu Some key words. Ancillary;

More information

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:

More information

Simple Estimators for Semiparametric Multinomial Choice Models

Simple Estimators for Semiparametric Multinomial Choice Models Simple Estimators for Semiparametric Multinomial Choice Models James L. Powell and Paul A. Ruud University of California, Berkeley March 2008 Preliminary and Incomplete Comments Welcome Abstract This paper

More information

The Effective Use of Complete Auxiliary Information From Survey Data

The Effective Use of Complete Auxiliary Information From Survey Data The Effective Use of Complete Auxiliary Information From Survey Data by Changbao Wu B.S., Anhui Laodong University, China, 1982 M.S. Diploma, East China Normal University, 1986 a thesis submitted in partial

More information

(c) i) In ation (INFL) is regressed on the unemployment rate (UNR):

(c) i) In ation (INFL) is regressed on the unemployment rate (UNR): BRUNEL UNIVERSITY Master of Science Degree examination Test Exam Paper 005-006 EC500: Modelling Financial Decisions and Markets EC5030: Introduction to Quantitative methods Model Answers. COMPULSORY (a)

More information

13 Endogeneity and Nonparametric IV

13 Endogeneity and Nonparametric IV 13 Endogeneity and Nonparametric IV 13.1 Nonparametric Endogeneity A nonparametric IV equation is Y i = g (X i ) + e i (1) E (e i j i ) = 0 In this model, some elements of X i are potentially endogenous,

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Chapter 1. GMM: Basic Concepts

Chapter 1. GMM: Basic Concepts Chapter 1. GMM: Basic Concepts Contents 1 Motivating Examples 1 1.1 Instrumental variable estimator....................... 1 1.2 Estimating parameters in monetary policy rules.............. 2 1.3 Estimating

More information

Data Integration for Big Data Analysis for finite population inference

Data Integration for Big Data Analysis for finite population inference for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation

More information

Multivariate Regression: Part I

Multivariate Regression: Part I Topic 1 Multivariate Regression: Part I ARE/ECN 240 A Graduate Econometrics Professor: Òscar Jordà Outline of this topic Statement of the objective: we want to explain the behavior of one variable as a

More information

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2014 Instructor: Victor Aguirregabiria

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2014 Instructor: Victor Aguirregabiria ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2014 Instructor: Victor guirregabiria SOLUTION TO FINL EXM Monday, pril 14, 2014. From 9:00am-12:00pm (3 hours) INSTRUCTIONS:

More information

Modification and Improvement of Empirical Likelihood for Missing Response Problem

Modification and Improvement of Empirical Likelihood for Missing Response Problem UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu

More information

Nonparametric Identi cation of Regression Models Containing a Misclassi ed Dichotomous Regressor Without Instruments

Nonparametric Identi cation of Regression Models Containing a Misclassi ed Dichotomous Regressor Without Instruments Nonparametric Identi cation of Regression Models Containing a Misclassi ed Dichotomous Regressor Without Instruments Xiaohong Chen Yale University Yingyao Hu y Johns Hopkins University Arthur Lewbel z

More information

Chapter 8: Estimation 1

Chapter 8: Estimation 1 Chapter 8: Estimation 1 Jae-Kwang Kim Iowa State University Fall, 2014 Kim (ISU) Ch. 8: Estimation 1 Fall, 2014 1 / 33 Introduction 1 Introduction 2 Ratio estimation 3 Regression estimator Kim (ISU) Ch.

More information

NUMERICAL DISTRIBUTION FUNCTIONS OF LIKELIHOOD RATIO TESTS FOR COINTEGRATION

NUMERICAL DISTRIBUTION FUNCTIONS OF LIKELIHOOD RATIO TESTS FOR COINTEGRATION JOURNAL OF APPLIED ECONOMETRICS J. Appl. Econ. 14: 563±577 (1999) NUMERICAL DISTRIBUTION FUNCTIONS OF LIKELIHOOD RATIO TESTS FOR COINTEGRATION JAMES G. MACKINNON, a * ALFRED A. HAUG b AND LEO MICHELIS

More information

Calibration estimation using exponential tilting in sample surveys

Calibration estimation using exponential tilting in sample surveys Calibration estimation using exponential tilting in sample surveys Jae Kwang Kim February 23, 2010 Abstract We consider the problem of parameter estimation with auxiliary information, where the auxiliary

More information

Estimating the Number of Common Factors in Serially Dependent Approximate Factor Models

Estimating the Number of Common Factors in Serially Dependent Approximate Factor Models Estimating the Number of Common Factors in Serially Dependent Approximate Factor Models Ryan Greenaway-McGrevy y Bureau of Economic Analysis Chirok Han Korea University February 7, 202 Donggyu Sul University

More information

Model Assisted Survey Sampling

Model Assisted Survey Sampling Carl-Erik Sarndal Jan Wretman Bengt Swensson Model Assisted Survey Sampling Springer Preface v PARTI Principles of Estimation for Finite Populations and Important Sampling Designs CHAPTER 1 Survey Sampling

More information

6. Fractional Imputation in Survey Sampling

6. Fractional Imputation in Survey Sampling 6. Fractional Imputation in Survey Sampling 1 Introduction Consider a finite population of N units identified by a set of indices U = {1, 2,, N} with N known. Associated with each unit i in the population

More information

Songklanakarin Journal of Science and Technology SJST R3 LAWSON

Songklanakarin Journal of Science and Technology SJST R3 LAWSON Songklanakarin Journal of Science and Technology SJST-0-00.R LAWSON Ratio Estimators of Population Means Using Quartile Function of Auxiliary Variable in Double Sampling Journal: Songklanakarin Journal

More information

Approximate Inference for the Multinomial Logit Model

Approximate Inference for the Multinomial Logit Model Approximate Inference for the Multinomial Logit Model M.Rekkas Abstract Higher order asymptotic theory is used to derive p-values that achieve superior accuracy compared to the p-values obtained from traditional

More information

Testing for a Trend with Persistent Errors

Testing for a Trend with Persistent Errors Testing for a Trend with Persistent Errors Graham Elliott UCSD August 2017 Abstract We develop new tests for the coe cient on a time trend in a regression of a variable on a constant and time trend where

More information

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design 1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary

More information

Jong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris

Jong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris Jackknife Variance Estimation of the Regression and Calibration Estimator for Two 2-Phase Samples Jong-Min Kim* and Jon E. Anderson jongmink@morris.umn.edu Statistics Discipline Division of Science and

More information

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error

More information

Non-parametric Tests for the Comparison of Point Processes Based on Incomplete Data

Non-parametric Tests for the Comparison of Point Processes Based on Incomplete Data Published by Blackwell Publishers Ltd, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA Vol 28: 725±732, 2001 Non-parametric Tests for the Comparison of Point Processes Based

More information

Con dentiality, Uniqueness, and Disclosure Limitation for Categorical Data 1

Con dentiality, Uniqueness, and Disclosure Limitation for Categorical Data 1 Journal of Of cial Statistics, Vol. 14, No. 4, 1998, pp. 385±397 Con dentiality, Uniqueness, and Disclosure Limitation for Categorical Data 1 Stephen E. Fienberg 2 and Udi E. Makov 3 When an agency releases

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

Some Statistical Problems in Merging Data Files 1

Some Statistical Problems in Merging Data Files 1 Journal of Of cial Statistics, Vol. 17, No. 3, 001, pp. 43±433 Some Statistical Problems in Merging Data Files 1 Joseph B. Kadane Suppose that two les are given with some overlapping variables some variables

More information

Empirical likelihood inference for a common mean in the presence of heteroscedasticity

Empirical likelihood inference for a common mean in the presence of heteroscedasticity The Canadian Journal of Statistics 45 Vol. 34, No. 1, 2006, Pages 45 59 La revue canadienne de statistique Empirical likelihood inference for a common mean in the presence of heteroscedasticity Min TSAO

More information

AGEC 661 Note Eleven Ximing Wu. Exponential regression model: m (x, θ) = exp (xθ) for y 0

AGEC 661 Note Eleven Ximing Wu. Exponential regression model: m (x, θ) = exp (xθ) for y 0 AGEC 661 ote Eleven Ximing Wu M-estimator So far we ve focused on linear models, where the estimators have a closed form solution. If the population model is nonlinear, the estimators often do not have

More information

A Nonparametric Estimator of Species Overlap

A Nonparametric Estimator of Species Overlap A Nonparametric Estimator of Species Overlap Jack C. Yue 1, Murray K. Clayton 2, and Feng-Chang Lin 1 1 Department of Statistics, National Chengchi University, Taipei, Taiwan 11623, R.O.C. and 2 Department

More information

A Note on Auxiliary Particle Filters

A Note on Auxiliary Particle Filters A Note on Auxiliary Particle Filters Adam M. Johansen a,, Arnaud Doucet b a Department of Mathematics, University of Bristol, UK b Departments of Statistics & Computer Science, University of British Columbia,

More information

Parametric fractional imputation for missing data analysis

Parametric fractional imputation for missing data analysis 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Biometrika (????),??,?, pp. 1 15 C???? Biometrika Trust Printed in

More information

Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770

Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770 Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770 Jonathan B. Hill Dept. of Economics University of North Carolina - Chapel Hill November

More information

Parametric Inference on Strong Dependence

Parametric Inference on Strong Dependence Parametric Inference on Strong Dependence Peter M. Robinson London School of Economics Based on joint work with Javier Hualde: Javier Hualde and Peter M. Robinson: Gaussian Pseudo-Maximum Likelihood Estimation

More information

Markov-Switching Models with Endogenous Explanatory Variables. Chang-Jin Kim 1

Markov-Switching Models with Endogenous Explanatory Variables. Chang-Jin Kim 1 Markov-Switching Models with Endogenous Explanatory Variables by Chang-Jin Kim 1 Dept. of Economics, Korea University and Dept. of Economics, University of Washington First draft: August, 2002 This version:

More information

Combining Non-probability and Probability Survey Samples Through Mass Imputation

Combining Non-probability and Probability Survey Samples Through Mass Imputation Combining Non-probability and Probability Survey Samples Through Mass Imputation Jae-Kwang Kim 1 Iowa State University & KAIST October 27, 2018 1 Joint work with Seho Park, Yilin Chen, and Changbao Wu

More information

Averaging Estimators for Regressions with a Possible Structural Break

Averaging Estimators for Regressions with a Possible Structural Break Averaging Estimators for Regressions with a Possible Structural Break Bruce E. Hansen University of Wisconsin y www.ssc.wisc.edu/~bhansen September 2007 Preliminary Abstract This paper investigates selection

More information

EFFICIENT REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLING

EFFICIENT REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLING Statistica Sinica 13(2003), 641-653 EFFICIENT REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLING J. K. Kim and R. R. Sitter Hankuk University of Foreign Studies and Simon Fraser University Abstract:

More information

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Statistica Sinica 24 (2014), 1001-1015 doi:http://dx.doi.org/10.5705/ss.2013.038 INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Seunghwan Park and Jae Kwang Kim Seoul National Univeristy

More information

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011 SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER By Donald W. K. Andrews August 2011 COWLES FOUNDATION DISCUSSION PAPER NO. 1815 COWLES FOUNDATION FOR RESEARCH IN ECONOMICS

More information

Analysis of variance and linear contrasts in experimental design with generalized secant hyperbolic distribution

Analysis of variance and linear contrasts in experimental design with generalized secant hyperbolic distribution Journal of Computational and Applied Mathematics 216 (2008) 545 553 www.elsevier.com/locate/cam Analysis of variance and linear contrasts in experimental design with generalized secant hyperbolic distribution

More information

Variance Estimation for Calibration to Estimated Control Totals

Variance Estimation for Calibration to Estimated Control Totals Variance Estimation for Calibration to Estimated Control Totals Siyu Qing Coauthor with Michael D. Larsen Associate Professor of Statistics Tuesday, 11/05/2013 2 Outline A. Background B. Calibration Technique

More information

Simplified marginal effects in discrete choice models

Simplified marginal effects in discrete choice models Economics Letters 81 (2003) 321 326 www.elsevier.com/locate/econbase Simplified marginal effects in discrete choice models Soren Anderson a, Richard G. Newell b, * a University of Michigan, Ann Arbor,

More information

Transformation and Smoothing in Sample Survey Data

Transformation and Smoothing in Sample Survey Data Scandinavian Journal of Statistics, Vol. 37: 496 513, 2010 doi: 10.1111/j.1467-9469.2010.00691.x Published by Blackwell Publishing Ltd. Transformation and Smoothing in Sample Survey Data YANYUAN MA Department

More information

5.3 LINEARIZATION METHOD. Linearization Method for a Nonlinear Estimator

5.3 LINEARIZATION METHOD. Linearization Method for a Nonlinear Estimator Linearization Method 141 properties that cover the most common types of complex sampling designs nonlinear estimators Approximative variance estimators can be used for variance estimation of a nonlinear

More information

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley Time Series Models and Inference James L. Powell Department of Economics University of California, Berkeley Overview In contrast to the classical linear regression model, in which the components of the

More information

Jong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris

Jong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris Jackknife Variance Estimation for Two Samples after Imputation under Two-Phase Sampling Jong-Min Kim* and Jon E. Anderson jongmink@mrs.umn.edu Statistics Discipline Division of Science and Mathematics

More information