The Effective Use of Complete Auxiliary Information From Survey Data


The Effective Use of Complete Auxiliary Information From Survey Data

by Changbao Wu

B.S., Anhui Laodong University, China, 1982
M.S. Diploma, East China Normal University, 1986

A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Mathematics and Statistics

© Changbao Wu 1999
SIMON FRASER UNIVERSITY
August, 1999

All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without the permission of the author.

APPROVAL

Name: Changbao Wu
Degree: Doctor of Philosophy
Title of thesis: The Effective Use of Complete Auxiliary Information From Survey Data

Examining Committee:
Dr. Katherine Heinrich, Chair
Dr. Randy R. Sitter, Senior Supervisor
Dr. Charmaine Dean
Dr. Richard Lockhart
Dr. Carl Schwarz
Dr. Gemai Chen, External Examiner, Department of Math & Stats, University of Regina

Date Approved:

Abstract

This thesis develops a unified framework for the effective use of complete auxiliary information from survey data at the estimation stage. The proposed method involves modeling the relationship between the variable of interest and the auxiliary variables, and then incorporating the auxiliary information into design-consistent estimators of finite population means, totals, distribution functions and quantiles through the predicted values using calibration and empirical likelihood methods. The proposed model-calibration estimators can effectively handle any linear or non-linear models, and the estimators of means and totals reduce to the generalized regression estimators under linear models. The pseudo-empirical likelihood approach (Chen and Sitter, 1999), when used in this setting, gives an estimator that is asymptotically equivalent to the model-calibration estimator but with positive weights, and is therefore preferred. Some existing estimators which use complete auxiliary information are shown to be special cases of this unified approach. The approach also provides a simple and elegant algorithm for obtaining an approximately generalized regression estimator with positive weights. This approach can be termed model-assisted, as the resulting estimators are design-consistent regardless of the working model and particularly efficient if the working model adequately describes the true relationship in the population.

Variance estimation and confidence intervals are also considered. Consistent analytical and jackknife variance estimators are obtained for estimators of means, totals and distribution functions. The small sample performance of these variance estimators is investigated through a limited simulation study. Better conditional performance of the jackknife is highlighted. These variance estimates can be used in normal theory confidence intervals. In the case of distribution functions, a simple transformation technique for obtaining better performing confidence intervals is proposed. Woodruff confidence intervals for quantiles are re-examined, and a surprising property of this interval is found and confirmed both empirically and theoretically.

Finally, on a somewhat independent tack, a purely model-based approach is considered. The main new results in this area involve the development of consistent analytical and jackknife variance estimators for the model-based estimator of the distribution function of Chambers and Dunstan (1986). The analytical variance estimators involve kernel smoothing and require substantial re-derivation for every new model. The jackknife variance estimators, however, are operationally simple and easy to extend to new models.

Acknowledgments

First of all, I wish to express my deepest appreciation and gratitude to my senior supervisor Dr. Randy Sitter for his enthusiasm, encouragement and guidance during the past four years. I thank all the statistics faculty members at SFU who led me into the real world of statistics and sharpened my mind for critical thinking. Special thanks are due to Dr. Charmaine Dean and Dr. Richard Lockhart for their consistent support. I would also like to thank Sylvia, Maggie, Judy, Diane, Casey and other staff members of the department for their kindness and help. Many thanks to my friends and officemates Derek, Phil, Heidi, Chandanie, Mike, Melody, Carolyn, Darby, Chuck, Jason, Peter, Jerry, Xucai, Hilary and many others, who have always been supportive and encouraging. I thank Dr. Jiahua Chen at the University of Waterloo for many valuable discussions.

I have benefited very much from various financial support during the course of my studies, especially the C. D. Nelson Memorial Scholarship from Simon Fraser University, Research and Teaching Assistantships from the Department of Mathematics and Statistics, and the E. C. Bryant Scholarship from the American Statistical Association and Westat, Inc.

Most of all, I wish to thank my wife Jianchuan and my daughter Domeny for coming to Canada and sharing with me these unforgettable years of life at Simon Fraser University.

Dedication

To the memory of my mother.
To Jianchuan and Domeny.

Contents

Abstract
Acknowledgments
Dedication
List of Tables
List of Figures

1 Introduction

2 A Review
    General Settings
    Estimation of The Finite Population Mean and Total
    The generalized regression estimator
    The calibration estimator
    The pseudo-empirical maximum likelihood estimator
    Estimation of The Distribution Function
    The design-based difference estimator
    The model-based prediction estimator
    Supplementary remarks
    A Discussion

3 Model Calibration
    Modeling
    Estimation of The Population Mean
    Model-calibration
    Pseudo-empirical likelihood approach
    The generalized difference estimator
    3.2.4 Some comparative comments
    A simulation
    Estimating The Distribution Function
    Estimation of $F(t)$ under a regression model
    Estimation of $F(t)$ under a general model
    Quantile Estimation
    Proofs

4 Positive Weights in Regression Estimation
    Introduction
    Model Calibration and the Empirical Likelihood Approach
    Algorithm for Obtaining Positive Weights
    Comparing Two Sets of Weights

5 Variance Estimation and Confidence Intervals
    Variance Estimation for The Finite Population Mean
    Variance Estimation for The Distribution Function
    Analytical variance estimation for $\hat{F}_d(t)$
    Jackknife variance estimation for $\hat{F}_d(t)$
    A simulation
    Illustration of (5.4) under sampling with replacement
    Confidence Intervals for the Distribution Function
    A transformation technique
    An empirical comparison
    Confidence Intervals for Quantiles
    Woodruff intervals for large and small quantiles: a simulation
    Investigation of the phenomenon
    Supplementary remarks

6 Model-based Inference
    Model-based Prediction Estimators
    6.1.1 Model-based estimator of the population mean
    Model-based estimator for the distribution function
    Variance Estimation
    Analytical variance estimation
    Jackknife variance estimation
    An empirical comparison of variance estimators
    Proof of Theorem

7 Concluding Remarks and Future Research
    Concluding Remarks
    Some Future Work

Bibliography

List of Tables

3.1 Relative bias and efficiency for estimating the mean
Relative efficiency for estimating $F(t)$
Relative bias and efficiency for estimating the quantiles (1)
Relative bias and efficiency for estimating the quantiles (2)
Relative bias and efficiency for estimating the quantiles (3)
Relative bias and instability of variance estimators for $\hat{F}_d(t)$
Coverage probabilities and tail errors for transformation intervals
Coverage probabilities and tail errors for Woodruff intervals
Coverage probabilities for idealized Woodruff intervals
Coverage probabilities and tail errors for modified Woodruff intervals
Relative bias and instability of the variance estimators for quantiles
Relative bias and instability of variance estimators for $\hat{F}_m(t)$

List of Figures

3.1 Scatter plot of population
Graphical representation of $g(\lambda)$
Plot of R-weight versus P-weight
Plot of conditional performance of variance estimators

Chapter 1

Introduction

This thesis considers the use of complete auxiliary information from survey data, where "complete" means that the values of the auxiliary variables are known for the entire finite population, not just for the selected sample, $s$. In sample surveys, auxiliary information on the finite population is regularly used to obtain more precise estimates of unknown finite population quantities. Sometimes this auxiliary information is used at the design stage. Examples include probability proportional to size sampling, where the values of an auxiliary variable are the size measures, and stratified sampling, where auxiliary variables serve as stratum indicators. But often this information is incorporated into the construction of estimators at the estimation stage. Ratio and regression estimators are early examples. Recently, several more complex procedures have been proposed in the literature. However, as is discussed in Chapter 2, existing estimators for finite population means and totals essentially incorporate the auxiliary information only through the known population means or totals of the auxiliary variables, even when complete auxiliary information is available.

In this thesis, a unified framework to address the question of how to effectively use complete auxiliary information at the estimation stage is proposed and developed. This framework adopts a general modeling process and incorporates the complete auxiliary information into design-consistent estimators of finite population means, totals, distribution functions and quantiles through the fitted values using calibration and empirical likelihood methods. The logical connection and the practical difference between estimation of the finite population mean and estimation of the distribution function using complete auxiliary information become more apparent under the unified approach.

Variance estimation is also considered. For the mean it is shown to be quite straightforward, but for the distribution function it is not. We propose to use a jackknife variance estimator for the distribution function and establish its consistency for some important cases. Our approach in Chapters 3, 4 and 5 can be termed model-assisted, as the resulting estimators are design-consistent regardless of the working model and particularly efficient if the working model adequately describes the true relationship in the population.

In Chapter 2, we first describe the general setting and notation and then briefly review existing methods that use auxiliary information at the estimation stage. In particular, we discuss the explicit or implicit nature of the linear working model used in these methods and call for more sophisticated modeling when complete auxiliary information is used.

In Chapter 3, we introduce a unified framework for the use of complete auxiliary information through a general modeling process and a general approach which we term model-calibration. New estimators for population means, totals, distribution functions and quantiles are proposed and compared under this framework. We assume a superpopulation model specified through the first and second moments of the $y$ variable. This model is very general and includes linear or non-linear regression models and generalized linear models as special cases. The problem of estimating the model parameters is carefully treated in Section 3.1, following the proposal of Godambe and Thompson (1986) and using the theory of estimating equations. Design-based estimates of the model parameters can then be obtained, which enables us to proceed to our model-assisted approach.

We argue in Section 3.2 that complete auxiliary information should be used through fitted values under the working model. A general method to do this is what we term model-calibration. This can be accomplished by first using $(y_i, x_i)$ for $i \in s$ to build the model and then calibrating to the predicted values from the model using: (1) a direct calibration argument similar to that of Deville and Särndal (1992); (2) a pseudo-empirical likelihood approach (Chen and Sitter, 1999); or (3) a generalized difference estimator (Cassel, Särndal and Wretman, 1976; Särndal, 1980). The proposed model-calibration estimators for finite population means and totals can effectively handle any linear or non-linear models and reduce to the conventional calibration estimators (Deville and Särndal, 1992) and/or the generalized regression estimators under linear models. The pseudo-empirical maximum likelihood estimator (Chen and Sitter, 1999), when applied in a similar way, is shown to yield an estimator that is asymptotically equivalent to the model-calibration estimator but with positive weights, and is therefore preferred.

Estimating the finite population distribution function, $F(t)$, with complete auxiliary information amounts to finding fitted values for indicator variables involving $y_i$ and not $y_i$ itself. Several estimators are proposed in Section 3.3 and some existing methods are shown to be special cases of this unified framework.

Estimators of quantiles are usually obtained by inverting estimators of the distribution function. This can be easily done if the estimator of the distribution function is itself a true distribution function. This is usually not the case when auxiliary variables are incorporated into the estimator. Alternatively, in Section 3.4, we propose to use a difference estimator and a regression-type estimator for the quantiles when complete auxiliary information is available.

In Chapter 4, we digress and consider estimation of the mean or total of $y$ when only the vector of finite population means of the auxiliary variables, $\bar{X}$, is known. The generalized regression estimator (Cassel, Särndal and Wretman, 1976; Särndal, 1980) is one of the most commonly used procedures in this situation. One of the drawbacks of this estimator is that, when it is written as a weighted average of the $y_i$'s from the sample, it can produce negative weights, which practitioners find very unattractive. Some algorithms have been proposed in the literature to adjust the so-called regression weights iteratively so that the adjusted estimator is asymptotically equivalent to the generalized regression estimator but with positive weights. See, for instance, Huang and Fuller (1978). As a surprising by-product of our more generally applicable model-calibration approach, we propose a simple and elegant algorithm to obtain positive weights for the generalized regression estimator in this setting by using the idea of model-calibration combined with the pseudo-empirical likelihood approach. The algorithm requires no seeds, is not iterative, and guarantees that if a solution exists, it will be obtained.

In Chapter 5, we consider issues related to variance estimation and confidence intervals for the general estimation methods proposed in Chapter 3. Variance estimation for the mean is shown to be simple. Variance estimation for the finite population distribution function in the presence of complete auxiliary information is more difficult, due to the discreteness of the indicator functions used in the estimators. In particular, we must deal with the fact that the model parameters used inside indicator functions in these estimators have themselves been estimated from the sample data. We focus on one important case: the design-based difference estimator of Rao, Kovar and Mantel (1990), which turns out to be a special case of our model-calibration approach. We show that in this case the estimated model parameters do not change the asymptotic design variance of the estimator for some commonly used designs. This result is critical in establishing the consistency of the proposed variance estimators.

We go on to propose a jackknife variance estimator for the distribution function. The jackknife variance estimator for the design-based difference estimator does not have any great advantage other than operational simplicity in some practical settings, since the analytical variance and its estimator can be developed quite easily. However, while examining the small sample performance of this variance estimator through a simulation study, we demonstrate that the jackknife performs better conditionally, that is, conditional on the means of the auxiliary variables for a given sample.

After a point estimate and its estimated variance have been obtained, confidence intervals can be constructed using the conventional $Z$ statistic and the normal approximation. However, for the distribution function, this interval performs badly in the tail region. In Section 5.3, we propose a simple transformation technique to obtain better behaved confidence intervals for the distribution function.

The well-known Woodruff confidence intervals for quantiles are obtained by inverting the normal confidence intervals for the distribution function. One might intuitively believe that Woodruff intervals should not be recommended for large or small quantiles, since in these cases the normal intervals perform badly. Surprisingly, we demonstrate that, despite this fact, Woodruff intervals for large or small quantiles based on inverting these badly behaved intervals perform very well. We investigate this both empirically and theoretically in Section 5.4.

Finally, on a somewhat independent tack, a purely model-based approach is considered in Chapter 6. The main new results in this area involve the development of consistent analytical and jackknife variance estimators for the model-based estimator of the distribution function of Chambers and Dunstan (1986). Unlike for the design-based difference estimator (Rao, Kovar and Mantel, 1990) of Section 5.2, where we show that the estimated model parameters do not change the asymptotic design variance, the estimated model parameters have to be taken into account in variance estimation for this model-based prediction estimator. Analytical variance formulas are very difficult to obtain and must be derived one at a time for different assumed models. These variance estimators also involve kernel density estimation. On the other hand, the jackknife variance estimator is easy to compute, remains operationally the same for different superpopulation models, and neatly avoids kernel density estimation. The consistency of the jackknife is established under some very mild regularity conditions in Section 6.2.
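The Woodruff construction mentioned in this chapter (inverting a normal confidence interval for the distribution function to obtain an interval for a quantile) can be sketched as follows. This is only a minimal illustration under assumptions that are not part of the thesis: simple random sampling without replacement, no auxiliary information, and the binomial-type variance $(1 - n/N)\,p(1-p)/n$ for the estimated distribution function at the $p$-th quantile. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def ecdf_inverse(y_sorted, q):
    """Return inf{t : F_hat(t) >= q} for the empirical distribution of y_sorted."""
    n = y_sorted.size
    q = min(q, 1.0)  # levels above 1 are clipped to the largest observation
    # F_hat evaluated at each sorted observation (ties handled by side="right")
    f_vals = np.searchsorted(y_sorted, y_sorted, side="right") / n
    return y_sorted[np.argmax(f_vals >= q)]


def woodruff_interval(y_s, p, N, level=0.95):
    """Woodruff-type interval for the p-th quantile (assumes SRS without replacement)."""
    y_sorted = np.sort(np.asarray(y_s, dtype=float))
    n = y_sorted.size
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Assumed standard error of F_hat at the quantile, with finite population correction
    se = np.sqrt((1.0 - n / N) * p * (1.0 - p) / n)
    # Invert the normal interval [p - z*se, p + z*se] through the empirical distribution
    return ecdf_inverse(y_sorted, p - z * se), ecdf_inverse(y_sorted, p + z * se)


lower, upper = woodruff_interval(np.arange(1, 101), p=0.5, N=1000)
```

The interval endpoints are always sample order statistics, which is why the method needs no density estimate at the quantile.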

Chapter 2

The Use of Auxiliary Information From Surveys: A Review

In sample surveys, auxiliary information on the finite population is often used at the estimation stage to increase the precision of estimators of the finite population mean, total or distribution function. In the simplest settings, customary ratio and regression estimators incorporate known finite population means of auxiliary variables. For more general situations, three main methods have been proposed in the literature which can be categorized as model-assisted approaches: the generalized regression estimator (GR) (Cassel, Särndal and Wretman, 1976; Särndal, 1980); calibration estimators (Deville and Särndal, 1992); and more recently empirical likelihood methods (Chen and Qin, 1993; Zhong and Rao, 1998; Chen and Sitter, 1999). Recently, several estimators for the finite population distribution function using auxiliary information have also been proposed. We now briefly review these developments and address some related issues. First, the general setting and notation are described in Section 2.1.

2.1 General Settings

The finite population

Suppose that the finite population consists of $N$ identifiable units. Associated with the $i$-th unit are the study variable, $y_i$, and a vector of auxiliary variables, $x_i$. The values $x_1, x_2, \ldots, x_N$ are known for the entire population but $y_i$ is known only if the $i$-th unit is selected in the sample, $s$. Let $U = \{(y_i, x_i) : i = 1, 2, \ldots, N\}$ be the set of units for the finite population and $s = \{(y_i, x_i) : i = 1, 2, \ldots, n\}$ be the set of units in the sample.

Parameters of interest

Although the parameters of interest can be formed more generally, throughout this work we will only consider the estimation problem for the finite population mean $\bar{Y} = N^{-1} \sum_{i=1}^{N} y_i$, total $Y = \sum_{i=1}^{N} y_i$, distribution function $F(t) = N^{-1} \sum_{i=1}^{N} I_{[y_i \le t]}$, where $I_{[\cdot]}$ is the indicator function, and quantiles $\xi_p = \inf\{t : F(t) \ge p\}$.

Asymptotic set-up

We study the theoretical large sample properties of the various proposed estimators. The finite sample performance of these estimators is then examined through limited simulation studies. We assume there is a sequence of finite populations and a sequence of sampling designs, both indexed by $\nu$. The population and sample sizes are denoted by $N_\nu$ and $n_\nu$, respectively. All limiting processes are understood to mean as $\nu \to \infty$. We also assume $N_\nu \to \infty$, $n_\nu \to \infty$ and $n_\nu / N_\nu \to \pi \in [0, 1)$ as $\nu \to \infty$. However, for simplicity of presentation, the index $\nu$ will be suppressed.

The use of a superpopulation model

There exist three general approaches in survey sampling theory:

(1) The design-based approach, also called the probability sampling approach, treats the finite population as fixed. That is, $U = \{(y_i, x_i) : i = 1, 2, \ldots, N\}$ are fixed values indexed by $i$. Inferences are made based on the randomization induced by repeated sampling of the indices.

(2) The model-based approach, also termed the prediction approach, assumes the finite population values $y_1, \ldots, y_N$ are realizations of random variables $(Y_1, \ldots, Y_N)$, generated from a superpopulation model. The model distribution will lead to valid inference based on the particular set of sampled units, irrespective of the sampling design.

(3) The model-assisted approach considers only those estimators which are design-consistent and also approximately model-unbiased under what is termed a working model. This approach attempts to provide valid conditional inferences under the assumed model and at the same time protects against model misspecification in the sense of providing valid design-based inferences irrespective of the population $y$-values (Rao, 1994).

We will adopt a modified model-assisted approach in this work, though for simplicity we will refer to it as model-assisted. First, we use a superpopulation model to describe the relationship between the $y$ variable and the $x$ variables. We then construct estimators that are design-consistent but will be particularly efficient under the working model. They will also be approximately model-unbiased if the design-based estimates for the model parameters are close to the true values. The only exception is Chapter 6, where we consider the model-based framework.

Some notation

1) $\pi_i = P(i \in s)$ denote the inclusion probabilities of a complex sampling scheme; $d_i = 1/\pi_i$ denote the basic design weights.
2) $E_p$ and $V_p$ denote the expectation and variance with respect to the sampling design (design-based); $E_\xi$ and $V_\xi$ denote the expectation and variance under the assumed superpopulation model. If such a distinction is not necessary, we will use $E$ and $Var$.
3) For vectors $\theta$ and $\hat\theta$, $\hat\theta = \theta + O_p(n^{-1/2})$ means $\hat\theta_k = \theta_k + O_p(n^{-1/2})$ for each component of the vectors. Similar notation is used for random matrices.
4) For vectors $\theta$, $\theta_1$ and $\theta_2$, $\theta \in (\theta_1, \theta_2)$ means $\theta_k \in (\theta_{1k}, \theta_{2k})$ for each component of the vectors.
5) $X_n \xrightarrow{L} X$ denotes that $X_n$ converges to $X$ in distribution.
6) $\bar{s}$ denotes the set of non-sampled units in the finite population.
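The finite population parameters of this section can be made concrete with a small numerical sketch. The toy population values and the function name below are hypothetical; the only design fact used is that under simple random sampling without replacement the inclusion probabilities are $\pi_i = n/N$, so that $d_i = N/n$.

```python
import numpy as np


def finite_population_summaries(y, t, p):
    """Mean, total, F(t) and the p-th quantile of a fully observed finite population."""
    y = np.asarray(y, dtype=float)
    N = y.size
    total = y.sum()
    mean = total / N
    F_t = np.mean(y <= t)  # proportion of units with y_i <= t
    # xi_p = inf{t : F(t) >= p}: the smallest population value whose F is at least p
    y_sorted = np.sort(y)
    f_vals = np.searchsorted(y_sorted, y_sorted, side="right") / N
    xi_p = y_sorted[np.argmax(f_vals >= p)]
    return mean, total, F_t, xi_p


def srs_design_weights(N, n):
    """Basic design weights d_i = 1 / pi_i under SRS without replacement (pi_i = n / N)."""
    return np.full(n, N / n)


y_pop = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # hypothetical population values
mean, total, F_t, xi_p = finite_population_summaries(y_pop, t=4.0, p=0.5)
d = srs_design_weights(N=8, n=4)
```

In practice none of these population quantities are observable, since $y_i$ is known only for sampled units; the sketch simply fixes the targets that the estimators reviewed in this chapter aim at.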

2.2 Estimation of The Finite Population Mean and Total

The finite population mean and total are defined as
$$\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} y_i \quad \text{and} \quad Y = \sum_{i=1}^{N} y_i .$$
In the absence of supplementary population information, the design-unbiased Horvitz-Thompson estimator $\hat{\bar{Y}}_{HT} = N^{-1} \sum_{i \in s} d_i y_i$ or $\hat{Y}_{HT} = \sum_{i \in s} d_i y_i$ is typically used. This estimator, which could possibly incorporate auxiliary information at the design stage, uses no auxiliary information at the estimation stage. In this section, we will briefly review the three main model-assisted approaches that incorporate auxiliary information into the estimation of the finite population mean and total. Estimators are presented for the mean case only.

2.2.1 The generalized regression estimator

In many sampling situations, the population means of auxiliary variables are known. This information may not have been used in the sampling design, and it is highly desirable to incorporate it into the estimation procedure. Among commonly used procedures, the generalized regression estimator (GR) (Cassel, Särndal and Wretman, 1976; Särndal, 1980) is the most general one in that it is easy to compute and can handle multiple auxiliary variables, continuous or discrete.

Suppose that the finite population was generated from an underlying superpopulation described by a linear regression model,
$$y_i = x_i'\theta + \varepsilon_i, \quad i = 1, \ldots, N, \qquad (2.1)$$
where the $\varepsilon_i$'s are independent and identically distributed with $E_\xi(\varepsilon_i) = 0$, $V_\xi(\varepsilon_i) = \sigma^2$, and $\theta$ is the vector of unknown superpopulation parameters. A design-based estimator, $\hat\theta$, of the regression coefficients $\theta$ can be obtained using the sample observations $\{(y_i, x_i), i \in s\}$ (see Section 3.1 for detailed discussions). The fitted values of the $y_i$'s are $\hat{y}_i = x_i'\hat\theta$, $i = 1, \ldots, N$. The total prediction error from the model is
$$\sum_{i=1}^{N} e_i = \sum_{i=1}^{N} y_i - \sum_{i=1}^{N} \hat{y}_i ,$$
which is itself a finite population total that can be estimated by a Horvitz-Thompson type estimator
$$\sum_{i \in s} d_i e_i = \sum_{i \in s} d_i y_i - \sum_{i \in s} d_i \hat{y}_i .$$
This yields
$$\hat{\bar{Y}}_{GR} = \frac{1}{N} \Big\{ \sum_{i \in s} d_i y_i + \sum_{i=1}^{N} \hat{y}_i - \sum_{i \in s} d_i \hat{y}_i \Big\} = \hat{\bar{Y}}_{HT} + \{ \bar{X} - \hat{\bar{X}}_{HT} \}' \hat\theta ,$$
where $\bar{X} = N^{-1} \sum_{i=1}^{N} x_i$ and $\hat{\bar{X}}_{HT} = N^{-1} \sum_{i \in s} d_i x_i$.

The generalized regression estimator can be motivated without appealing to a superpopulation model. For instance, it is a calibration estimator under a chi-square distance measure (Deville and Särndal, 1992; see also Section 2.2.2). However, the effectiveness of $\hat{\bar{Y}}_{GR}$ depends on how strongly the $y$ variable is linearly related to the $x$ variables. Note that the above construction of $\hat{\bar{Y}}_{GR}$ uses all the fitted values $\hat{y}_i = x_i'\hat\theta$ for $i = 1, 2, \ldots, N$, but the resulting estimator needs only the known $\bar{X}$ to be implemented.

The ratio estimator $\hat{\bar{Y}}_R = (\hat{\bar{Y}}_{HT} / \hat{\bar{X}}_{HT}) \bar{X} = \hat{R} \bar{X}$ is a special case of the regression estimator in that it can be motivated along the same lines as before by assuming $y_i = \beta x_i + \varepsilon_i$, where $x$ is a univariate auxiliary variable and $\bar{X}$ is its finite population mean. Another commonly used procedure is poststratification, which can be considered as a special case of regression estimation in which the regression variables are indicator variables for the post strata (Särndal, Swensson and Wretman, 1992, p. 264).

The generalized regression estimator is asymptotically design-unbiased, and is very efficient, in the sense of a smaller mean square error, under the linear working model (2.1). It also possesses the very desirable property that, if we rewrite $\hat{\bar{Y}}_{GR}$ as a weighted average of the $y_i$'s in the sample, $\sum_{i \in s} w_i y_i$, the regression weights,
$$w_i = N^{-1} d_i \Big\{ 1 + (X - \hat{X}_{HT})' \Big[ \sum_{i \in s} d_i (x_i - \bar{x})(x_i - \bar{x})' \Big]^{-1} (x_i - \bar{x}) \Big\},$$
satisfy benchmark constraints, i.e., $\sum_{i \in s} w_i x_i = \bar{X}$.
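A small numerical sketch may help fix the Horvitz-Thompson and generalized regression estimators of the mean. It assumes, for illustration only, that $\hat\theta$ is the design-weighted least squares solution $\{\sum_{i \in s} d_i x_i x_i'\}^{-1} \sum_{i \in s} d_i x_i y_i$ (the thesis defers the construction of $\hat\theta$ to Section 3.1), and the function names and toy data are hypothetical.

```python
import numpy as np


def ht_mean(y_s, d_s, N):
    """Horvitz-Thompson estimator of the population mean: N^{-1} * sum_s d_i y_i."""
    return np.sum(d_s * y_s) / N


def gr_mean(y_s, X_s, d_s, X_bar, N):
    """Generalized regression estimator: HT mean plus a regression adjustment.

    X_s is the n x p matrix of sampled auxiliary vectors (include a column of
    ones for an intercept) and X_bar is the known vector of population means.
    """
    # Assumed form of theta_hat: design-weighted least squares
    A = X_s.T @ (d_s[:, None] * X_s)
    b = X_s.T @ (d_s * y_s)
    theta_hat = np.linalg.solve(A, b)
    X_bar_ht = (d_s[:, None] * X_s).sum(axis=0) / N
    return ht_mean(y_s, d_s, N) + (X_bar - X_bar_ht) @ theta_hat


# Toy population in which y is exactly linear in x, so GR recovers the true mean
N = 10
x_pop = np.arange(N, dtype=float)
y_pop = 2.0 + 3.0 * x_pop
sample = np.array([1, 4, 7])          # hypothetical sampled labels
d_s = np.full(sample.size, N / sample.size)
X_s = np.column_stack([np.ones(sample.size), x_pop[sample]])
X_bar = np.array([1.0, x_pop.mean()])

y_ht = ht_mean(y_pop[sample], d_s, N)
y_gr = gr_mean(y_pop[sample], X_s, d_s, X_bar, N)
```

Here the sample happens to under-represent large $x$ values, so the Horvitz-Thompson estimate falls below the true mean of 15.5 while the regression adjustment corrects it, which illustrates why the gain depends on how close the relationship is to linear.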

2.2.2 The calibration estimator

One can also approach the use of auxiliary information directly by revising the basic design weights, $d_i$, to satisfy certain benchmark constraints. That is, the weighted sample sum of the auxiliary variables, using the revised weights, $w_i$, should equal the known population totals (or means) of the auxiliary variables. Deville and Särndal (1992) proposed a general method of deriving so-called calibration estimators by first choosing a distance measure $\Phi_s$ between the basic design weights and the revised calibration weights and then minimizing this distance subject to specified benchmark constraints. The most commonly used distance measure is the chi-square distance,
$$\Phi_s = \sum_{i \in s} \frac{(w_i - d_i)^2}{d_i q_i}, \qquad (2.2)$$
where the $q_i$'s are known positive weights unrelated to $d_i$. The uniform weights $q_i = 1$ are used in most applications, but unequal weights can also be motivated as in Example 1 of Deville and Särndal (1992). The calibration estimator of $\bar{Y}$ is constructed as $\hat{\bar{Y}}_C = N^{-1} \sum_{i \in s} w_i y_i$, where the calibration weights, $w_i$, are chosen to minimize $\Phi_s$ subject to the constraint
$$\sum_{i \in s} w_i x_i = X. \qquad (2.3)$$
For the chi-square distance, the resulting calibration estimator is
$$\hat{\bar{Y}}_C = N^{-1} \sum_{i \in s} w_i y_i = \hat{\bar{Y}}_{HT} + (\bar{X} - \hat{\bar{X}}_{HT})' \hat\beta, \qquad (2.4)$$
where $\hat{\bar{X}}_{HT} = N^{-1} \sum_{i \in s} d_i x_i$ and $\hat\beta = \{ \sum_{i \in s} d_i q_i x_i x_i' \}^{-1} \sum_{i \in s} d_i q_i x_i y_i$.

Several interesting points are observed here. First, the motivation for calibration estimators does not require an assumed superpopulation model. Second, the calibration weights, $w_i$, give perfect estimates when applied to the auxiliary variables. Deville and Särndal (1992) argued that weights that perform well for the auxiliary variable should also perform well for the study variable. However, this argument is valid only under the implicit assumption that $y$ and $x$ are linearly related. For example, in the case of scalar $x$ with $x_i = (1, x_i)'$ used in (2.3), it is clear that $y_i = \beta_0 + \beta_1 x_i$ implies $\hat{\bar{Y}}_C = \bar{Y}$. If a curved relationship exists between $y$ and $x$, the so-constructed calibration estimator could be very inefficient. For instance, if $\log(y_i) \doteq \beta_0 + \beta_1 x_i$, then there is no compelling reason to use $\hat{\bar{Y}}_C$. Lastly, it is possible to choose a different distance measure, but the resulting calibration estimators are all asymptotically equivalent to the generalized regression estimator (Deville and Särndal, 1992).

2.2.3 The pseudo-empirical maximum likelihood estimator

The nonparametric empirical likelihood for independent random variables has been extensively studied by Owen (1988, 1990, 1991) and subsequent authors. It has been shown that the empirical likelihood ratio statistics have limiting chi-square distributions in certain situations. Tests and confidence limits for parameters that can be expressed as functions of an unknown distribution function can be obtained through the empirical likelihood ratio statistics.

The use of empirical likelihood in the survey context was considered by Chen and Qin (1993). They show that, for simple random sampling without replacement, auxiliary information in the form of known population means or quantiles can be incorporated into the so-called empirical maximum likelihood estimator through proper constraints on the maximization of the empirical likelihood. They show that the empirical maximum likelihood estimator is asymptotically equivalent to the customary regression estimator. However, the idea of using a likelihood approach in surveys goes back to Hartley and Rao (1968), who considered what they term the scale-load approach. By assuming that the finite population characteristic is measured on a known scale with a finite set of scale points, they were able to write down the likelihood as a multidimensional hypergeometric distribution, and its limiting case as a multinomial distribution, for simple random sampling without replacement.

Recently, Chen and Sitter (1999) extended the empirical likelihood approach from simple random sampling to general sampling schemes through a pseudo-empirical likelihood approach. First, the whole finite population $\{y_i, i = 1, 2, \ldots, N\}$ can be viewed as iid observations from a certain underlying distribution, $F$. The corresponding empirical likelihood would then be $L(F) = \prod_{i=1}^{N} p_i$ with log-likelihood function
$$l(p) = \sum_{i=1}^{N} \log(p_i), \qquad (2.5)$$
where $p_i = p(y_i)$ is the density or probability mass at observation $y_i$. To overcome the difficulty of not knowing $y_i$ for the entire population, they view the log-likelihood function $l(p)$ in (2.5) as a finite population total. A design-unbiased estimator of $l(p)$ is then available, namely
$$\hat{l}(p) = \sum_{i \in s} d_i \log(p_i), \qquad (2.6)$$
where $d_i$ is the basic design weight and $E_p\{ \sum_{i \in s} d_i \log(p_i) \} = \sum_{i=1}^{N} \log(p_i)$. Recall that $E_p$ denotes the expectation with respect to the sampling design. $\hat{l}(p)$ is termed the pseudo-empirical log-likelihood. Auxiliary information in the form of known population means can be incorporated into the estimation of $\bar{Y}$ by using the pseudo-empirical maximum likelihood estimator (EL), $\hat{\bar{Y}}_{EL} = \sum_{i \in s} \hat{p}_i y_i$, where the $\hat{p}_i$'s maximize the pseudo-empirical log-likelihood $\hat{l}(p)$ subject to
$$\sum_{i \in s} p_i = 1, \quad \sum_{i \in s} p_i (x_i - \bar{X}) = 0 \quad (0 \le p_i \le 1). \qquad (2.7)$$

One of the surprising facts about the pseudo-empirical maximum likelihood estimator is that $\hat{\bar{Y}}_{EL}$ is asymptotically equivalent to the generalized regression estimator $\hat{\bar{Y}}_{GR}$. It is surprising since the likelihood-type motivation underlying EL is so different from that of GR or calibration. On the other hand, it is not surprising if we look at the way auxiliary information is used here. The $\hat{p}_i$'s are the revised weights and the constraint $\sum_{i \in s} p_i (x_i - \bar{X}) = 0$ is identical to the calibration equation used in Section 2.2.2. We may also view $-\hat{l}(p) = -\sum_{i \in s} d_i \log(p_i) > 0$ as a distance measure between the $d_i$'s and the $p_i$'s (it is not a true distance measure, since $p_i = d_i$ for all $i$ does not imply $\hat{l}(p) = 0$). Thus, much like the calibration method, there is implicit use of a linear relationship between $y$ and $x$, and a regression type estimator is expected in such situations.

Another important feature of EL is that the weights, $\hat{p}_i$, are intrinsically positive.
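For a single auxiliary variable, the constrained maximization in (2.7) can be sketched numerically. The sketch relies on the Lagrange-multiplier form of the solution, $\hat{p}_i = d_i^* / \{1 + \lambda (x_i - \bar{X})\}$ with $d_i^* = d_i / \sum_{i \in s} d_i$, where $\lambda$ solves $\sum_{i \in s} d_i^* (x_i - \bar{X}) / \{1 + \lambda (x_i - \bar{X})\} = 0$. The use of bisection, the function names and the toy data are illustrative assumptions, not the algorithm of the thesis.

```python
import numpy as np


def pel_weights(x_s, d_s, X_bar, tol=1e-12, max_iter=200):
    """Pseudo-empirical likelihood weights for scalar x, found by bisection on lambda."""
    u = np.asarray(x_s, dtype=float) - X_bar
    if not (u.min() < 0.0 < u.max()):
        raise ValueError("X_bar must lie strictly inside the range of the sampled x values")
    d_star = np.asarray(d_s, dtype=float) / np.sum(d_s)

    def g(lam):
        return np.sum(d_star * u / (1.0 + lam * u))

    # All weights are positive only for lambda in (-1/max(u), -1/min(u));
    # g is strictly decreasing there, so the root is unique.
    lo, hi = -1.0 / u.max(), -1.0 / u.min()
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return d_star / (1.0 + lam * u), lam


def pel_mean(y_s, p_hat):
    """Pseudo-empirical maximum likelihood estimator of the mean: sum_s p_i y_i."""
    return np.sum(p_hat * np.asarray(y_s, dtype=float))


x_s = np.array([1.0, 2.0, 4.0, 7.0])   # hypothetical sampled x values
d_s = np.full(4, 2.5)                  # equal design weights
p_hat, lam = pel_weights(x_s, d_s, X_bar=3.0)
```

Because the search is confined to the interval on which every $1 + \lambda (x_i - \bar{X})$ is positive, the resulting weights are positive by construction, which is the feature of EL highlighted above.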

Survey statisticians have long recognized that some estimation procedures can result in negative weights, which is very undesirable in some situations (see, for instance, Huang and Fuller, 1978 and Rao and Singh, 1997). The consequence could be, for example, a negative estimate for a population quantity known to be positive, or a non-monotonic estimated distribution function. We will address this issue further in subsequent chapters.

For multiple auxiliary variables, the computational difficulties associated with EL are not trivial. By using the Lagrange multiplier method, it can be shown that

$$ \hat{p}_i = \frac{d_i^*}{1 + \lambda'(x_i - \bar{X})}, $$

where $d_i^* = d_i / \sum_{i \in s} d_i$ and $\lambda$ is the solution to

$$ \sum_{i \in s} \frac{d_i^* (x_i - \bar{X})}{1 + \lambda'(x_i - \bar{X})} = 0. $$

Solving the above nonlinear system with a vector Lagrange multiplier $\lambda$ can be computationally awkward. A new partial solution to this problem is given in a later chapter.

Estimation of the Distribution Function

The finite population distribution function evaluated at $t$ is defined as the proportion of units with $y$-values less than or equal to $t$,

$$ F(t) = \frac{1}{N} \sum_{i=1}^{N} I_{[y_i \le t]}. $$

By replacing $y_i$ by $I_{[y_i \le t]}$, many of the estimators that were constructed for estimating the population mean can be used for estimating $F(t)$. For instance, the Horvitz-Thompson estimator for $F(t)$ is $\hat{F}_{HT}(t) = N^{-1} \sum_{i \in s} d_i I_{[y_i \le t]}$. When auxiliary information is available, special care is needed in constructing estimators of the distribution function that use it. With complete auxiliary information (i.e. $x_i$ known for $i = 1, 2, \ldots, N$), there are two leading estimators for the finite population distribution function: the design-based (model-assisted) difference estimator (Rao, Kovar and Mantel, 1990) and the model-based prediction estimator (Chambers and Dunstan, 1986).

The design-based difference estimator

The generalized regression estimator introduced earlier can be rewritten as a generalized difference estimator (GD),

$$ \hat{\bar{Y}}_{GD} = \frac{1}{N} \Big\{ \sum_{i \in s} d_i y_i + \sum_{i=1}^{N} \hat{\mu}_i - \sum_{i \in s} d_i \hat{\mu}_i \Big\}, $$

where $\hat{\mu}_i = x_i'\hat{\theta}$ is a design-based estimator of $\mu_i = E_\xi(y_i \mid x_i) = x_i'\theta$. A model-assisted difference estimator of $F(t)$ can be constructed by replacing $y_i$ by $I_{[y_i \le t]}$ and $\mu_i$ by $G_i = E_\xi(I_{[y_i \le t]} \mid x_i) = \Pr(y_i \le t \mid x_i)$, and plugging in a proper estimate for $G_i$. This difference estimator was first proposed by Rao, Kovar and Mantel (1990). Under a simple linear regression working model

$$ y_i = \alpha + \beta x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, N, \tag{2.8} $$

we have $G_i = \Pr\{\varepsilon_i \le t - \alpha - \beta x_i\} = G(t - \alpha - \beta x_i)$, where $G(\cdot)$ is the cumulative distribution function of the error term, $\varepsilon_i$, which can be estimated by an empirical distribution function using the fitted residuals. The resulting estimator of $F(t)$ is

$$ \hat{F}_d(t) = \frac{1}{N} \Big\{ \sum_{i \in s} \pi_i^{-1} I_{[y_i \le t]} + \sum_{i=1}^{N} \hat{G}_i - \sum_{i \in s} \pi_i^{-1} \hat{G}_{ic} \Big\}, \tag{2.9} $$

where

$$ \hat{G}_i = \Big\{ \sum_{k \in s} \pi_k^{-1} I_{[\tilde{\varepsilon}_k \le t - \tilde{\alpha} - \tilde{\beta} x_i]} \Big\} \Big/ \sum_{k \in s} \pi_k^{-1}, $$

$$ \hat{G}_{ic} = \Big\{ \sum_{k \in s} \frac{\pi_i}{\pi_{ik}} I_{[\tilde{\varepsilon}_k \le t - \tilde{\alpha} - \tilde{\beta} x_i]} \Big\} \Big/ \sum_{k \in s} \frac{\pi_i}{\pi_{ik}}, \tag{2.10} $$

$\tilde{\beta} = \sum_{i \in s} \pi_i^{-1} (x_i - \tilde{x})(y_i - \tilde{y}) / \sum_{i \in s} \pi_i^{-1} (x_i - \tilde{x})^2$, $\tilde{\alpha} = \tilde{y} - \tilde{\beta}\tilde{x}$, $\tilde{\varepsilon}_k = y_k - \tilde{\alpha} - \tilde{\beta} x_k$, $\tilde{x} = \sum_{i \in s} \pi_i^{-1} x_i / \sum_{i \in s} \pi_i^{-1}$, $\tilde{y} = \sum_{i \in s} \pi_i^{-1} y_i / \sum_{i \in s} \pi_i^{-1}$, and $\pi_i$, $\pi_{ij}$ are the first- and second-order inclusion probabilities. $\hat{G}_i$ is design-unbiased for $G_i$ and $\hat{G}_{ic}$ is conditionally design-unbiased for $G_i$ given that the $i$-th unit is selected in the sample. Extension from the simple linear regression working model to a general regression model is straightforward.

Note that a regression working model and complete auxiliary information are essential for the implementation of this estimator. The error cumulative distribution function, $G(\cdot)$, attached to the regression model, plays a key role in the derivation of the estimator. $\hat{F}_d(t)$ is asymptotically design-unbiased under a general sampling design and approximately model-unbiased under a working model such as (2.8). Godambe (1989) derived $\hat{F}_d(t)$ based on the model- and design-based optimum estimating function theory and showed that $\hat{F}_d(t)$ is robust against departures from the superpopulation model.

The model-based prediction estimator

The paper of Chambers and Dunstan (1986) motivated much of the later work in this area. In their model-based framework, $x$ and $y$ are assumed to follow a superpopulation model. Though the results can be extended to more complex models, for simplicity of presentation we restrict attention to the simple linear regression model (2.8). Under model (2.8), the model-based estimator of $F(t)$ is given by

$$ \hat{F}_m(t) = \frac{1}{N} \Big\{ \sum_{i \in s} I_{[y_i \le t]} + \frac{1}{n} \sum_{j \in \bar{s}} \sum_{i \in s} I_{[y_i \le t - \hat{\beta}(x_j - x_i)]} \Big\}, $$

where $\hat{\beta} = \sum_{i \in s} (y_i - \bar{y})(x_i - \bar{x}) / \sum_{i \in s} (x_i - \bar{x})^2$. $\hat{F}_m(t)$ is asymptotically model-unbiased for $F(t)$. We will consider this estimator in detail in Chapter 6. A crucial point here is that $\hat{F}_m(t)$ is independent of the sampling design, as $(y_i, x_i)$ for $i = 1, 2, \ldots, N$ are viewed as independent sample values from superpopulation model (2.8) regardless of whether they belong to the set of sampled units, $s$, or to the set of nonsampled units, $\bar{s}$.

Supplementary remarks

The use of complete auxiliary information in estimating the finite population distribution function has attracted increased attention in recent literature. Several other estimators which incorporate knowledge of an auxiliary variable known for every unit in the finite population have also been proposed, and their performances examined and compared. See, for example, Chambers, Dorfman and Hall (1992), Kuk (1993), Silva and Skinner (1995) and Wang and Dorfman (1996).
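The two estimators of $F(t)$ that need no joint inclusion probabilities, the Horvitz-Thompson estimator and the model-based prediction estimator under the simple linear regression model (2.8), can be sketched as follows. This is an illustration only: the function names `ht_cdf` and `chambers_dunstan_cdf` are hypothetical, the slope is the ordinary least squares estimate from the sampled units, and the difference estimator (2.9) is omitted because it requires the $\pi_{ik}$.

```python
import numpy as np

def ht_cdf(t, y_s, d_s, N):
    """Horvitz-Thompson estimator of F(t): N^{-1} * sum_{i in s} d_i I[y_i <= t]."""
    y_s = np.asarray(y_s, dtype=float)
    d_s = np.asarray(d_s, dtype=float)
    return np.sum(d_s * (y_s <= t)) / N

def chambers_dunstan_cdf(t, y_s, x_s, x_nonsample):
    """Model-based prediction estimator of F(t) under y = alpha + beta*x + eps."""
    y_s = np.asarray(y_s, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    x_r = np.asarray(x_nonsample, dtype=float)
    N = len(y_s) + len(x_r)

    # Ordinary least squares slope from the sampled units.
    xc = x_s - x_s.mean()
    beta_hat = np.sum(xc * (y_s - y_s.mean())) / np.sum(xc ** 2)

    # Row j (nonsampled unit), column i (sampled unit):
    # threshold t - beta_hat * (x_j - x_i) against which y_i is compared.
    thresholds = t - beta_hat * (x_r[:, None] - x_s[None, :])
    # (1/n) * sum_{i in s} I[y_i <= threshold_ji], one value per nonsampled j.
    predicted = np.mean(y_s[None, :] <= thresholds, axis=1)

    return (np.sum(y_s <= t) + predicted.sum()) / N

# Toy population (hypothetical): x known for all N = 10 units, y observed for 4.
x_pop = np.arange(1.0, 11.0)
sample_idx = np.array([0, 3, 6, 9])
mask = np.zeros(len(x_pop), dtype=bool)
mask[sample_idx] = True
x_s = x_pop[mask]
y_s = 2.0 * x_s + 1.0                    # noiseless linear relationship
d_s = np.full(len(x_s), len(x_pop) / len(x_s))   # equal weights N/n

f_ht = ht_cdf(10.0, y_s, d_s, N=len(x_pop))
f_cd = chambers_dunstan_cdf(10.0, y_s, x_s, x_pop[~mask])
```

In the noiseless toy population the prediction estimator recovers the population proportion of units with $y \le 10$, whereas the Horvitz-Thompson estimator uses only the sampled indicators.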

The model-based $\hat{F}_m(t)$ is model-unbiased but design-inconsistent. Rao, Kovar and Mantel (1990) demonstrate through simulation that the model-based $\hat{F}_m(t)$ has superior performance in small samples when the superpopulation model is correctly specified, but is much more vulnerable than $\hat{F}_d(t)$ to model misspecification and can perform poorly in large samples. Chambers, Dorfman and Hall (1992) make a theoretical comparison under simple random sampling and conclude that there is no clear winner. Whether one chooses to work under a model-based framework and use $\hat{F}_m(t)$ or under a design-based framework and use $\hat{F}_d(t)$, variance estimation will need to be considered. We do this for $\hat{F}_d(t)$ in Section 5.2 and for $\hat{F}_m(t)$ in Chapter 6.

A Discussion

We have presented the three main model-assisted estimation procedures for finite population means and totals. All of these methods have only been discussed in the context of a linear regression working model, and they essentially incorporate the auxiliary variables through their known population means even when the auxiliary variables are known for the entire population. The generalized regression estimator, $\hat{\bar{Y}}_{GR}$, has (2.1) as its base model, and the variance reduction from using $\hat{\bar{Y}}_{GR}$ over $\hat{\bar{Y}}_{HT}$ is directly related to the magnitude of the linear correlation coefficients between $y$ and $x$. The calibration estimator and the pseudo-empirical maximum likelihood estimator can be motivated from different perspectives without assuming a model. However, their effectiveness does rely on the implicit assumption that $y$ and $x$ are linearly related. They are both asymptotically equivalent to the GR, and calibration on the $x$ variables directly requires a linear working model to justify. To answer the fundamental question of how complete auxiliary information can be effectively used at the estimation stage, we need to use more sophisticated modeling.
It is the model structure (the relationship between $y$ and $x$) that determines how the auxiliary information should best be used. The $x$ variables do not necessarily provide direct information about population quantities of $y$; they provide relevant information through a model. We need a general approach that can handle any linear or nonlinear relationship between $y$ and $x$. Also, the approach should be model-assisted in that the resulting estimator should be asymptotically design-unbiased irrespective of the correctness of the working model, but particularly efficient when the working model is correctly specified.

The estimation of the finite population distribution function using complete auxiliary information needs special treatment. Although $F(t)$ can be viewed as a finite population mean for the indicator variable $I_{[y \le t]}$, estimators constructed for estimating $\bar{Y}$ may not be transplantable to the estimation of $F(t)$. Part of the reason is that, for example, a simple linear regression model assumed for $y$ and $x$ cannot be carried over to $I_{[y \le t]}$ and $x$, or to $I_{[y \le t]}$ and $I_{[x \le t]}$. The model must be used in its original form while we deal with the dichotomous variable $I_{[y \le t]}$. When a more complex working model is used, this issue becomes more prominent. We will discuss the various approaches in Chapter 3 under a general modeling process.
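To make the last point concrete, the following worked equation assumes, purely for illustration, that the errors in model (2.8) are normal with variance $\sigma^2$; normality is an added assumption, not a requirement of the text. Even though $E_\xi(y_i \mid x_i)$ is linear in $x_i$, the conditional mean of the indicator is not:

```latex
% Illustrative assumption: \varepsilon_i \sim N(0, \sigma^2) in model (2.8).
E_\xi\big( I_{[y_i \le t]} \mid x_i \big)
  = \Pr(\varepsilon_i \le t - \alpha - \beta x_i)
  = \Phi\!\left( \frac{t - \alpha - \beta x_i}{\sigma} \right),
% where \Phi is the standard normal distribution function.
```

The right-hand side is a nonlinear function of $x_i$ whenever $\beta \neq 0$, so a linear working model for $I_{[y \le t]}$ on $x$ would be misspecified even when (2.8) holds exactly.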

Chapter 3

The Effective Use of Complete Auxiliary Information Through Model-Calibration

In this chapter, we consider the use of more complex working models in obtaining model-assisted estimators by first generalizing the calibration method reviewed in Chapter 2. We term the approach model-calibration for reasons which will become readily apparent. We argue that, under a general modeling process, complete auxiliary information should be incorporated into the construction of estimators through fitted values. How to do this properly is fairly straightforward in the case of a GR (see Section 3.2.3) but not so for calibration. We introduce a general framework which is simple, and under which the estimators for the population mean and total reduce to the usual estimators under a linear model.

Once this generalization is realized, some interesting relationships between a linear model and the use of complete auxiliary information become more obvious and are discussed. Also, some differences between the approaches become more distinct. For example, it has been noted that the calibration estimator reduces to a GR under a chi-square distance measure (Deville and Särndal, 1992), where an underlying linear regression model is used. This is no longer the case when the methods are generalized to nonlinear models, and the proposed model-calibration estimators perform better.

The proposed model-calibration estimators of the population mean and total can effectively handle any linear or non-linear model and reduce to the conventional calibration estimator (the generalized regression estimator) under a linear model. We then go on to similarly generalize the pseudo-empirical maximum likelihood estimator (Chen and Sitter, 1999) and show that it gives an estimator that is asymptotically equivalent to the model-calibration estimator but with positive weights, and therefore is preferred. Finite sample performance of these estimators is investigated through a limited simulation study.

First, the modeling issue is addressed in Section 3.1. Estimation of the population mean and total through model-calibration and pseudo-empirical likelihood is then introduced in Section 3.2. Special treatment for the distribution function under a general modeling process is given in Section 3.3. In Section 3.4, we propose a difference estimator and a regression-type estimator for the quantile process using complete auxiliary information and a general model. All the proofs are deferred to a later section.

3.1 Modeling

Assume the relationship between $y$ and $x$ can be described by a superpopulation model through the first and second moments,

$$ E_\xi(y_i \mid x_i) = \mu(x_i, \theta), \qquad V_\xi(y_i \mid x_i) = v_i^2 \sigma^2, \qquad i = 1, 2, \ldots, N, \tag{3.1} $$

where $\theta = (\theta_0, \ldots, \theta_p)'$ and $\sigma^2$ are unknown superpopulation parameters, $\mu(x, \theta)$ is a known function of $x$ and $\theta$, the $v_i$'s are known constants for given $x_i$'s, and $E_\xi$ and $V_\xi$ denote the expectation and variance with respect to the superpopulation model. We also assume that $(y_1, x_1), \ldots, (y_N, x_N)$ are mutually independent.
The model structure (3.1) is quite general and includes two very important cases:

(i) the linear or non-linear regression model,

$$ y_i = \mu(x_i, \theta) + v_i \varepsilon_i, \qquad i = 1, 2, \ldots, N, \tag{3.2} $$

where the $\varepsilon_i$'s are independent and identically distributed random variables with $E_\xi(\varepsilon_i) = 0$ and $V_\xi(\varepsilon_i) = \sigma^2$, and $v_i = v(x_i)$ is a strictly positive known function of $x_i$ only;

(ii) the generalized linear model,

$$ g(\mu_i) = x_i'\theta, \qquad V_\xi(y_i \mid x_i) = v(\mu_i), \qquad i = 1, 2, \ldots, N, \tag{3.3} $$

where $\mu_i = E_\xi(y_i \mid x_i)$, $g(\cdot)$ is a link function and $v(\cdot)$ is a variance function.

Consider the estimation problem for the model parameters:

(a) When a model-based approach is employed (see Chapter 6), $\{(y_i, x_i), i \in s\}$ is viewed as an iid sample from the superpopulation; that is, randomization is with respect to the underlying distribution for the superpopulation, and the sampling scheme is irrelevant. The superpopulation parameters, $\theta$, can then be estimated using standard procedures.

(b) We need design-based estimates for the model parameters. Under the design-based framework, randomization is with respect to repeated sampling. The sample data obtained from a complex sampling scheme may not follow the same model structure as that of the whole finite population, and the superpopulation parameter $\theta$ may be meaningless or not interpretable from the design-based point of view. In this case, following Godambe and Thompson (1986), we replace $\theta$ by $\theta_N$, a model-based estimate of $\theta$ based on the data from the entire finite population. $\theta_N$ is itself a finite population quantity and can then be estimated by a design-based estimator, $\hat{\theta}$, from the sample data. The notion underlying this argument is as follows: when the superpopulation model is correct, $\theta$ and $\theta_N$ are usually very close to each other since $N$ is usually large, so an estimator of $\theta_N$ is essentially an estimator of $\theta$ when the purpose is to estimate $\theta$; when the superpopulation model is incorrect, $\theta_N$ is still a clearly defined finite population quantity and design-based inference is still valid.

For illustration, consider two important cases.

Case I. $\theta_N$ can be expressed explicitly as a function of population totals for properly defined population variables.
For example, suppose the superpopulation follows a homogeneous linear regression model and $\theta_N$ is defined as the vector of regression coefficients for the finite population: $\theta_N = (X_N' X_N)^{-1} X_N' Y_N$, where $X_N$ is the $N \times (p + 1)$ matrix with rows $(1, x_i')$ for $i = 1, \ldots, N$ and $Y_N = (y_1, \ldots, y_N)'$. Note that

$$ \theta_N = \Big( \sum_{i=1}^{N} x_i x_i' \Big)^{-1} \sum_{i=1}^{N} x_i y_i $$

(with $x_i$ here understood to include the leading 1), and a design-based estimator $\hat{\theta}$ is obtained by plugging in design-based estimates for the various population totals in $\theta_N$:

$$ \hat{\theta} = \Big( \sum_{i \in s} d_i x_i x_i' \Big)^{-1} \sum_{i \in s} d_i x_i y_i = (X_n' D X_n)^{-1} X_n' D Y_n, $$

where $D = \mathrm{diag}(d_1, \ldots, d_n)$ and the $d_i$'s are the basic design weights, with $X_n$ and $Y_n$ in obvious notation.

Case II. $\theta_N$ is defined by estimating equations. Suppose that the generalized linear model (3.3) is assumed. We define $\theta_N$ as the maximum quasi-likelihood estimator of $\theta$ based on the entire finite population, i.e., the solution of the estimating equations (Molina and Skinner, 1992),

$$ \sum_{i=1}^{N} X_i \big[ g^{(1)}\{\mu(x_i, \theta)\} \, v\{\mu(x_i, \theta)\} \big]^{-1} \big[ y_i - \mu(x_i, \theta) \big] = 0, \tag{3.4} $$

where $X_i = (1, x_i')'$ and $g^{(1)}(u) = dg(u)/du$. The estimating functions on the left-hand side of (3.4) are population totals, and $\hat{\theta}$ is defined as the solution of the design-based sample version of (3.4), i.e., the solution of the following estimating equations:

$$ \sum_{i \in s} d_i X_i \big[ g^{(1)}\{\mu(x_i, \theta)\} \, v\{\mu(x_i, \theta)\} \big]^{-1} \big[ y_i - \mu(x_i, \theta) \big] = 0. $$

The estimate $\hat{\theta}$ is then obtained by standard Newton-Raphson iterative procedures,

$$ \theta^{(m+1)} = \theta^{(m)} + \delta \, \big( X_n' G^{-1} W^{-1} G^{-1} X_n \big)^{-1} X_n' G^{-1} W^{-1} (Y_n - \mu_n) \Big|_{\theta = \theta^{(m)}}, $$

where $G = \mathrm{diag}(g^{(1)}(\mu_1), \ldots, g^{(1)}(\mu_n))$, $W = \mathrm{diag}(\pi_1 v(\mu_1), \ldots, \pi_n v(\mu_n))$, $\mu_n = (\mu_1, \ldots, \mu_n)'$ and $\mu_i = \mu(x_i, \theta)$. $\delta \in (0, 1)$ is a pre-chosen constant to accelerate the convergence.
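Both cases can be sketched in a few lines of NumPy. The code below is an illustrative sketch under stated assumptions rather than the thesis's implementation: the function names are hypothetical, the design matrix `X` is assumed to already contain the column of ones, $d_i = 1/\pi_i$ so that $W^{-1} = \mathrm{diag}(d_i / v(\mu_i))$, and the logistic model with $\delta = 0.9$ is chosen only as an example of (3.3).

```python
import numpy as np

def design_weighted_ls(X, y, d):
    """Case I: theta_hat = (X' D X)^{-1} X' D y with D = diag(d_i)."""
    XtD = X.T * d                       # scales column i of X' by d_i
    return np.linalg.solve(XtD @ X, XtD @ y)

def design_weighted_glm(X, y, d, mu, g1, v, theta0,
                        delta=0.9, tol=1e-10, max_iter=500):
    """Case II: damped Newton-Raphson for the design-weighted estimating equations.

    mu : inverse link, maps the linear predictor X @ theta to mu_i
    g1 : derivative of the link function, g^{(1)}(mu)
    v  : variance function v(mu)
    """
    theta = np.array(theta0, dtype=float)
    for _ in range(max_iter):
        m = mu(X @ theta)
        g_inv = 1.0 / g1(m)             # diagonal of G^{-1}
        w_inv = d / v(m)                # diagonal of W^{-1}, using d_i = 1/pi_i
        score = X.T @ (g_inv * w_inv * (y - m))
        info = (X.T * (g_inv * w_inv * g_inv)) @ X
        step = np.linalg.solve(info, score)
        theta = theta + delta * step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Example ingredients for a logistic working model (an illustrative choice).
def logit_mu(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def logit_g1(m):
    return 1.0 / (m * (1.0 - m))

def bernoulli_v(m):
    return m * (1.0 - m)

# Toy sample (hypothetical): intercept plus one covariate, unequal weights.
X_n = np.column_stack([np.ones(6), np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])])
d_n = np.array([2.0, 1.0, 3.0, 1.0, 2.0, 1.0])
y_bin = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
theta_hat = design_weighted_glm(X_n, y_bin, d_n, logit_mu, logit_g1,
                                bernoulli_v, theta0=np.zeros(2))
```

For the logit link $g^{(1)}(\mu) v(\mu) = 1$, so the estimating equations reduce to $\sum_{i \in s} d_i X_i (y_i - \mu_i) = 0$, which gives a convenient check on the returned `theta_hat`.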


More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census

More information

Empirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm

Empirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm Empirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm Mai Zhou 1 University of Kentucky, Lexington, KY 40506 USA Summary. Empirical likelihood ratio method (Thomas and Grunkmier

More information

Likelihood and p-value functions in the composite likelihood context

Likelihood and p-value functions in the composite likelihood context Likelihood and p-value functions in the composite likelihood context D.A.S. Fraser and N. Reid Department of Statistical Sciences University of Toronto November 19, 2016 Abstract The need for combining

More information

2 Naïve Methods. 2.1 Complete or available case analysis

2 Naïve Methods. 2.1 Complete or available case analysis 2 Naïve Methods Before discussing methods for taking account of missingness when the missingness pattern can be assumed to be MAR in the next three chapters, we review some simple methods for handling

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Calibration estimation using exponential tilting in sample surveys

Calibration estimation using exponential tilting in sample surveys Calibration estimation using exponential tilting in sample surveys Jae Kwang Kim February 23, 2010 Abstract We consider the problem of parameter estimation with auxiliary information, where the auxiliary

More information

Introduction to Maximum Likelihood Estimation

Introduction to Maximum Likelihood Estimation Introduction to Maximum Likelihood Estimation Eric Zivot July 26, 2012 The Likelihood Function Let 1 be an iid sample with pdf ( ; ) where is a ( 1) vector of parameters that characterize ( ; ) Example:

More information

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY J.D. Opsomer, W.A. Fuller and X. Li Iowa State University, Ames, IA 50011, USA 1. Introduction Replication methods are often used in

More information

NONLINEAR CALIBRATION. 1 Introduction. 2 Calibrated estimator of total. Abstract

NONLINEAR CALIBRATION. 1 Introduction. 2 Calibrated estimator of total.   Abstract NONLINEAR CALIBRATION 1 Alesandras Pliusas 1 Statistics Lithuania, Institute of Mathematics and Informatics, Lithuania e-mail: Pliusas@tl.mii.lt Abstract The definition of a calibrated estimator of the

More information

A Note on Bootstraps and Robustness. Tony Lancaster, Brown University, December 2003.

A Note on Bootstraps and Robustness. Tony Lancaster, Brown University, December 2003. A Note on Bootstraps and Robustness Tony Lancaster, Brown University, December 2003. In this note we consider several versions of the bootstrap and argue that it is helpful in explaining and thinking about

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Link lecture - Lagrange Multipliers

Link lecture - Lagrange Multipliers Link lecture - Lagrange Multipliers Lagrange multipliers provide a method for finding a stationary point of a function, say f(x, y) when the variables are subject to constraints, say of the form g(x, y)

More information

Asymptotic Normality under Two-Phase Sampling Designs

Asymptotic Normality under Two-Phase Sampling Designs Asymptotic Normality under Two-Phase Sampling Designs Jiahua Chen and J. N. K. Rao University of Waterloo and University of Carleton Abstract Large sample properties of statistical inferences in the context

More information

Bootstrap. Director of Center for Astrostatistics. G. Jogesh Babu. Penn State University babu.

Bootstrap. Director of Center for Astrostatistics. G. Jogesh Babu. Penn State University  babu. Bootstrap G. Jogesh Babu Penn State University http://www.stat.psu.edu/ babu Director of Center for Astrostatistics http://astrostatistics.psu.edu Outline 1 Motivation 2 Simple statistical problem 3 Resampling

More information

Discussion of Maximization by Parts in Likelihood Inference

Discussion of Maximization by Parts in Likelihood Inference Discussion of Maximization by Parts in Likelihood Inference David Ruppert School of Operations Research & Industrial Engineering, 225 Rhodes Hall, Cornell University, Ithaca, NY 4853 email: dr24@cornell.edu

More information

Quantitative Empirical Methods Exam

Quantitative Empirical Methods Exam Quantitative Empirical Methods Exam Yale Department of Political Science, August 2016 You have seven hours to complete the exam. This exam consists of three parts. Back up your assertions with mathematics

More information

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 14 GEE-GMM Throughout the course we have emphasized methods of estimation and inference based on the principle

More information

Empirical likelihood inference for a common mean in the presence of heteroscedasticity

Empirical likelihood inference for a common mean in the presence of heteroscedasticity The Canadian Journal of Statistics 45 Vol. 34, No. 1, 2006, Pages 45 59 La revue canadienne de statistique Empirical likelihood inference for a common mean in the presence of heteroscedasticity Min TSAO

More information

A Bayesian perspective on GMM and IV

A Bayesian perspective on GMM and IV A Bayesian perspective on GMM and IV Christopher A. Sims Princeton University sims@princeton.edu November 26, 2013 What is a Bayesian perspective? A Bayesian perspective on scientific reporting views all

More information

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples Bayesian inference for sample surveys Roderick Little Module : Bayesian models for simple random samples Superpopulation Modeling: Estimating parameters Various principles: least squares, method of moments,

More information

6 Pattern Mixture Models

6 Pattern Mixture Models 6 Pattern Mixture Models A common theme underlying the methods we have discussed so far is that interest focuses on making inference on parameters in a parametric or semiparametric model for the full data

More information

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap inference for the finite population total under complex sampling designs Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.

More information

A measurement error model approach to small area estimation

A measurement error model approach to small area estimation A measurement error model approach to small area estimation Jae-kwang Kim 1 Spring, 2015 1 Joint work with Seunghwan Park and Seoyoung Kim Ouline Introduction Basic Theory Application to Korean LFS Discussion

More information

Chapter 8: Estimation 1

Chapter 8: Estimation 1 Chapter 8: Estimation 1 Jae-Kwang Kim Iowa State University Fall, 2014 Kim (ISU) Ch. 8: Estimation 1 Fall, 2014 1 / 33 Introduction 1 Introduction 2 Ratio estimation 3 Regression estimator Kim (ISU) Ch.

More information

NONINFORMATIVE NONPARAMETRIC BAYESIAN ESTIMATION OF QUANTILES

NONINFORMATIVE NONPARAMETRIC BAYESIAN ESTIMATION OF QUANTILES NONINFORMATIVE NONPARAMETRIC BAYESIAN ESTIMATION OF QUANTILES Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 Appeared in Statistics & Probability Letters Volume 16 (1993)

More information

Chapter 3: Maximum Likelihood Theory

Chapter 3: Maximum Likelihood Theory Chapter 3: Maximum Likelihood Theory Florian Pelgrin HEC September-December, 2010 Florian Pelgrin (HEC) Maximum Likelihood Theory September-December, 2010 1 / 40 1 Introduction Example 2 Maximum likelihood

More information

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Noname manuscript No. (will be inserted by the editor) A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Mai Zhou Yifan Yang Received: date / Accepted: date Abstract In this note

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Nonresponse weighting adjustment using estimated response probability

Nonresponse weighting adjustment using estimated response probability Nonresponse weighting adjustment using estimated response probability Jae-kwang Kim Yonsei University, Seoul, Korea December 26, 2006 Introduction Nonresponse Unit nonresponse Item nonresponse Basic strategy

More information

Likelihood-Based Methods

Likelihood-Based Methods Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)

More information

A JACKKNIFE VARIANCE ESTIMATOR FOR SELF-WEIGHTED TWO-STAGE SAMPLES

A JACKKNIFE VARIANCE ESTIMATOR FOR SELF-WEIGHTED TWO-STAGE SAMPLES Statistica Sinica 23 (2013), 595-613 doi:http://dx.doi.org/10.5705/ss.2011.263 A JACKKNFE VARANCE ESTMATOR FOR SELF-WEGHTED TWO-STAGE SAMPLES Emilio L. Escobar and Yves G. Berger TAM and University of

More information

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X. Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may

More information

Lecture 4 September 15

Lecture 4 September 15 IFT 6269: Probabilistic Graphical Models Fall 2017 Lecture 4 September 15 Lecturer: Simon Lacoste-Julien Scribe: Philippe Brouillard & Tristan Deleu 4.1 Maximum Likelihood principle Given a parametric

More information

Optimal Calibration Estimators Under Two-Phase Sampling

Optimal Calibration Estimators Under Two-Phase Sampling Journal of Of cial Statistics, Vol. 19, No. 2, 2003, pp. 119±131 Optimal Calibration Estimators Under Two-Phase Sampling Changbao Wu 1 and Ying Luan 2 Optimal calibration estimators require in general

More information

Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics

Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Amang S. Sukasih, Mathematica Policy Research, Inc. Donsig Jang, Mathematica Policy Research, Inc. Amang S. Sukasih,

More information

5.3 LINEARIZATION METHOD. Linearization Method for a Nonlinear Estimator

5.3 LINEARIZATION METHOD. Linearization Method for a Nonlinear Estimator Linearization Method 141 properties that cover the most common types of complex sampling designs nonlinear estimators Approximative variance estimators can be used for variance estimation of a nonlinear

More information

Accounting for Complex Sample Designs via Mixture Models

Accounting for Complex Sample Designs via Mixture Models Accounting for Complex Sample Designs via Finite Normal Mixture Models 1 1 University of Michigan School of Public Health August 2009 Talk Outline 1 2 Accommodating Sampling Weights in Mixture Models 3

More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014

Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Put your solution to each problem on a separate sheet of paper. Problem 1. (5166) Assume that two random samples {x i } and {y i } are independently

More information

Loglikelihood and Confidence Intervals

Loglikelihood and Confidence Intervals Stat 504, Lecture 2 1 Loglikelihood and Confidence Intervals The loglikelihood function is defined to be the natural logarithm of the likelihood function, l(θ ; x) = log L(θ ; x). For a variety of reasons,

More information

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models.

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models. 1 Transformations 1.1 Introduction Diagnostics can identify two possible areas of failure of assumptions when fitting linear models. (i) lack of Normality (ii) heterogeneity of variances It is important

More information

RESEARCH REPORT. Vanishing auxiliary variables in PPS sampling with applications in microscopy.

RESEARCH REPORT. Vanishing auxiliary variables in PPS sampling with applications in microscopy. CENTRE FOR STOCHASTIC GEOMETRY AND ADVANCED BIOIMAGING 2014 www.csgb.dk RESEARCH REPORT Ina Trolle Andersen, Ute Hahn and Eva B. Vedel Jensen Vanishing auxiliary variables in PPS sampling with applications

More information

Approximate Inference for the Multinomial Logit Model

Approximate Inference for the Multinomial Logit Model Approximate Inference for the Multinomial Logit Model M.Rekkas Abstract Higher order asymptotic theory is used to derive p-values that achieve superior accuracy compared to the p-values obtained from traditional

More information

Lawrence D. Brown* and Daniel McCarthy*

Lawrence D. Brown* and Daniel McCarthy* Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Statistica Sinica 24 2014, 395-414 doi:ttp://dx.doi.org/10.5705/ss.2012.064 EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Jun Sao 1,2 and Seng Wang 3 1 East Cina Normal University,

More information

ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS

ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS Statistica Sinica 17(2007), 1047-1064 ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS Jiahua Chen and J. N. K. Rao University of British Columbia and Carleton University Abstract: Large sample properties

More information

A Note on Auxiliary Particle Filters

A Note on Auxiliary Particle Filters A Note on Auxiliary Particle Filters Adam M. Johansen a,, Arnaud Doucet b a Department of Mathematics, University of Bristol, UK b Departments of Statistics & Computer Science, University of British Columbia,

More information