Weight calibration and the survey bootstrap

Size: px
Start display at page:

Download "Weight calibration and the survey bootstrap"

Transcription

1 Weight and the survey Department of Statistics University of Missouri-Columbia March 7, 2011

2 Motivating questions 1 Why are the large scale samples always so complex? 2 Why do I need to use weights? 3 What is, and why is it useful for complex survey data? 4 If I have calibrated my data, shouldn t the standard errors go down? 5 How can I see that? What variance estimator should I use? 6 : interplay between and various variance methods. Scattered additional ideas: Survey

3

4 Most large scale data sets utilize features of complex survey sampling designs (Cochran 1977, Kish 1965, Korn & Graubard 1999): stratification: sampling independently within parts of population or from different frames clustering: sampling groups of observation units varying probabilities of selection: optimize performance of the sampling design multiple stages of selection: districts, schools, classes, students Some features are aimed at improving precision, while others, at minimizing costs.

5 sample design A, B, C,..., Alpha, Beta, Gamma, Delta samples Stratification of public schools by jurisdiction, private schools by affiliation Systematic sampling of schools with probability proportional to size Sampling of students to attain (approximately) self-weighting sample Oversampling of (areas and schools with large % of) minorities Matrix sampling of items into booklets

6

7 Setting Finite population U with L strata and N h units in each Designs: sampling with replacement, n h units from stratum h Sample s of size n = h n h = total number of PSUs Estimates: t[y] for T [Y ] ˆθ = f (t1,..., t k ) for θ = f (T 1,..., T k ) ˆθ from estimating equations u(x, ˆθ) = 0 Goal: of design variance

8 Expansion weight: w i = 1 π i = Weighted 1 IPr [ unit i in sample ] Expansion estimator (Horvitz & Thompson 1952) of the total: t[y] = i s w i y i Weighted mean: Analytical use: ȳ w = i s w iy i i s w i = t[y] t[1] ˆβ WLS = (X WX) 1 X Wy, W = diag(w 1,..., w n ) Pseudo-maximum likelihood: wi l(θ, x i ) max θ

9 Sampling weights Pfeffermann (1993), Binder & Roberts (2009): 1 The weights can be used to test and protect against nonignorable sampling designs which could cause selection bias. Without the weights, the parameters may be biased if the selection probabilities depend on characteristics of the units. 2 The weights can be used to protect against misspecification of the model holding in the population. Estimation will be consistent for the census regression, even though a formal one.

10 Replicate weights Besides the main sampling weights, microdata often come with (dozens or hundreds of) replicate weights. Necessary for variance It suffices for the end user to run the model with all sets of weights and compute the variance estimator as the empirical variance of the results with different weights

11 Accuracy of estimates Stratification Weighting Population Sample y

12 Stratification Accuracy of estimates Weighting Population Sample y

13 Deville & Sarndal (1992), Särndal (2007), Kott (2009) Auxiliary information: variables x k Either x k s are known for every unit k in the population, or the population totals, or independent control totals T [x] = k U x k are known Human population samples: demography (from census data or CPS) Calibrate the design weights d i w i so that equations (benchmark constraints) hold: x k (1) i s w i x i = k U Typical procedures: raking, generalized raking (Deville, Sarndal & Sautory 1993)

14 Extensions Instrumental variables (Kott 2006) using information at different levels (Estevao & Säarndal 2006, Isaki, Tsay & Fuller 2004) Restricted weight ranges (Chen, Sitter & Wu 2002, Théberge 2000) Non-response adjustments (targeting bias rather than variance) (Crouse & Kott 2004, Kott 2006)

15 A popular functional form: Linear w i = d i (1 + λ x i ) (2) where λ is the vector of Lagrange multipliers Solution: λ = ( i s d i x i x i ) 1 ( k U x k i s w i x i ) The resulting expansion estimator of the total is identical to GREG estimator (Särndal, Swensson & Wretman 1992) Disadvantage: weights may become negative.

16 Owen (1988) proposed to construct confidence intervals for the population mean using profile empirical likelihood function: { n R(µ) = nw i w i 0, i=1 n w i = 1, i=1 Maximum empirical likelihood estimator: ˆµ = arg max R(µ) Asymptotic distributional results: n i=1 ˆµ is asy. normal; 2 log R(ˆµ) } w i y i = µ, (3) d χ 2 Extensions from i.i.d. to regression models(owen 1991), models based on estimating equations (Qin & Lawless 1994),...

17 Pseudo-EL for survey data Chen & Qin (1993), Chen & Sitter (1999), Wu (2004b), Rao & Wu (2009) discussed complex survey extensions: l(w) = d i ln w i (4) use: l(w) max w Solution: s.t. N i s w i = 0 = i s Nd i j s d j i s w i x i = k U d i j s d j λ (x i X) x k, w i 0, i s w i = 1 x i X 1 + λ (x i X) (5) (6)

18 Pseudo-EL for survey data Convex hull condition: the population value X is in the convex hull of the sample x i Non-negative weights Convex optimization problem, solution always exists, and algorithms find it reliably (Wu 2004b) Optimality properties: Wu (2003) Chen, Sitter & Wu (2002) discuss how to utilize EL methods to construct bounded weights (may have to relax the calibrating constraints) Combining information from multiple surveys (Wu 2004a)

19

20 Wolter (2007), Eltinge (1996) Reporting and analytic purposes: goals a survey analyst needs standard errors to include in the report an applied researcher needs standard errors to construct confidence intervals and test their substantive models Design purposes: a sample designer needs to know population variances to come up with efficient designs, strata allocations, small area estimators

21 Expansion estimator Horvitz-Thompson estimator t[y] = i s d i y i = i s y i /π i (with respect to sampling, i.e., design-variance): V { t[y] } = Estimator of variance: k,l U v { t[y] } = i,j s (π kl π k π l ) y k π k y l π l π ij π i π j π ij y i π i y j π j

22 variance estimator By taking Taylor series expansion near the population value θ, one can obtain linearization estimators for: Function of moments: [ f ] v L [ˆθ] MSE[ˆθ] v t k t k Example: ratio estimator of the total t r = t[y]t [x]/t[x], variance estimator v L [t r ] = T [x]2 t[x] 2 v[e i], e i = y i rx i, T [x] 0 Remind notation: slide 2 k

23 variance estimator By taking Taylor series expansion near the population value θ, one can obtain linearization estimators for: Estimator defined by estimating equations: v L [ˆθ] MSE[ˆθ] ( u) 1 1T v[u(x, ˆθ)]( u) Examples: regression (Fuller 1975) i s w i u(x i, y i, θ) i s Closely related to White (1980) heteroskedasticity-robust estimator w i x i (y i x i θ) = 0 examples: GLM (Binder 1983), pseudo-mle (Skinner 1989) Remind notation: slide 2

24 estimators Särndal, Swensson & Wretman (1992), Deville & Sarndal (1992): variables x Parameter of interest: total of y of the calibrated estimator: V[t cal ] = ij U(π ij π i π j )(d i E i )(d j E j ), where E i = y i x ib are residuals from the census regression (c.f. y i Ȳ for H-T estimator, y i rx i for ratio) estimator: v[t cal ] = ij s π ij π i π j π ij (w i e i )(w j e j ) where e i = y i x i ˆβ are residuals from the sample (weighted) LS regression

25 methods For a given procedure (X 1,..., X n ) ˆθ: 1 To create data for replicate r, reshuffle PSUs, omitting some and/or repeating others, according to a certain replication scheme 2 Using the original procedure and the replicate data, obtain parameter estimate ˆθ (r) 3 Repeat Steps 1 2 for r = 1,..., R 4 Estimate variance/mse as v m [ˆθ] = A R R (ˆθ (r) θ) 2 (7) r=1 where A is a scaling constant, θ = r ˆθ (r) /R for variance and θ = ˆθ for MSE Alternative implementation: use replicate weights w (r) hij instead of actually resampling the data.

26 The jackknife Kish & Frankel (1974), Krewski & Rao (1981) Replicates: omit only one PSU from the entire sample Replicate weights: if unit k from stratum g is omitted, 0, h = g, i = k w (gk) n hij = g n g 1 w hij, h = g, i k w hij, h g Number of replicates: R = n estimators: v J1 = h v J2 = h n h 1 n h n h 1 n h (ˆθ (hi) ˆθ h ) 2 i (ˆθ (hi) ˆθ) 2 i

27 Balanced repeated replication (BRR) Design restriction: n h = 2 PSUs/stratum Replicates (half-samples): omit one of the two PSUs from each stratum Replicate weights: w (r) hij = { 2w hij, PSU hi is retained 0, PSU hi is omitted (2nd order) balance conditions: each PSU is used R/2 times each pair of PSUs is used R/4 times Number of replicates: L R 2 L McCarthy (1969): L R = 4m L + 3 using Hadamard matrices Scaling factor in (7): A = 1

28 Rescaling (RBS) Rao & Wu (1988): for parameter θ = f ( x), 1 Sample with replacement m h out of n h units in stratum h. 2 Compute pseudo-values x (r) h = x h + m 1/2 h (n h 1) 1/2 ( x ( r) h x h ), x (r) = W h x (r) h, θ(r) = f ( x (r) ) (8) h 3 Repeat steps 1 2 for r = 1,..., R. 4 Compute v RBS [ˆθ] using (7) with A = 1.

29 Scaling of weights Rao, Wu & Yue (1992): weights can be scaled instead of values. m ( r) hi = # times the i-th unit from stratum h is used in the r-th replicate Replicate weight is ( w (r) hik [1 = mh ) 1/2 ( mh ) 1/2 nh ] + m ( r) n h 1 n h 1 m hi w hik h (9) Equivalent to RBS for functions of moments Applicable to ˆθ obtained from estimating equations

30 Choice of m h : Bootstrap scheme options m h n h 1 to ensure non-negative replicate weights m h = n h 1: no need for internal scaling m h = n h 3: matching third moments (Rao & Wu 1988) evidence (Kovar, Rao & Wu 1988): for n h = 5, the choice m h = n h 1 leads to more stable estimators with better coverage than m h = n h 3 Choice of R: No theoretical foundations Popular choices: R = 100, 200 or 500 R design degrees of freedom = n L

31 Yung (1997), Yeo, Mantel & Liu (1999) Mean Confidentiality protection: units with a weight of 0 belong to the same PSU, risk of identification. Replace the number of draws m ( r) hi m ( r) hi = 1 K rk k=(r 1)K +1 m ( k) hi Take K large enough so that IPr [ m ( r) hi = 0] is small. Proceed to compute the weights (9). Compute v MBOOT using (7) with scaling factor A = K. Number of resulting weight variables = R/K. Warning: no formal theory have been developed so far. by

32 Pros and cons of resampling estimators + Analysts and data users only need software that does weighted + No need to program specific estimators for each model + No need to release strata and sampling unit identifiers in public data sets (jackknife?) Computationally intensive, especially when n Post-stratification and non-response adjustments need to be performed on every set of weights Bulky data files with many weight variables

33 Goal of this study Give the calibrated estimators to people! for calibrated estimators requires knowledge of what the calibrating variable are, and what their population totals are Such information is unlikely to be available to analysis and public micro data users Resampling estimators must play a role

34

35 Population 1/10 of IPUMS 2000 Census 5% sample (Ruggles, Alexander, Genadek, Goeken, Schroeder & Sobek 2010) Alaska, Hawaii, DC excluded individuals 2052 PUMAs From 298 to 2080 individuals in each PUMA Demographic variables: age, gender, race, ethnicity, education, language spoken Economic variables: income, employment status PUMA = Public Use Microdata Area: 100,000+ residents; whole central city, MSA, PMSA, non-metro places

36 Sample 41 strata state (4 New England states conglomerated; 5 Mountains states conglomerated) PSU = PUMA In population, 11 N h 233 PSU/stratum In sample, 2 n h 6 PSU/stratum (more in larger strata) Resulting sample size: n = 120 PSUs 1st stage of selection: PPS (Rao-Hartley-Cochran method) 2nd stage of selection: persons per PSU (fewer in rural areas, more in urban areas) 2000 Monte Carlo replications Sample size: individuals

37 Statistics collected: English proficiency (6 categories) Marital status (6 categories) Statistics Logistic regression: high school graduate = (age + age 2 ) sex + black + white + hispanic Gini coefficient of income inequality (Lorenz curve) variables (if used): Age groups (0 to 10, 10 to 20,... ) Race (white, black, Asian) Gender Hispanic (0/1) Strata indicators Linear (2), EL (5)

38 methods + Deville & Sarndal (1992) estimator for calibrated data + [ {Jackknife, Bootstrap R = 100, Bootstrap R = 990, Mean R = } {No, of the main weight only, ] of the main and replicate weights}

39 Weights method None EL Linear Mean CV Median of min{w i } Median of max{w i } Median of max{w i} min{w i } Min w i /d i Max w i /d i : increases the spread of weights

40 bias Average relative bias of parameter estimates: method None EL Linear Language 0.03% -0.21% -0.22% Marital status -0.14% -0.16% -0.14% Logistic regression 7.61% 7.0% 7.0% Gini index -0.40% -0.38% -0.37% :??

41 efficiency Average ratio of MSEs of parameter estimates: method None EL Linear Language Marital status Logistic regression Gini index Smaller than 1: calibrated is better. : substantial efficiency gains in simple statistics minute loss of efficiency in complicated statistics

42 Relative bias of variance estimates for the descriptive statistics: None EL Linear main main+rep main main+rep -2.5% 55.4% 55.0% D & S % -10.5% Jackknife -1.8% 56.6% 35.2% 56.1% 35.1% BS (R = 100) -3.7% 53.5% -1.8% 53.2% -2.2% BS (R = 990) -2.7% 55.4% -1.1% 54.7% -1.1% MBS % 55.1% -10.4% 54.7% -10.4% : must be accounted for in variance! Without such account, the design-consistent methods report the design variance of the uncalibrated point estimator.

43 Relative bias of variance estimates None EL Linear main main+rep main main+rep Logistic regression -11.2% -11.8% -11.1% Jackknife 3.9% 3.9% 40.6% 4.6% 41.5% BS (R = 100) 13.0% 14.6% 15.5% 15.2% 16.3% BS (R = 990) 17.0% 16.7% 18.2% 17.4% 80.4% MBS % -11.0% -16.4% -10.3% -15.8% Gini index 42.4% 38.6% 37.8% Jackknife -14.1% -16.4% 14.6% -16.9% 13.3% BS (R = 100) -19.0% -20.8% -20.5% -22.0% -21.6% BS (R = 990) -18.4% -20.5% -19.7% -21.1% -20.6% MBS % -20.2% -25.3% -20.9% -26.1% : nothing works for Gini.

44 CI coverage Tail probabilities of the nominal 90% CI. None EL main main+rep Descriptive statistics Most methods % % Logistic regression 6.2% + 6.4% 6.3% +6.6% Jackknife 5.0% + 5.1% 5.0% +5.2% 2.9%+ 3.0% BS (R = 100) 4.5% + 4.5% 4.6% +4.6% 4.4%+ 4.4% BS (R = 990) 4.3% + 4.2% 4.3% +4.4% 4.3%+ 4.2% Mean BS % + 6.3% 6.2% +6.4% 6.8%+ 7.0% Gini index Most methods 3.5% % Monte Carlo margin of error: 3σ = 1.5% : notable asymmetry for fractions and Gini.

45 Limited simulation evidence suggests: s I improves precision of the descriptive statistics, not so much for the model parameter estimates The linearization and the mean are the most stable methods, followed by the jackknife and by other methods (not shown) using strata indicators does not work well with the jackknife. With on demographics only (not shown), the jackknife demonstrated performance comparable to linearization and the mean

46 s II Performance of the mean was virtually the same as that of linearization and jackknife, where applicable, or Deville & Sarndal (1992) estimator for calibrated weights: it had similar (downward) bias, stability, and coverage End users of complex survey data sets with calibrated main weights will not be able to construct adequate replicate weights There are analysis that neither variance method can handle adequately

47 What I covered was...

48 My Survey statistics Latent variable (structural equation) modeling Econometrics Multilevel modeling Statistical programming and computing

49 I Binder, D. A. (1983), On the variances of asymptotically normal estimators from complex surveys, International Statistical Review 51, Binder, D. A. & Roberts, G. R. (2009), Design- and model-based inference for model parameters, in D. Pfeffermann & C. R. Rao, eds, Sample Surveys: Inference and Analysis, Vol. 29B of Handbook of Statistics, Elsevier, Oxford, UK, chapter 24. Chen, J. & Qin, J. (1993), for finite populations and the effective usage of auxiliary information, Biometrika 80(1), Chen, J. & Sitter, R. R. (1999), A pseudo empirical likelihood approach to the effective use of auxiliary information in complex surveys, Statistica Sinica 9, Chen, J., Sitter, R. R. & Wu, C. (2002), Using empirical likelihood methods to obtain range restricted weights in regression estimators for surveys, Biometrika 89(1),

50 II Cochran, W. G. (1977), Sampling Techniques, 3rd edn, John Wiley and Sons, New York. Crouse, C. & Kott, P. S. (2004), Evaluating alternative schemes for an economic survey with large nonresponse, Survey Research Methodology Section, American Statistical Association, pp Deville, J. C. & Sarndal, C. E. (1992), estimators in survey sampling, Journal of the American Statistical Association 87(418), Deville, J. C., Sarndal, C. E. & Sautory, O. (1993), Generalized raking procedures in survey sampling, Journal of the American Statistical Association 88(423), Eltinge, J. (1996), Discussion of resampling methods in sample surveys by j. shao, Statistics 27, Estevao, V. M. & Säarndal, C.-E. (2006), Survey estimates by on complex auxiliary information, International Statistical Review 74(2),

51 III Fuller, W. A. (1975), Regression analysis for sample survey, Sankhya Series C 37, Horvitz, D. G. & Thompson, D. J. (1952), A generalization of sampling without replacement from a finite universe, Journal of the American Statistical Association 47(260), Isaki, C. T., Tsay, J. H. & Fuller, W. A. (2004), Weighting sample data subject to independent controls, Survey Methodology 30(1), Kish, L. (1965), Survey Sampling, John Wiley and Sons, New York. Kish, L. & Frankel, M. R. (1974), Inference from complex samples, Journal of the Royal Statistical Society, Series B 36, 1 37., S. (2010), Resampling inference with complex survey data, Stata Journal 10, Korn, E. L. & Graubard, B. I. (1999), Analysis of Health Surveys, John Wiley and Sons.

52 IV Kott, P. S. (2006), Using weightingtoadjust for nonresponse andcoverage errors, Survey Methodology 32(2), Kott, P. S. (2009), weighting: Combining probability samples and linear prediction models, in D. Pfeffermann & C. R. Rao, eds, Sample Surveys: Inference and Analysis, Vol. 29B of Handbook of Statistics, Elsevier, Oxford, UK, chapter 25. Kovar, J. G., Rao, J. N. K. & Wu, C. F. J. (1988), Bootstrap and other methods to measure errors in survey estimates, Canadian Journal of Statistics 16, Krewski, D. & Rao, J. N. K. (1981), Inference from stratified samples: Properties of the linearization, jackknife and balanced repeated replication methods, The Annals of Statistics 9(5), McCarthy, P. J. (1969), Pseudo-replication: Half samples, Review of the International Statistical Institute 37(3),

53 V Owen, A. B. (1988), ratio confidence intervals for a single functional, Biometrika 75(2), Owen, A. B. (1991), for linear models, The Annals of Statistics 19(4), Pfeffermann, D. (1993), The role of sampling weights when modeling survey data, International Statistical Review 61, Qin, J. & Lawless, J. (1994), and general estimating equations, The Annals of Statistics 22(1), Rao, J. N. K. & Wu, C. (2009), methods, in Sample Surveys: Theory, Methods and Inference, Vol. 29 of Handbook of Statistics, Elsevier. Rao, J. N. K. & Wu, C. F. J. (1988), Resampling inference with complex survey data, Journal of the American Statistical Association 83(401),

54 VI Rao, J. N. K., Wu, C. F. J. & Yue, K. (1992), Some recent work on resampling methods for complex surveys, Survey Methodology 18(2), Ruggles, S., Alexander, J. T., Genadek, K., Goeken, R., Schroeder, M. B. & Sobek, M. (2010), Integrated Public Use Microdata Series: Version 5.0 [Machine-readable database], University of Minnesota, Minneapolis. Särndal, C.-E. (2007), The approach in survey theory and practice, Survey Methodology 33(2), Särndal, C.-E., Swensson, B. & Wretman, J. (1992), Model Assisted Survey Sampling, Springer, New York. Skinner, C. J. (1989), Domain means, regression and multivariate analysis, in C. J. Skinner, D. Holt & T. M. Smith, eds, Analysis of Complex Surveys, Wiley, New York, chapter 3, pp Théberge, A. (2000), and restricted weights, Survey Methodology 26(1),

55 VII White, H. (1980), A heteroskedasticity-consistent covariance-matrix estimator and a direct test for heteroskedasticity, Econometrica 48(4), Wolter, K. M. (2007), to Estimation, 2nd edn, Springer, New York. Wu, C. (2003), Optimal estimators in survey sampling, Biometrika 90(4), Wu, C. (2004a), Combining information from multiple surveys through the empirical likelihood method, Can J Statistics 32(1), Wu, C. (2004b), Some algorithmic aspects of the empirical likelihood method in survey sampling, Statistica Sinica 14, Yeo, D., Mantel, H. & Liu, T.-P. (1999), Bootstrap variance for the National Population Health Survey, in Proceedings of Survey Research Methods Section, The American Statistical Association, pp

56 VIII Yung, W. (1997), for public use files under confidentiality constraints, in Proceedings of Statistics Canada Symposium, Statistics Canada, pp

Empirical Likelihood Methods for Sample Survey Data: An Overview

Empirical Likelihood Methods for Sample Survey Data: An Overview AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use

More information

Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics

Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Amang S. Sukasih, Mathematica Policy Research, Inc. Donsig Jang, Mathematica Policy Research, Inc. Amang S. Sukasih,

More information

REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES

REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Statistica Sinica 8(1998), 1153-1164 REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Wayne A. Fuller Iowa State University Abstract: The estimation of the variance of the regression estimator for

More information

Calibration estimation using exponential tilting in sample surveys

Calibration estimation using exponential tilting in sample surveys Calibration estimation using exponential tilting in sample surveys Jae Kwang Kim February 23, 2010 Abstract We consider the problem of parameter estimation with auxiliary information, where the auxiliary

More information

Empirical likelihood inference for regression parameters when modelling hierarchical complex survey data

Empirical likelihood inference for regression parameters when modelling hierarchical complex survey data Empirical likelihood inference for regression parameters when modelling hierarchical complex survey data Melike Oguz-Alper Yves G. Berger Abstract The data used in social, behavioural, health or biological

More information

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY J.D. Opsomer, W.A. Fuller and X. Li Iowa State University, Ames, IA 50011, USA 1. Introduction Replication methods are often used in

More information

An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys

An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys Richard Valliant University of Michigan and Joint Program in Survey Methodology University of Maryland 1 Introduction

More information

Generalized Pseudo Empirical Likelihood Inferences for Complex Surveys

Generalized Pseudo Empirical Likelihood Inferences for Complex Surveys The Canadian Journal of Statistics Vol.??, No.?,????, Pages???-??? La revue canadienne de statistique Generalized Pseudo Empirical Likelihood Inferences for Complex Surveys Zhiqiang TAN 1 and Changbao

More information

Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A.

Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A. Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A. Keywords: Survey sampling, finite populations, simple random sampling, systematic

More information

A Unified Theory of Empirical Likelihood Confidence Intervals for Survey Data with Unequal Probabilities and Non Negligible Sampling Fractions

A Unified Theory of Empirical Likelihood Confidence Intervals for Survey Data with Unequal Probabilities and Non Negligible Sampling Fractions A Unified Theory of Empirical Likelihood Confidence Intervals for Survey Data with Unequal Probabilities and Non Negligible Sampling Fractions Y.G. Berger O. De La Riva Torres Abstract We propose a new

More information

Model Assisted Survey Sampling

Model Assisted Survey Sampling Carl-Erik Sarndal Jan Wretman Bengt Swensson Model Assisted Survey Sampling Springer Preface v PARTI Principles of Estimation for Finite Populations and Important Sampling Designs CHAPTER 1 Survey Sampling

More information

Modification and Improvement of Empirical Likelihood for Missing Response Problem

Modification and Improvement of Empirical Likelihood for Missing Response Problem UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu

More information

Calibration estimation in survey sampling

Calibration estimation in survey sampling Calibration estimation in survey sampling Jae Kwang Kim Mingue Park September 8, 2009 Abstract Calibration estimation, where the sampling weights are adjusted to make certain estimators match known population

More information

Combining data from two independent surveys: model-assisted approach

Combining data from two independent surveys: model-assisted approach Combining data from two independent surveys: model-assisted approach Jae Kwang Kim 1 Iowa State University January 20, 2012 1 Joint work with J.N.K. Rao, Carleton University Reference Kim, J.K. and Rao,

More information

Empirical Likelihood Methods

Empirical Likelihood Methods Handbook of Statistics, Volume 29 Sample Surveys: Theory, Methods and Inference Empirical Likelihood Methods J.N.K. Rao and Changbao Wu (February 14, 2008, Final Version) 1 Likelihood-based Approaches

More information

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Statistica Sinica 24 (2014), 1001-1015 doi:http://dx.doi.org/10.5705/ss.2013.038 INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Seunghwan Park and Jae Kwang Kim Seoul National Univeristy

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

The Use of Survey Weights in Regression Modelling

The Use of Survey Weights in Regression Modelling The Use of Survey Weights in Regression Modelling Chris Skinner London School of Economics and Political Science (with Jae-Kwang Kim, Iowa State University) Colorado State University, June 2013 1 Weighting

More information

Successive Difference Replication Variance Estimation in Two-Phase Sampling

Successive Difference Replication Variance Estimation in Two-Phase Sampling Successive Difference Replication Variance Estimation in Two-Phase Sampling Jean D. Opsomer Colorado State University Michael White US Census Bureau F. Jay Breidt Colorado State University Yao Li Colorado

More information

Temi di discussione. Accounting for sampling design in the SHIW. (Working papers) April by Ivan Faiella. Number

Temi di discussione. Accounting for sampling design in the SHIW. (Working papers) April by Ivan Faiella. Number Temi di discussione (Working papers) Accounting for sampling design in the SHIW by Ivan Faiella April 2008 Number 662 The purpose of the Temi di discussione series is to promote the circulation of working

More information

SAS/STAT 14.2 User s Guide. Introduction to Survey Sampling and Analysis Procedures

SAS/STAT 14.2 User s Guide. Introduction to Survey Sampling and Analysis Procedures SAS/STAT 14.2 User s Guide Introduction to Survey Sampling and Analysis Procedures This document is an individual chapter from SAS/STAT 14.2 User s Guide. The correct bibliographic citation for this manual

More information

SAS/STAT 13.1 User s Guide. Introduction to Survey Sampling and Analysis Procedures

SAS/STAT 13.1 User s Guide. Introduction to Survey Sampling and Analysis Procedures SAS/STAT 13.1 User s Guide Introduction to Survey Sampling and Analysis Procedures This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete

More information

BOOTSTRAPPING SAMPLE QUANTILES BASED ON COMPLEX SURVEY DATA UNDER HOT DECK IMPUTATION

BOOTSTRAPPING SAMPLE QUANTILES BASED ON COMPLEX SURVEY DATA UNDER HOT DECK IMPUTATION Statistica Sinica 8(998), 07-085 BOOTSTRAPPING SAMPLE QUANTILES BASED ON COMPLEX SURVEY DATA UNDER HOT DECK IMPUTATION Jun Shao and Yinzhong Chen University of Wisconsin-Madison Abstract: The bootstrap

More information

SAS/STAT 13.2 User s Guide. Introduction to Survey Sampling and Analysis Procedures

SAS/STAT 13.2 User s Guide. Introduction to Survey Sampling and Analysis Procedures SAS/STAT 13.2 User s Guide Introduction to Survey Sampling and Analysis Procedures This document is an individual chapter from SAS/STAT 13.2 User s Guide. The correct bibliographic citation for the complete

More information

A JACKKNIFE VARIANCE ESTIMATOR FOR SELF-WEIGHTED TWO-STAGE SAMPLES

A JACKKNIFE VARIANCE ESTIMATOR FOR SELF-WEIGHTED TWO-STAGE SAMPLES Statistica Sinica 23 (2013), 595-613 doi:http://dx.doi.org/10.5705/ss.2011.263 A JACKKNFE VARANCE ESTMATOR FOR SELF-WEGHTED TWO-STAGE SAMPLES Emilio L. Escobar and Yves G. Berger TAM and University of

More information

Estimation of change in a rotation panel design

Estimation of change in a rotation panel design Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS028) p.4520 Estimation of change in a rotation panel design Andersson, Claes Statistics Sweden S-701 89 Örebro, Sweden

More information

Introduction to Survey Data Analysis

Introduction to Survey Data Analysis Introduction to Survey Data Analysis JULY 2011 Afsaneh Yazdani Preface Learning from Data Four-step process by which we can learn from data: 1. Defining the Problem 2. Collecting the Data 3. Summarizing

More information

VARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA

VARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA Submitted to the Annals of Applied Statistics VARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA By Jae Kwang Kim, Wayne A. Fuller and William R. Bell Iowa State University

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

11. Bootstrap Methods

11. Bootstrap Methods 11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods

More information

Implications of Ignoring the Uncertainty in Control Totals for Generalized Regression Estimators. Calibration Estimators

Implications of Ignoring the Uncertainty in Control Totals for Generalized Regression Estimators. Calibration Estimators Implications of Ignoring the Uncertainty in Control Totals for Generalized Regression Estimators Jill A. Dever, RTI Richard Valliant, JPSM & ISR is a trade name of Research Triangle Institute. www.rti.org

More information

New Developments in Nonresponse Adjustment Methods

New Developments in Nonresponse Adjustment Methods New Developments in Nonresponse Adjustment Methods Fannie Cobben January 23, 2009 1 Introduction In this paper, we describe two relatively new techniques to adjust for (unit) nonresponse bias: The sample

More information

A Design-Sensitive Approach to Fitting Regression Models With Complex Survey Data

A Design-Sensitive Approach to Fitting Regression Models With Complex Survey Data A Design-Sensitive Approach to Fitting Regression Models With Complex Survey Data Phillip S. Kott RI International Rockville, MD Introduction: What Does Fitting a Regression Model with Survey Data Mean?

More information

COMPLEX SAMPLING DESIGNS FOR THE CUSTOMER SATISFACTION INDEX ESTIMATION

COMPLEX SAMPLING DESIGNS FOR THE CUSTOMER SATISFACTION INDEX ESTIMATION STATISTICA, anno LXVII, n. 3, 2007 COMPLEX SAMPLING DESIGNS FOR THE CUSTOMER SATISFACTION INDEX ESTIMATION. INTRODUCTION Customer satisfaction (CS) is a central concept in marketing and is adopted as an

More information

Miscellanea A note on multiple imputation under complex sampling

Miscellanea A note on multiple imputation under complex sampling Biometrika (2017), 104, 1,pp. 221 228 doi: 10.1093/biomet/asw058 Printed in Great Britain Advance Access publication 3 January 2017 Miscellanea A note on multiple imputation under complex sampling BY J.

More information

Regression Diagnostics for Survey Data

Regression Diagnostics for Survey Data Regression Diagnostics for Survey Data Richard Valliant Joint Program in Survey Methodology, University of Maryland and University of Michigan USA Jianzhu Li (Westat), Dan Liao (JPSM) 1 Introduction Topics

More information

Chapter 2. Section Section 2.9. J. Kim (ISU) Chapter 2 1 / 26. Design-optimal estimator under stratified random sampling

Chapter 2. Section Section 2.9. J. Kim (ISU) Chapter 2 1 / 26. Design-optimal estimator under stratified random sampling Chapter 2 Section 2.4 - Section 2.9 J. Kim (ISU) Chapter 2 1 / 26 2.4 Regression and stratification Design-optimal estimator under stratified random sampling where (Ŝxxh, Ŝxyh) ˆβ opt = ( x st, ȳ st )

More information

A comparison of stratified simple random sampling and sampling with probability proportional to size

A comparison of stratified simple random sampling and sampling with probability proportional to size A comparison of stratified simple random sampling and sampling with probability proportional to size Edgar Bueno Dan Hedlin Per Gösta Andersson Department of Statistics Stockholm University Introduction

More information

EFFICIENT REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLING

EFFICIENT REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLING Statistica Sinica 13(2003), 641-653 EFFICIENT REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLING J. K. Kim and R. R. Sitter Hankuk University of Foreign Studies and Simon Fraser University Abstract:

More information

arxiv: v2 [math.st] 20 Jun 2014

arxiv: v2 [math.st] 20 Jun 2014 A solution in small area estimation problems Andrius Čiginas and Tomas Rudys Vilnius University Institute of Mathematics and Informatics, LT-08663 Vilnius, Lithuania arxiv:1306.2814v2 [math.st] 20 Jun

More information

Finite Population Sampling and Inference

Finite Population Sampling and Inference Finite Population Sampling and Inference A Prediction Approach RICHARD VALLIANT ALAN H. DORFMAN RICHARD M. ROYALL A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

A MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR

A MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR Statistica Sinica 8(1998), 1165-1173 A MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR Phillip S. Kott National Agricultural Statistics Service Abstract:

More information

Variance estimation for Generalized Entropy and Atkinson indices: the complex survey data case

Variance estimation for Generalized Entropy and Atkinson indices: the complex survey data case Variance estimation for Generalized Entropy and Atkinson indices: the complex survey data case Martin Biewen University of Frankfurt/Main and Stephen P. Jenkins Institute for Social & Economic Research

More information

Empirical and Constrained Empirical Bayes Variance Estimation Under A One Unit Per Stratum Sample Design

Empirical and Constrained Empirical Bayes Variance Estimation Under A One Unit Per Stratum Sample Design Empirical and Constrained Empirical Bayes Variance Estimation Under A One Unit Per Stratum Sample Design Sepideh Mosaferi Abstract A single primary sampling unit (PSU) per stratum design is a popular design

More information

A measurement error model approach to small area estimation

A measurement error model approach to small area estimation A measurement error model approach to small area estimation Jae-kwang Kim 1 Spring, 2015 1 Joint work with Seunghwan Park and Seoyoung Kim Ouline Introduction Basic Theory Application to Korean LFS Discussion

More information

Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities

Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities Peter M. Aronow and Cyrus Samii Forthcoming at Survey Methodology Abstract We consider conservative variance

More information

Simple design-efficient calibration estimators for rejective and high-entropy sampling

Simple design-efficient calibration estimators for rejective and high-entropy sampling Biometrika (202), 99,, pp. 6 C 202 Biometrika Trust Printed in Great Britain Advance Access publication on 3 July 202 Simple design-efficient calibration estimators for rejective and high-entropy sampling

More information

Two-phase sampling approach to fractional hot deck imputation

Two-phase sampling approach to fractional hot deck imputation Two-phase sampling approach to fractional hot deck imputation Jongho Im 1, Jae-Kwang Kim 1 and Wayne A. Fuller 1 Abstract Hot deck imputation is popular for handling item nonresponse in survey sampling.

More information

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census

More information

Non-parametric bootstrap and small area estimation to mitigate bias in crowdsourced data Simulation study and application to perceived safety

Non-parametric bootstrap and small area estimation to mitigate bias in crowdsourced data Simulation study and application to perceived safety Non-parametric bootstrap and small area estimation to mitigate bias in crowdsourced data Simulation study and application to perceived safety David Buil-Gil, Reka Solymosi Centre for Criminology and Criminal

More information

Generalized pseudo empirical likelihood inferences for complex surveys

Generalized pseudo empirical likelihood inferences for complex surveys The Canadian Journal of Statistics Vol. 43, No. 1, 2015, Pages 1 17 La revue canadienne de statistique 1 Generalized pseudo empirical likelihood inferences for complex surveys Zhiqiang TAN 1 * and Changbao

More information

Fractional Imputation in Survey Sampling: A Comparative Review

Fractional Imputation in Survey Sampling: A Comparative Review Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical

More information

GROWING APART: THE CHANGING FIRM-SIZE WAGE PREMIUM AND ITS INEQUALITY CONSEQUENCES ONLINE APPENDIX

GROWING APART: THE CHANGING FIRM-SIZE WAGE PREMIUM AND ITS INEQUALITY CONSEQUENCES ONLINE APPENDIX GROWING APART: THE CHANGING FIRM-SIZE WAGE PREMIUM AND ITS INEQUALITY CONSEQUENCES ONLINE APPENDIX The following document is the online appendix for the paper, Growing Apart: The Changing Firm-Size Wage

More information

F. Jay Breidt Colorado State University

F. Jay Breidt Colorado State University Model-assisted survey regression estimation with the lasso 1 F. Jay Breidt Colorado State University Opening Workshop on Computational Methods in Social Sciences SAMSI August 2013 This research was supported

More information

A BOOTSTRAP VARIANCE ESTIMATOR FOR SYSTEMATIC PPS SAMPLING

A BOOTSTRAP VARIANCE ESTIMATOR FOR SYSTEMATIC PPS SAMPLING A BOOTSTRAP VARIANCE ESTIMATOR FOR SYSTEMATIC PPS SAMPLING Steven Kaufman, National Center for Education Statistics Room 422d, 555 New Jersey Ave. N.W., Washington, D.C. 20208 Key Words: Simulation, Half-Sample

More information

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Statistica Sinica 19 (2009), 71-81 SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Song Xi Chen 1,2 and Chiu Min Wong 3 1 Iowa State University, 2 Peking University and

More information

Strati cation in Multivariate Modeling

Strati cation in Multivariate Modeling Strati cation in Multivariate Modeling Tihomir Asparouhov Muthen & Muthen Mplus Web Notes: No. 9 Version 2, December 16, 2004 1 The author is thankful to Bengt Muthen for his guidance, to Linda Muthen

More information

In Praise of the Listwise-Deletion Method (Perhaps with Reweighting)

In Praise of the Listwise-Deletion Method (Perhaps with Reweighting) In Praise of the Listwise-Deletion Method (Perhaps with Reweighting) Phillip S. Kott RTI International NISS Worshop on the Analysis of Complex Survey Data With Missing Item Values October 17, 2014 1 RTI

More information

Variance estimation on SILC based indicators

Variance estimation on SILC based indicators Variance estimation on SILC based indicators Emilio Di Meglio Eurostat emilio.di-meglio@ec.europa.eu Guillaume Osier STATEC guillaume.osier@statec.etat.lu 3rd EU-LFS/EU-SILC European User Conference 1

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

Estimation of Parameters and Variance

Estimation of Parameters and Variance Estimation of Parameters and Variance Dr. A.C. Kulshreshtha U.N. Statistical Institute for Asia and the Pacific (SIAP) Second RAP Regional Workshop on Building Training Resources for Improving Agricultural

More information

What is Survey Weighting? Chris Skinner University of Southampton

What is Survey Weighting? Chris Skinner University of Southampton What is Survey Weighting? Chris Skinner University of Southampton 1 Outline 1. Introduction 2. (Unresolved) Issues 3. Further reading etc. 2 Sampling 3 Representation 4 out of 8 1 out of 10 4 Weights 8/4

More information

The Effect of Multiple Weighting Steps on Variance Estimation

The Effect of Multiple Weighting Steps on Variance Estimation Journal of Official Statistics, Vol. 20, No. 1, 2004, pp. 1 18 The Effect of Multiple Weighting Steps on Variance Estimation Richard Valliant 1 Multiple weight adjustments are common in surveys to account

More information

Comments on Design-Based Prediction Using Auxilliary Information under Random Permutation Models (by Wenjun Li (5/21/03) Ed Stanek

Comments on Design-Based Prediction Using Auxilliary Information under Random Permutation Models (by Wenjun Li (5/21/03) Ed Stanek Comments on Design-Based Prediction Using Auxilliary Information under Random Permutation Models (by Wenjun Li (5/2/03) Ed Stanek Here are comments on the Draft Manuscript. They are all suggestions that

More information

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design 1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary

More information

Development of methodology for the estimate of variance of annual net changes for LFS-based indicators

Development of methodology for the estimate of variance of annual net changes for LFS-based indicators Development of methodology for the estimate of variance of annual net changes for LFS-based indicators Deliverable 1 - Short document with derivation of the methodology (FINAL) Contract number: Subject:

More information

Workpackage 5 Resampling Methods for Variance Estimation. Deliverable 5.1

Workpackage 5 Resampling Methods for Variance Estimation. Deliverable 5.1 Workpackage 5 Resampling Methods for Variance Estimation Deliverable 5.1 2004 II List of contributors: Anthony C. Davison and Sylvain Sardy, EPFL. Main responsibility: Sylvain Sardy, EPFL. IST 2000 26057

More information

Task. Variance Estimation in EU-SILC Survey. Gini coefficient. Estimates of Gini Index

Task. Variance Estimation in EU-SILC Survey. Gini coefficient. Estimates of Gini Index Variance Estimation in EU-SILC Survey MārtiĦš Liberts Central Statistical Bureau of Latvia Task To estimate sampling error for Gini coefficient estimated from social sample surveys (EU-SILC) Estimation

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Heteroskedasticity-Robust Inference in Finite Samples

Heteroskedasticity-Robust Inference in Finite Samples Heteroskedasticity-Robust Inference in Finite Samples Jerry Hausman and Christopher Palmer Massachusetts Institute of Technology December 011 Abstract Since the advent of heteroskedasticity-robust standard

More information

IEOR165 Discussion Week 5

IEOR165 Discussion Week 5 IEOR165 Discussion Week 5 Sheng Liu University of California, Berkeley Feb 19, 2016 Outline 1 1st Homework 2 Revisit Maximum A Posterior 3 Regularization IEOR165 Discussion Sheng Liu 2 About 1st Homework

More information

Data Integration for Big Data Analysis for finite population inference

Data Integration for Big Data Analysis for finite population inference for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation

More information

LOGISTICS REGRESSION FOR SAMPLE SURVEYS

LOGISTICS REGRESSION FOR SAMPLE SURVEYS 4 LOGISTICS REGRESSION FOR SAMPLE SURVEYS Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-002 4. INTRODUCTION Researchers use sample survey methodology to obtain information

More information

Complex sample design effects and inference for mental health survey data

Complex sample design effects and inference for mental health survey data International Journal of Methods in Psychiatric Research, Volume 7, Number 1 Complex sample design effects and inference for mental health survey data STEVEN G. HEERINGA, Division of Surveys and Technologies,

More information

Combining Non-probability and Probability Survey Samples Through Mass Imputation

Combining Non-probability and Probability Survey Samples Through Mass Imputation Combining Non-probability and Probability Survey Samples Through Mass Imputation Jae-Kwang Kim 1 Iowa State University & KAIST October 27, 2018 1 Joint work with Seho Park, Yilin Chen, and Changbao Wu

More information

The Effective Use of Complete Auxiliary Information From Survey Data

The Effective Use of Complete Auxiliary Information From Survey Data The Effective Use of Complete Auxiliary Information From Survey Data by Changbao Wu B.S., Anhui Laodong University, China, 1982 M.S. Diploma, East China Normal University, 1986 a thesis submitted in partial

More information

A comparison of weighted estimators for the population mean. Ye Yang Weighting in surveys group

A comparison of weighted estimators for the population mean. Ye Yang Weighting in surveys group A comparison of weighted estimators for the population mean Ye Yang Weighting in surveys group Motivation Survey sample in which auxiliary variables are known for the population and an outcome variable

More information

Design and Estimation for Split Questionnaire Surveys

Design and Estimation for Split Questionnaire Surveys University of Wollongong Research Online Centre for Statistical & Survey Methodology Working Paper Series Faculty of Engineering and Information Sciences 2008 Design and Estimation for Split Questionnaire

More information

Inference in VARs with Conditional Heteroskedasticity of Unknown Form

Inference in VARs with Conditional Heteroskedasticity of Unknown Form Inference in VARs with Conditional Heteroskedasticity of Unknown Form Ralf Brüggemann a Carsten Jentsch b Carsten Trenkler c University of Konstanz University of Mannheim University of Mannheim IAB Nuremberg

More information

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error

More information

Applied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid

Applied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid Applied Economics Regression with a Binary Dependent Variable Department of Economics Universidad Carlos III de Madrid See Stock and Watson (chapter 11) 1 / 28 Binary Dependent Variables: What is Different?

More information

NONLINEAR CALIBRATION. 1 Introduction. 2 Calibrated estimator of total. Abstract

NONLINEAR CALIBRATION. 1 Introduction. 2 Calibrated estimator of total.   Abstract NONLINEAR CALIBRATION 1 Alesandras Pliusas 1 Statistics Lithuania, Institute of Mathematics and Informatics, Lithuania e-mail: Pliusas@tl.mii.lt Abstract The definition of a calibrated estimator of the

More information

From the help desk: It s all about the sampling

From the help desk: It s all about the sampling The Stata Journal (2002) 2, Number 2, pp. 90 20 From the help desk: It s all about the sampling Allen McDowell Stata Corporation amcdowell@stata.com Jeff Pitblado Stata Corporation jsp@stata.com Abstract.

More information

A better way to bootstrap pairs

A better way to bootstrap pairs A better way to bootstrap pairs Emmanuel Flachaire GREQAM - Université de la Méditerranée CORE - Université Catholique de Louvain April 999 Abstract In this paper we are interested in heteroskedastic regression

More information

BOOK REVIEW Sampling: Design and Analysis. Sharon L. Lohr. 2nd Edition, International Publication,

BOOK REVIEW Sampling: Design and Analysis. Sharon L. Lohr. 2nd Edition, International Publication, STATISTICS IN TRANSITION-new series, August 2011 223 STATISTICS IN TRANSITION-new series, August 2011 Vol. 12, No. 1, pp. 223 230 BOOK REVIEW Sampling: Design and Analysis. Sharon L. Lohr. 2nd Edition,

More information

Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference

Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference 1 / 171 Bootstrap inference Francisco Cribari-Neto Departamento de Estatística Universidade Federal de Pernambuco Recife / PE, Brazil email: cribari@gmail.com October 2013 2 / 171 Unpaid advertisement

More information

QUASI-SCORE TESTS WITH SURVEY DATA

QUASI-SCORE TESTS WITH SURVEY DATA Statistica Sinica 8(1998), 1059-1070 QUASI-SCORE TESTS WITH SURVEY DATA J. N. K. Rao, A. J. Scott and C. J. Skinner Carleton University, University of Auckland and University of Southampton Abstract: Although

More information

Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors

Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors Michael Pfaffermayr August 23, 2018 Abstract In gravity models with exporter and importer dummies the robust standard errors of

More information

Calibration Estimation for Semiparametric Copula Models under Missing Data

Calibration Estimation for Semiparametric Copula Models under Missing Data Calibration Estimation for Semiparametric Copula Models under Missing Data Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Economics and Economic Growth Centre

More information

ICSA Applied Statistics Symposium 1. Balanced adjusted empirical likelihood

ICSA Applied Statistics Symposium 1. Balanced adjusted empirical likelihood ICSA Applied Statistics Symposium 1 Balanced adjusted empirical likelihood Art B. Owen Stanford University Sarah Emerson Oregon State University ICSA Applied Statistics Symposium 2 Empirical likelihood

More information

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43 Panel Data March 2, 212 () Applied Economoetrics: Topic March 2, 212 1 / 43 Overview Many economic applications involve panel data. Panel data has both cross-sectional and time series aspects. Regression

More information

Non-parametric bootstrap mean squared error estimation for M-quantile estimates of small area means, quantiles and poverty indicators

Non-parametric bootstrap mean squared error estimation for M-quantile estimates of small area means, quantiles and poverty indicators Non-parametric bootstrap mean squared error estimation for M-quantile estimates of small area means, quantiles and poverty indicators Stefano Marchetti 1 Nikos Tzavidis 2 Monica Pratesi 3 1,3 Department

More information

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap inference for the finite population total under complex sampling designs Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.

More information

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Statistica Sinica 24 2014, 395-414 doi:ttp://dx.doi.org/10.5705/ss.2012.064 EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Jun Sao 1,2 and Seng Wang 3 1 East Cina Normal University,

More information

The comparative studies on reliability for Rayleigh models

The comparative studies on reliability for Rayleigh models Journal of the Korean Data & Information Science Society 018, 9, 533 545 http://dx.doi.org/10.7465/jkdi.018.9..533 한국데이터정보과학회지 The comparative studies on reliability for Rayleigh models Ji Eun Oh 1 Joong

More information

Taking into account sampling design in DAD. Population SAMPLING DESIGN AND DAD

Taking into account sampling design in DAD. Population SAMPLING DESIGN AND DAD Taking into account sampling design in DAD SAMPLING DESIGN AND DAD With version 4.2 and higher of DAD, the Sampling Design (SD) of the database can be specified in order to calculate the correct asymptotic

More information

High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data

High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data Song Xi CHEN Guanghua School of Management and Center for Statistical Science, Peking University Department

More information

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects A Simple, Graphical Procedure for Comparing Multiple Treatment Effects Brennan S. Thompson and Matthew D. Webb May 15, 2015 > Abstract In this paper, we utilize a new graphical

More information

The Nonparametric Bootstrap

The Nonparametric Bootstrap The Nonparametric Bootstrap The nonparametric bootstrap may involve inferences about a parameter, but we use a nonparametric procedure in approximating the parametric distribution using the ECDF. We use

More information