Combining data from two independent surveys: model-assisted approach
Jae Kwang Kim, Iowa State University
January 20,
Joint work with J.N.K. Rao, Carleton University
Reference
Kim, J.K. and Rao, J.N.K. (2012). Combining data from two independent surveys: a model-assisted approach. Biometrika, in press. (Available online via Advance Access /biomet/asr063.)
Outline
1. Introduction
2. Projection estimation
3. Replication variance estimation
4. Efficient estimation: Full information
5. Simulation study
6. Concluding remarks & Discussion
1. Introduction: Two-phase sampling
(Classical) two-phase sampling:
$A_1$: first-phase sample of size $n_1$.
$A_2$: second-phase sample of size $n_2$ ($A_2 \subset A_1$).
$x$ is observed in phase 1; both $y$ and $x$ are observed in phase 2.
Assume that 1 is an element of $x_i$.
References: Neyman (1934), Hansen & Hurwitz (1946), Rao (1973), Kott & Stukel (1997), Binder et al. (2000), Kim et al. (2006), Hidiroglou et al. (2009).
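1. Introduction: Two-phase sampling
GREG estimator of $Y = \sum_{i=1}^{N} y_i$:
$$\hat{Y}_G = \hat{X}_1' \hat{\beta}_2, \qquad \hat{X}_1 = \sum_{i \in A_1} w_{1i} x_i, \qquad \hat{\beta}_2 = \Big( \sum_{i \in A_2} w_{2i} x_i x_i' \Big)^{-1} \sum_{i \in A_2} w_{2i} x_i y_i .$$
Two ways of implementing the GREG estimator:
Calibration: create a data file for $A_2$.
$$\hat{Y}_G = \sum_{i \in A_2} w_{2G,i}\, y_i, \qquad w_{2G,i} = \hat{X}_1' \Big( \sum_{i \in A_2} w_{2i} x_i x_i' \Big)^{-1} w_{2i} x_i .$$
Projection estimation: create a data file for $A_1$.
$$\hat{Y}_G = \sum_{i \in A_1} w_{1i}\, \tilde{y}_i, \qquad \tilde{y}_i = x_i' \hat{\beta}_2 .$$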
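As an illustration of why the calibration and projection forms give the same GREG estimate, here is a minimal Python sketch. The function name `greg_two_ways`, the sample sizes, and the data-generating model are assumptions made for the example; they do not come from the slides.

```python
import numpy as np

def greg_two_ways(x1, w1, x2, y2, w2):
    """Return the GREG estimate computed by calibration and by projection."""
    X1_hat = w1 @ x1                          # X_hat_1 = sum_{A1} w_1i x_i
    WX2 = x2 * w2[:, None]
    M = WX2.T @ x2                            # sum_{A2} w_2i x_i x_i'
    beta2 = np.linalg.solve(M, WX2.T @ y2)    # beta_hat_2
    # Calibration form: adjusted weights attached to the A2 file.
    w2G = w2 * (x2 @ np.linalg.solve(M, X1_hat))
    y_cal = w2G @ y2
    # Projection form: synthetic values attached to the A1 file.
    y_proj = w1 @ (x1 @ beta2)
    return y_cal, y_proj

# Hypothetical nested two-phase data (all numbers are illustrative).
rng = np.random.default_rng(0)
N, n1, n2 = 10_000, 500, 100
x1 = np.column_stack([np.ones(n1), rng.chisquare(2, n1)])
w1 = np.full(n1, N / n1)
x2 = x1[:n2]                                  # A2 is a subsample of A1
w2 = np.full(n2, N / n2)
y2 = 1.0 + 0.7 * x2[:, 1] + rng.normal(0.0, 1.0, n2)

y_cal, y_proj = greg_two_ways(x1, w1, x2, y2, w2)
print(y_cal, y_proj)
```

The two numbers agree up to floating-point error; the forms differ only in which data file ($A_2$ or $A_1$) carries the information.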
1. Introduction: Domain projection estimators
Calibration estimator of the domain total $Y_d = \sum_{i=1}^{N} \delta_i(d) y_i$:
$$\hat{Y}_{Cal,d} = \sum_{i \in A_2} w_{2G,i}\, \delta_i(d)\, y_i,$$
where $\delta_i(d) = 1$ if unit $i$ belongs to domain $d$ and $\delta_i(d) = 0$ otherwise.
Note: $\hat{Y}_{Cal,d}$ is based only on the domain sample belonging to $A_2$, so it can have a large variance if the domain sample in $A_2$ is very small.
1. Introduction: Domain projection estimators
Domain projection estimator (Fuller, 2003):
$$\hat{Y}_{p,d} = \sum_{i \in A_1} w_{i1}\, \delta_i(d)\, \tilde{y}_i .$$
Note: $\hat{Y}_{p,d}$ is based on the domain sample belonging to $A_1$, which is much larger than the domain sample in $A_2$ used by $\hat{Y}_{Cal,d}$. Hence $\hat{Y}_{p,d}$ could be significantly more efficient if its relative bias is small.
Under the model $y_i = x_i' \beta + e_i$ with $E(e_i) = 0$, $\hat{Y}_{p,d}$ is model unbiased for $Y_d$. However, it is possible to construct populations for which $\hat{Y}_{p,d}$ is very design biased (Fuller, 2003).
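The two domain estimators can be sketched side by side in Python. This is only an illustration: the helper `domain_estimates`, the domain rule `z < 0.3`, and the generated data are assumptions, and the sketch computes point estimates only, without comparing variances.

```python
import numpy as np

def domain_estimates(x1, w1, d1, x2, y2, w2, d2):
    """Return (calibration, projection) estimates of a domain total.

    d1 and d2 are 0/1 domain indicators on A1 and A2.
    """
    WX2 = x2 * w2[:, None]
    M = WX2.T @ x2
    beta2 = np.linalg.solve(M, WX2.T @ y2)
    w2G = w2 * (x2 @ np.linalg.solve(M, w1 @ x1))   # GREG weights on A2
    y_cal_d = w2G @ (d2 * y2)                       # uses A2 domain units only
    y_proj_d = w1 @ (d1 * (x1 @ beta2))             # uses A1 domain units
    return y_cal_d, y_proj_d

# Hypothetical nested two-phase data with a domain covering about 30% of units.
rng = np.random.default_rng(1)
N, n1, n2 = 10_000, 500, 100
x1 = np.column_stack([np.ones(n1), rng.chisquare(2, n1)])
w1 = np.full(n1, N / n1)
d1 = (rng.uniform(size=n1) < 0.3).astype(float)
x2, d2 = x1[:n2], d1[:n2]
w2 = np.full(n2, N / n2)
y2 = 1.0 + 0.7 * x2[:, 1] + rng.normal(0.0, 1.0, n2)

cal_in, proj_in = domain_estimates(x1, w1, d1, x2, y2, w2, d2)
cal_out, proj_out = domain_estimates(x1, w1, 1 - d1, x2, y2, w2, 1 - d2)
print(cal_in, proj_in)
```

Summing each estimator over the domain and its complement recovers the same overall GREG total, even though the individual domain estimates differ.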
1. Introduction: Combining two independent surveys
A large sample $A_1$ collects only $x$, with weights $\{w_{i1}, i \in A_1\}$.
A much smaller sample $A_2$, drawn independently, collects $x$ and $y$, with weights $\{w_{i2}, i \in A_2\}$.
Example 1 (Hidiroglou, 2001): Canadian Survey of Employment, Payrolls and Hours.
$A_1$: a large sample drawn from a Canadian Customs and Revenue Agency administrative data file, with auxiliary variables $x$ observed.
$A_2$: a small sample from the Statistics Canada Business Register, with study variables $y$ (number of hours worked by employees and summarized earnings) observed.
1. Introduction: Combining two independent surveys
Example 2 (Reiter, 2008):
$A_2$: both self-reported health measurements, $x$, and clinical measurements from physical examinations, $y$, are observed.
$A_1$: only $x$ is reported.
Synthetic values $\tilde{y}_i$, $i \in A_1$, are created by first fitting a working model $E(y) = m(x, \beta)$ relating $y$ to $x$ to the data $\{(y_i, x_i), i \in A_2\}$ and then predicting the $y_i$ associated with $x_i$, $i \in A_1$.
Only the synthetic values $\tilde{y}_i = m(x_i, \hat{\beta})$, $i \in A_1$, and the associated weights $w_{i1}$, $i \in A_1$, are released to the public.
Our focus is on producing estimators of totals and domain totals from the synthetic data file $\{(\tilde{y}_i, w_{i1}), i \in A_1\}$.
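2. Projection estimation: Estimation of Y
Projection estimator of $Y$:
$$\hat{Y}_p = \sum_{i \in A_1} w_{i1}\, \tilde{y}_i .$$
$\hat{Y}_p$ is asymptotically design-unbiased if $\hat{\beta}$ satisfies
$$\sum_{i \in A_2} w_{i2} \{ y_i - m(x_i, \hat{\beta}) \} = 0. \qquad (*)$$
Note: under condition (*),
$$\hat{Y}_p = \sum_{i \in A_1} w_{i1}\, \tilde{y}_i + \sum_{i \in A_2} w_{i2} \{ y_i - \tilde{y}_i \} = \text{prediction} + \text{bias correction}.$$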
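A short numerical check of condition (*) may help. The sketch below assumes two independent equal-weight samples and a linear working model with an intercept; the data-generating coefficients are illustrative, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n1, n2 = 10_000, 500, 100
w1 = np.full(n1, N / n1)
w2 = np.full(n2, N / n2)
# Independent samples: x is drawn separately for A1 and A2.
x1 = np.column_stack([np.ones(n1), rng.chisquare(2, n1)])
x2 = np.column_stack([np.ones(n2), rng.chisquare(2, n2)])
y2 = 1.0 + 0.7 * x2[:, 1] + rng.normal(0.0, 1.0, n2)

# Solve the weighted estimating equation sum_{A2} w_i2 x_i (y_i - x_i' beta) = 0.
WX2 = x2 * w2[:, None]
beta_hat = np.linalg.solve(WX2.T @ x2, WX2.T @ y2)

# Because 1 is an element of x_i, the weighted residuals sum to zero: condition (*).
bias_correction = w2 @ (y2 - x2 @ beta_hat)
y_proj = w1 @ (x1 @ beta_hat)                 # projection estimator from the A1 file
print(bias_correction, y_proj)
```

Since the bias-correction term is numerically zero, the released file for $A_1$ alone reproduces "prediction + bias correction".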
2. Projection estimation: Estimation of Y
Theorem 1: Under some regularity conditions, if $\hat{\beta}$ satisfies condition (*), we can write
$$\hat{Y}_p = \sum_{i \in A_1} w_{i1}\, m_0(x_i) + \sum_{i \in A_2} w_{i2} \{ y_i - m_0(x_i) \} = \hat{P}_1 + \hat{Q}_2,$$
where $m_0(x_i) = m(x_i, \beta_0)$ and $\beta_0 = \operatorname{plim} \hat{\beta}$ with respect to survey 2. Thus,
$$E(\hat{Y}_p) = \sum_{i=1}^{N} m_0(x_i) + \sum_{i=1}^{N} \{ y_i - m_0(x_i) \} = \sum_{i=1}^{N} y_i$$
and
$$V(\hat{Y}_p) = V(\hat{P}_1) + V(\hat{Q}_2).$$
2. Projection estimation
Model-assisted approach: asymptotic unbiasedness of $\hat{Y}_p$ does not depend on the validity of the working model, but efficiency is affected.
Note: in the variance decomposition
$$V(\hat{Y}_p) = V(\hat{P}_1) + V(\hat{Q}_2) = V_1 + V_2,$$
$V_1$ is based on $n_1$ sample elements and $V_2$ is based on $n_2$ sample elements.
If $n_2 \ll n_1$, then $V_1 \ll V_2$.
If the working model is good, then the squared error terms $e_i^2 = \{ y_i - m_0(x_i) \}^2$ are small and $V_2$ will also be small.
2. Projection estimation: When is condition (*) satisfied?
If 1 is an element of $x_i$, condition (*) is satisfied for the linear regression model $m(x_i, \beta) = x_i' \beta$ and for the logistic regression model $\operatorname{logit}\{ m(x_i, \beta) \} = x_i' \beta$ when $\hat{\beta}$ is obtained from the estimating equation
$$\sum_{i \in A_2} w_{i2}\, x_i (y_i - m_i) = 0, \qquad m_i = m(x_i, \beta).$$
For the ratio model, $\hat{\beta}$ is the solution of
$$\sum_{i \in A_2} w_{i2} (y_i - m_i) = 0.$$
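For the ratio working model $m(x_i, \beta) = x_i \beta$ with scalar $x_i$, the estimating equation has a closed-form solution. The following minimal sketch assumes equal weights and illustrative data; the coefficient 0.7 and the variance structure are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n2 = 10_000, 100
w2 = np.full(n2, N / n2)
x2 = rng.chisquare(2, n2)
y2 = 0.7 * x2 + rng.normal(0.0, np.sqrt(x2))   # illustrative ratio-type data

# Solving sum_{A2} w_i2 (y_i - x_i beta) = 0 gives the weighted ratio estimator.
beta_ratio = (w2 @ y2) / (w2 @ x2)

# Condition (*) then holds by construction, without an intercept in the model.
residual_total = w2 @ (y2 - beta_ratio * x2)
print(beta_ratio, residual_total)
```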
2. Projection estimation: Linearization variance estimation
Let $\hat{e}_i = y_i - \tilde{y}_i$. Then the variance estimator of $\hat{Y}_p$ is
$$v_L(\hat{Y}_p) = v_1(\tilde{y}_i) + v_2(\hat{e}_i),$$
where
$v_1(z_i) = v(\hat{Z}_1)$ is the variance estimator for survey 1,
$v_2(z_i) = v(\hat{Z}_2)$ is the variance estimator for survey 2,
and $\hat{Z}_1 = \sum_{i \in A_1} w_{i1} z_i$, $\hat{Z}_2 = \sum_{i \in A_2} w_{i2} z_i$.
Note: $v_L(\hat{Y}_p)$ requires access to data from both surveys.
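To make the two-term variance estimator concrete, here is a sketch that assumes both surveys are simple random samples without replacement, so that $v_1$ and $v_2$ take the usual SRS form. The helper names `v_srs` and `linearized_variance` and the generated data are hypothetical.

```python
import numpy as np

def v_srs(z, N):
    """Estimated variance of N * mean(z) under SRS without replacement."""
    n = len(z)
    return N**2 * (1 - n / N) * np.var(z, ddof=1) / n

def linearized_variance(x1, x2, y2, N):
    """v_L = v_1(y_tilde) + v_2(e_hat) for a linear working model."""
    beta_hat = np.linalg.lstsq(x2, y2, rcond=None)[0]   # equal weights under SRS
    y_tilde = x1 @ beta_hat          # synthetic values on A1
    e_hat = y2 - x2 @ beta_hat       # residuals on A2
    return v_srs(y_tilde, N) + v_srs(e_hat, N)

rng = np.random.default_rng(3)
N, n1, n2 = 10_000, 500, 100
x1 = np.column_stack([np.ones(n1), rng.chisquare(2, n1)])
x2 = np.column_stack([np.ones(n2), rng.chisquare(2, n2)])
y2 = 1.0 + 0.7 * x2[:, 1] + rng.normal(0.0, 1.0, n2)

v_L = linearized_variance(x1, x2, y2, N)
print(v_L, np.sqrt(v_L))
```

The second term needs the residuals from $A_2$, which is why this estimator cannot be computed from the released $A_1$ file alone.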
2. Projection estimation: Estimation of domain total $Y_d$
Projection domain estimator:
$$\hat{Y}_{d,p} = \sum_{i \in A_1} w_{i1}\, \delta_i(d)\, \tilde{y}_i .$$
$\hat{Y}_{d,p}$ is asymptotically unbiased if
Case (i): $\sum_{i \in A_2} w_{i2}\, \delta_i(d) (y_i - \tilde{y}_i) = 0$,
or
Case (ii): $\operatorname{Cov}\{ \delta_i(d),\, y_i - m(x_i, \beta_0) \} = 0$.
2. Projection estimation: Estimation of domain total $Y_d$
Case (i): For linear or logistic regression models, (i) is satisfied if $\delta_i(d)$ is an element of $x_i$. For planned domains specified in advance, augmented working models can be used. The survey 1 data file should provide the planned domain indicators.
Case (ii): If the working model is good, then the relative bias of $\hat{Y}_{d,p}$ will be small. $\hat{Y}_{d,p}$ is asymptotically model unbiased if the model is correct, but it can be significantly design biased for some populations.
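3. Replication variance estimation: Replication variance estimation for $\hat{Y}_p$
Replication variance estimator for survey 1:
$$v_{1,rep}(\hat{Z}_1) = \sum_{k=1}^{L_1} c_k \big( \hat{Z}_1^{(k)} - \hat{Z}_1 \big)^2,$$
where $\hat{Z}_1^{(k)} = \sum_{i \in A_1} w_{i1}^{(k)} z_i$ and $\{ w_{i1}^{(k)}, i \in A_1 \}$, $k = 1, \ldots, L_1$, are the replication weights for survey 1.
Replication variance estimator for $\hat{Y}_p$:
$$v_{1,rep}(\hat{Y}_p) = \sum_{k=1}^{L_1} c_k \big( \hat{Y}_p^{(k)} - \hat{Y}_p \big)^2,$$
where $\hat{Y}_p^{(k)} = \sum_{i \in A_1} w_{i1}^{(k)} \tilde{y}_i^{(k)}$ and $\{ \tilde{y}_i^{(k)}, i \in A_1 \}$ are the synthetic values for replicate $k$.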
3. Replication variance estimation: Replication variance estimation for $\hat{Y}_p$
How to create the replicated synthetic data $\{ \tilde{y}_i^{(k)}, i \in A_1 \}$?
1. Create $\{ w_{i2}^{(k)},\ k = 1, \ldots, L_1;\ i \in A_2 \}$ such that
$$\sum_{k=1}^{L_1} c_k \big( \hat{Y}_2^{(k)} - \hat{Y}_2 \big)^2 = v_2(\hat{Y}_2).$$
2. Compute $\hat{\beta}^{(k)}$ and $\tilde{y}_i^{(k)} = m(x_i, \hat{\beta}^{(k)})$ by solving
$$\sum_{i \in A_2} w_{i2}^{(k)} \{ y_i - m(x_i, \beta) \} x_i = 0$$
for $\hat{\beta}^{(k)}$ (linear or logistic regression).
$v_{1,rep}(\hat{Y}_p)$ is asymptotically unbiased.
The data file for sample $A_1$ should contain additional columns of $\{ \tilde{y}_i^{(k)}, i \in A_1 \}$ and the associated $\{ w_{i1}^{(k)}, i \in A_1 \}$, $k = 1, 2, \ldots, L_1$.
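The two steps above can be sketched as follows. This is one possible construction under stated assumptions, not necessarily the authors' implementation: both samples are equal-weight simple random samples, survey 2 uses a delete-one jackknife, survey 1 uses a random-group jackknife with the same number of replicates $L_1 = n_2$, the factor is $c_k = (L_1 - 1)/L_1$, and finite population corrections are ignored.

```python
import numpy as np

def wls(x, y, w):
    """Solve sum_i w_i x_i (y_i - x_i' beta) = 0 for beta."""
    WX = x * w[:, None]
    return np.linalg.solve(WX.T @ x, WX.T @ y)

def jackknife_projection_variance(x1, w1, x2, y2, w2, rng):
    n1, n2 = len(w1), len(w2)
    L = n2                                     # number of replicates L_1
    groups = rng.permutation(n1) % L           # random groups for survey 1
    y_p = w1 @ (x1 @ wls(x2, y2, w2))          # full-sample projection estimate
    reps = np.empty(L)
    for k in range(L):
        # Survey 1 replicate weights: drop group k, rescale the rest.
        w1k = np.where(groups == k, 0.0, w1 * L / (L - 1))
        # Survey 2 replicate weights: drop unit k, rescale the rest.
        w2k = w2 * n2 / (n2 - 1)
        w2k[k] = 0.0
        beta_k = wls(x2, y2, w2k)              # replicate beta_hat^(k)
        reps[k] = w1k @ (x1 @ beta_k)          # uses replicate synthetic values
    c = (L - 1) / L
    return y_p, c * np.sum((reps - y_p) ** 2)

rng = np.random.default_rng(4)
N, n1, n2 = 10_000, 500, 100
x1 = np.column_stack([np.ones(n1), rng.chisquare(2, n1)])
x2 = np.column_stack([np.ones(n2), rng.chisquare(2, n2)])
y2 = 1.0 + 0.7 * x2[:, 1] + rng.normal(0.0, 1.0, n2)
w1, w2 = np.full(n1, N / n1), np.full(n2, N / n2)

y_p, v_rep = jackknife_projection_variance(x1, w1, x2, y2, w2, rng)
print(y_p, v_rep)
```

In a released file, each pass through the loop corresponds to one extra pair of columns on the $A_1$ records: the replicate weights and the replicate synthetic values.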
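3. Replication variance estimation: Replication variance estimation for $\hat{Y}_{d,p}$
Let $\hat{Y}_{d,p}^{(k)} = \sum_{i \in A_1} w_{i1}^{(k)}\, \delta_i(d)\, \tilde{y}_i^{(k)}$. Then
$$v_{1,rep}(\hat{Y}_{d,p}) = \sum_{k=1}^{L_1} c_k \big( \hat{Y}_{d,p}^{(k)} - \hat{Y}_{d,p} \big)^2 .$$
This estimator is asymptotically unbiased under either case (i) or case (ii).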
4. Optimal estimator: Full information (estimation of total Y)
Three estimators for two parameters:
Survey 1: $\hat{X}_1$ for $X$.
Survey 2: $(\hat{X}_2, \hat{Y}_2)$ for $(X, Y)$.
Combine the information using generalized least squares: minimize
$$Q(X, Y) = \begin{pmatrix} \hat{X}_1 - X \\ \hat{X}_2 - X \\ \hat{Y}_2 - Y \end{pmatrix}' V^{-1} \begin{pmatrix} \hat{X}_1 - X \\ \hat{X}_2 - X \\ \hat{Y}_2 - Y \end{pmatrix}$$
with respect to $(X, Y)$, where $V$ is the variance-covariance matrix of $(\hat{X}_1, \hat{X}_2, \hat{Y}_2)'$.
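4. Optimal estimator: Full information (estimation of total Y)
Best linear unbiased estimator based on $\hat{X}_2$, $\hat{Y}_2$ and $\hat{X}_1$:
$$\tilde{Y}_{opt} = \hat{Y}_2 + B_{y \cdot x2} \big( \tilde{X}_{opt} - \hat{X}_2 \big), \qquad \tilde{X}_{opt} = \frac{ V_{xx2} \hat{X}_1 + V_{xx1} \hat{X}_2 }{ V_{xx1} + V_{xx2} },$$
where $B_{y \cdot x2} = V_{yx2} / V_{xx2}$, $V_{xx1} = V(\hat{X}_1)$, $V_{xx2} = V(\hat{X}_2)$ and $V_{yx2} = \operatorname{Cov}(\hat{Y}_2, \hat{X}_2)$.
Replace the variances in $\tilde{Y}_{opt}$ by estimated variances to get $\hat{Y}_{opt}$ and $\hat{X}_{opt}$.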
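A small numerical check can confirm that the closed form agrees with the generalized least squares solution for scalar $x$. The input values below are made up for illustration, and the sketch uses the fact that the two surveys are independent, so $\hat{X}_1$ is uncorrelated with $(\hat{X}_2, \hat{Y}_2)$.

```python
import numpy as np

def optimal_closed_form(X1, X2, Y2, Vxx1, Vxx2, Vyx2):
    """Closed-form BLUE of (X, Y) for scalar x."""
    X_opt = (Vxx2 * X1 + Vxx1 * X2) / (Vxx1 + Vxx2)
    Y_opt = Y2 + (Vyx2 / Vxx2) * (X_opt - X2)
    return X_opt, Y_opt

def optimal_gls(X1, X2, Y2, Vxx1, Vxx2, Vyx2, Vyy2):
    """Minimize Q(X, Y) directly by generalized least squares."""
    t = np.array([X1, X2, Y2])
    Z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # E(t) = Z (X, Y)'
    V = np.array([[Vxx1, 0.0, 0.0],                      # independent surveys
                  [0.0, Vxx2, Vyx2],
                  [0.0, Vyx2, Vyy2]])
    Vinv_Z = np.linalg.solve(V, Z)
    return np.linalg.solve(Z.T @ Vinv_Z, Vinv_Z.T @ t)

# Made-up inputs for illustration.
closed = optimal_closed_form(100.0, 110.0, 50.0, 1.0, 4.0, 2.0)
gls = optimal_gls(100.0, 110.0, 50.0, 1.0, 4.0, 2.0, 3.0)
print(closed, gls)
```

With these inputs the more precise survey 1 estimate dominates the combined estimate of $X$, and $\hat{Y}_2$ is adjusted downward through the regression coefficient.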
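4. Optimal estimator: Full information (estimation of total Y)
$\hat{Y}_{opt}$ can be expressed as
$$\hat{Y}_{opt} = \sum_{i \in A_2} w_{i2}^{*}\, y_i,$$
where $\{ w_{i2}^{*}, i \in A_2 \}$ are calibration weights satisfying $\sum_{i \in A_2} w_{i2}^{*} x_i = \hat{X}_{opt}$.
$\hat{Y}_{opt}$ can be computed from the data file for $A_2$, provided the file carries the weights $\{ w_{i2}^{*}, i \in A_2 \}$.
Example: simple random samples $A_1$ and $A_2$,
$$w_{i2}^{*} = \frac{N}{n_2} + \big( \hat{X}_{opt} - \hat{X}_2 \big) \frac{ x_i - \bar{x}_2 }{ \sum_{i \in A_2} (x_i - \bar{x}_2)^2 },$$
where $\bar{x}_2$ is the mean of $x$ for $A_2$.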
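The simple random sampling example can be checked numerically. In this sketch the variances in $\hat{X}_{opt}$ are replaced by the usual SRS estimates; the data-generating model is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
N, n1, n2 = 10_000, 500, 100
x1 = rng.chisquare(2, n1)                     # A1: x only
x2 = rng.chisquare(2, n2)                     # A2: x and y, drawn independently
y2 = 1.0 + 0.7 * x2 + rng.normal(0.0, 1.0, n2)

X1_hat, X2_hat = N * x1.mean(), N * x2.mean()
Vxx1 = N**2 * (1 - n1 / N) * x1.var(ddof=1) / n1
Vxx2 = N**2 * (1 - n2 / N) * x2.var(ddof=1) / n2
X_opt = (Vxx2 * X1_hat + Vxx1 * X2_hat) / (Vxx1 + Vxx2)

# Calibration weights for A2 under simple random sampling.
dx = x2 - x2.mean()
w_star = N / n2 + (X_opt - X2_hat) * dx / np.sum(dx**2)
Y_opt = w_star @ y2

# Regression form of the same estimator, for comparison.
slope = (dx @ y2) / np.sum(dx**2)
Y_opt_reg = N * y2.mean() + slope * (X_opt - X2_hat)
print(w_star @ x2, X_opt, Y_opt, Y_opt_reg)
```

The weights sum to $N$, reproduce $\hat{X}_{opt}$ when applied to $x$, and give the regression form of $\hat{Y}_{opt}$ when applied to $y$.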
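4. Optimal estimator: Full information (domain estimation)
Calibration estimator: $\hat{Y}_d^{*} = \sum_{i \in A_2} w_{i2}^{*}\, \delta_i(d)\, y_i$, computed from the data file for $A_2$ only.
Projection estimator: $\hat{Y}_{p,d} = \sum_{i \in A_1} w_{i1}\, \delta_i(d)\, \tilde{y}_i$, computed from the data file for $A_1$.
Both $\hat{Y}_d^{*}$ and $\hat{Y}_{d,p}$ satisfy the internal consistency property:
$$\sum_d \hat{Y}_d^{*} = \hat{Y}_{opt}, \qquad \sum_d \hat{Y}_{d,p} = \hat{Y}_p .$$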
4. Optimal estimator: Full information (domain estimation)
$\hat{Y}_d^{*}$ is asymptotically design unbiased but can have a large variance if the domain contains few sample $A_2$ units.
The optimal estimator $\hat{Y}_{opt,d}$ based on domain-specific variances does not satisfy internal consistency, may not be stable for small domain sample sizes, and cannot be implemented from the $A_2$ data file.
5. Simulation study: Simulation setup
Two artificial populations A and B of size $N = 10{,}000$: $\{ (y_i, x_i, z_i);\ i = 1, \ldots, N \}$.
Population A (regression model): $x_i \sim \chi^2(2)$, $y_i = x_i + e_i$, $e_i \sim N(0, 2)$, $z_i \sim \text{Unif}(0, 1)$, with $z_i$ independent of $(x_i, y_i)$.
Population B (ratio model): same $(x_i, z_i)$ but $y_i = 0.7 x_i + u_i$, $u_i \sim N(0, x_i)$.
cov(y, x) = 0.71 for both populations.
Domain $d$: $\delta_i(d) = 1$ if $z_i < 0.3$; $\delta_i(d) = 0$ otherwise.
5. Simulation study: Simulation setup
Two independent simple random samples: $n_1 = 500$, $n_2 = 100$.
Working models: linear regression, ratio, augmented linear regression, augmented ratio.
Relative bias: $RB(\hat{Y}) = \{ E(\hat{Y}) - Y \} / Y$.
Relative efficiency: $RE(\hat{Y}) = \text{mse}(\hat{Y}_{opt}) / \text{mse}(\hat{Y})$.
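A scaled-down Monte Carlo run along these lines might look like the sketch below. It is not the authors' simulation: the regression coefficients (intercept 1, slope 0.7), the error variance, the number of replications, and the function name `simulate` are assumptions for illustration, and the mean squared error is compared with the direct survey 2 estimator $\hat{Y}_2$ rather than with $\hat{Y}_{opt}$.

```python
import numpy as np

def simulate(n_rep=200, N=10_000, n1=500, n2=100, seed=5):
    """Monte Carlo relative bias and MSE of the regression projection estimator."""
    rng = np.random.default_rng(seed)
    x = rng.chisquare(2, N)
    # Assumed population model; the coefficients are illustrative only.
    y = 1.0 + 0.7 * x + rng.normal(0.0, np.sqrt(2.0), N)
    Y = y.sum()
    X_pop = np.column_stack([np.ones(N), x])
    est_proj = np.empty(n_rep)
    est_direct = np.empty(n_rep)
    for r in range(n_rep):
        a1 = rng.choice(N, n1, replace=False)   # survey 1: x only
        a2 = rng.choice(N, n2, replace=False)   # survey 2: x and y
        beta = np.linalg.lstsq(X_pop[a2], y[a2], rcond=None)[0]
        est_proj[r] = (N / n1) * np.sum(X_pop[a1] @ beta)
        est_direct[r] = (N / n2) * y[a2].sum()
    rb = (est_proj.mean() - Y) / Y
    mse_proj = np.mean((est_proj - Y) ** 2)
    mse_direct = np.mean((est_direct - Y) ** 2)
    return rb, mse_proj, mse_direct

rb, mse_proj, mse_direct = simulate()
print(rb, mse_direct / mse_proj)
```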
5. Simulation study: Simulation results
Table 1: Simulation results (point estimation). Columns: RB and RE for Population A, RB and RE for Population B. The numerical entries are not preserved in this transcription.
Parameter: Total. Estimators: Regression projection; Ratio projection; Aug. Reg. projection; Aug. Rat. projection; Optimal.
Parameter: Domain. Estimators: Regression projection; Ratio projection; Aug. Reg. projection; Aug. Rat. projection; Optimal; Calibration.
5. Simulation study: Conclusions from Table 1 (estimation of total Y)
1. The RB of all estimators is negligible: less than 2%.
2. The regression projection estimator is almost as efficient as $\hat{Y}_{opt}$ even when the true model is the ratio model. The ratio projection estimator is considerably less efficient if the true model has a substantial intercept term, so model diagnostics should be used to identify a good working model.
3. The augmented projection estimators are similar to the corresponding projection estimators in terms of RB and RE.
5. Simulation study: Conclusions from Table 1 (domain estimation)
1. The RB of all estimators is less than 5%: the simulation setup ensures that $\delta_i(d)$ is unrelated to $r_i = y_i - m(x_i; \beta)$.
2. The regression projection estimator is considerably more efficient than the calibration estimator or the optimal estimator, because the projection estimator is based on a larger sample size.
3. The ratio projection estimator is considerably less efficient if the model has a substantial intercept term.
5. Simulation study: Jackknife variance estimation
$L_1 = n_2 = 100$ pseudo replicates by random group jackknife.
Table 2: Simulation results (relative biases of variance estimators). Columns: Pop. A, Pop. B. The numerical entries are not preserved in this transcription.
Rows (each for Total and Domain): Regression projection; Ratio projection; Aug. Reg. projection; Aug. Rat. projection.
The RB of the jackknife variance estimators is small: less than 5%.
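6. Discussion: Some alternative approaches
The proposed method does not lead to the optimal estimator:
$$\hat{Y}_{opt} = \hat{Y}_2 + \hat{B}_{y \cdot x2} \big( \hat{X}_{opt} - \hat{X}_2 \big), \qquad \hat{X}_{opt} = \frac{ V_{xx2} \hat{X}_1 + V_{xx1} \hat{X}_2 }{ V_{xx1} + V_{xx2} } .$$
To implement the optimal estimator using synthetic data, we may express
$$\hat{Y}_{opt} = \sum_{i \in A_1^{*}} w_{i3}\, \tilde{y}_i + \sum_{i \in A_2} w_{i2} (y_i - \tilde{y}_i),$$
where $\tilde{y}_i = x_i' \hat{B}_{y \cdot x2}$, $A_1^{*} = A_1 \cup A_2$, and $w_{i3}$ is the sampling weight for $A_1^{*}$ satisfying $\sum_{i \in A_1^{*}} w_{i3} x_i = \hat{X}_{opt}$.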
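6. Discussion: Some alternative approaches
If $\sum_{i \in A_2} w_{i2} = \sum_{i \in A_1^{*}} w_{i3}$, then we can further express
$$\hat{Y}_{opt} = \sum_{i \in A_1^{*}} \sum_{j \in A_2} w_{i3}\, w_{ij} (\hat{y}_i + \hat{e}_j),$$
where $w_{ij} = w_{j2} / ( \sum_{i \in A_2} w_{i2} )$ and $\hat{e}_j = y_j - \hat{y}_j$.
It now takes the form of fractional imputation considered in Fuller & Kim (2005).
To reduce the size of the data set, we may consider random selection of $M$ residuals to get $\hat{e}_j^{*}$ and
$$\hat{Y}_{FI} = \sum_{i \in A_1^{*}} \sum_{j=1}^{M} w_{i3}\, w_{ij}^{*} (\hat{y}_i + \hat{e}_j^{*}),$$
where $w_{ij}^{*}$ satisfies
$$\sum_{j=1}^{M} w_{ij}^{*} \big( 1, \hat{e}_j^{*} \big) = \sum_{j \in A_2} w_{ij} (1, \hat{e}_j).$$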
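The fractional-imputation form of $\hat{Y}_{opt}$ can be verified numerically. In this sketch the predicted values and residuals are hypothetical random numbers, and both sets of weights are chosen to sum to $N$ so that the stated condition on the weight totals holds.

```python
import numpy as np

rng = np.random.default_rng(6)
N, n1s, n2 = 10_000, 600, 100
w3 = np.full(n1s, N / n1s)          # weights on A1* (sum to N)
w2 = np.full(n2, N / n2)            # weights on A2 (sum to N)
y_hat = rng.normal(2.4, 1.0, n1s)   # hypothetical predicted values on A1*
e_hat = rng.normal(0.0, 1.0, n2)    # hypothetical residuals from A2

# Fractional weights w_ij = w_j2 / sum_{A2} w_i2, the same for every i.
w_frac = w2 / w2.sum()

# Double sum over i in A1* and j in A2 of w_i3 * w_ij * (y_hat_i + e_hat_j).
double_sum = np.sum(
    w3[:, None] * w_frac[None, :] * (y_hat[:, None] + e_hat[None, :])
)

# Two-term form: prediction total plus weighted residual total.
two_term = w3 @ y_hat + w2 @ e_hat
print(double_sum, two_term)
```

The full double sum attaches all $n_2$ residuals to every record in $A_1^{*}$, which motivates selecting only $M$ residuals per record.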
6. Discussion: Some alternative approaches
Nested two-phase sampling: $A_2 \subset A_1$.
Non-nested two-phase sampling: $A_1$, $A_2$ independent.
We can convert non-nested two-phase sampling into nested two-phase sampling $A_2 \subset A_1^{*}$, where $A_1^{*} = A_1 \cup A_2$.
Synthetic data can be released for $A_1^{*}$.
6. Discussion: Parametric multiple imputation
Assume that $f(y_i \mid x_i, \theta)$ is known for fixed $\theta$ and that $A_1$ and $A_2$ are simple random samples.
Obtain the posterior distribution of $\theta$, $p(\theta \mid y_2, x_2)$, assuming a diffuse prior on $\theta$, where $(y_2, x_2)$ denotes the data from $A_2$.
Draw $M$ values $\theta^{(1)}, \ldots, \theta^{(M)}$ from the posterior distribution.
Draw $y_i^{(l)}$ from $f(y_i \mid x_i, \theta^{(l)})$ for $i \in A_1$ and $l = 1, \ldots, M$.
Synthetic data sets: $\{ y_i^{(l)}, i \in A_1 \}$, $l = 1, \ldots, M$.
Standard multiple imputation variance estimators do not work here.
Reiter (2008) proposed a two-stage imputation procedure requiring $T$ synthetic data sets $\{ y_{it}^{(l)} : i \in A_1,\ t = 1, \ldots, T \}$ to be generated for each $\theta^{(l)}$. In all, $TM$ synthetic data sets are generated.
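The single-stage drawing steps can be sketched for one specific choice of $f$. The sketch assumes a normal linear regression model with the standard noninformative prior proportional to $1/\sigma^2$; this model choice, the function name `synthetic_draws`, and the data are assumptions, and Reiter's two-stage procedure is not implemented here.

```python
import numpy as np

def synthetic_draws(x1, x2, y2, M, rng):
    """Draw M synthetic data sets for A1 under a normal linear model."""
    n2, p = x2.shape
    XtX_inv = np.linalg.inv(x2.T @ x2)
    beta_hat = XtX_inv @ x2.T @ y2
    s2 = np.sum((y2 - x2 @ beta_hat) ** 2) / (n2 - p)
    out = np.empty((M, x1.shape[0]))
    for l in range(M):
        # Posterior draw: sigma^2 | data is scaled inverse chi-square.
        sigma2 = (n2 - p) * s2 / rng.chisquare(n2 - p)
        # Posterior draw: beta | sigma^2, data is multivariate normal.
        beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
        # Synthetic y for each unit in A1 from f(y | x, theta^(l)).
        out[l] = x1 @ beta + rng.normal(0.0, np.sqrt(sigma2), x1.shape[0])
    return out

rng = np.random.default_rng(7)
n1, n2, M = 500, 100, 5
x1 = np.column_stack([np.ones(n1), rng.chisquare(2, n1)])
x2 = np.column_stack([np.ones(n2), rng.chisquare(2, n2)])
y2 = 1.0 + 0.7 * x2[:, 1] + rng.normal(0.0, 1.0, n2)

synthetic = synthetic_draws(x1, x2, y2, M, rng)
print(synthetic.shape)
```

Unlike the deterministic synthetic values $\tilde{y}_i = m(x_i, \hat{\beta})$, each row here adds both parameter uncertainty and residual noise.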
6. Discussion: Conclusion
The proposed method is based on deterministic imputation to generate synthetic values.
Synthetic data along with the replicates are created for survey 1, and only survey 1 data are released.
Significant efficiency gain is achieved for domain estimation.
A stochastic imputation approach is under study.
REFERENCES
Binder, D. A., Babyak, C., Brodeur, M., Hidiroglou, M. & Jocelyn, W. (2000). Variance estimation for two-phase stratified sampling. Can. J. Statist. 28.
Fuller, W. A. (2003). Estimation for multiple phase samples. In Analysis of Survey Data, R. L. Chambers & C. J. Skinner, eds. Wiley: Chichester, England.
Fuller, W. A. & Kim, J.-K. (2005). Hot deck imputation for the response model. Survey Methodology 31.
Hansen, M. & Hurwitz, W. (1946). The problem of non-response in sample surveys. J. Am. Statist. Assoc. 41.
Hidiroglou, M. (2001). Double sampling. Survey Methodol. 27.
Hidiroglou, M. A., Rao, J. N. K. & Haziza, D. (2009). Variance estimation in two-phase sampling. Australian and New Zealand Journal of Statistics 51.
Kim, J. K., Navarro, A. & Fuller, W. A. (2006). Replicate variance estimation after multi-phase stratified sampling. J. Am. Statist. Assoc. 101.
Kott, P. & Stukel, D. (1997). Can the jackknife be used with a two-phase sample? Survey Methodology 23.
Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society 97.
Rao, J. N. K. (1973). On double sampling for stratification and analytical surveys. Biometrika 60.
Reiter, J. (2008). Multiple imputation when records used for imputation are not used or disseminated for analysis. Biometrika 95.
J. Stat. Appl. Pro. Lett. 2, No. 2, 115-121 (2015) 115 Journal of Statistics Applications & Probability Letters An International Journal http://dx.doi.org/10.12785/jsapl/020203 Improvement in Estimating
More informationEcon 2120: Section 2
Econ 2120: Section 2 Part I - Linear Predictor Loose Ends Ashesh Rambachan Fall 2018 Outline Big Picture Matrix Version of the Linear Predictor and Least Squares Fit Linear Predictor Least Squares Omitted
More informationSimple Linear Regression Analysis
LINEAR REGRESSION ANALYSIS MODULE II Lecture - 6 Simple Linear Regression Analysis Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Prediction of values of study
More informationNew estimation methodology for the Norwegian Labour Force Survey
Notater Documents 2018/16 Melike Oguz-Alper New estimation methodology for the Norwegian Labour Force Survey Documents 2018/16 Melike Oguz Alper New estimation methodology for the Norwegian Labour Force
More informationLECTURE 2 LINEAR REGRESSION MODEL AND OLS
SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another
More informationAdmissible Estimation of a Finite Population Total under PPS Sampling
Research Journal of Mathematical and Statistical Sciences E-ISSN 2320-6047 Admissible Estimation of a Finite Population Total under PPS Sampling Abstract P.A. Patel 1* and Shradha Bhatt 2 1 Department
More informationNonstationary Panels
Nonstationary Panels Based on chapters 12.4, 12.5, and 12.6 of Baltagi, B. (2005): Econometric Analysis of Panel Data, 3rd edition. Chichester, John Wiley & Sons. June 3, 2009 Agenda 1 Spurious Regressions
More informationPlausible Values for Latent Variables Using Mplus
Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can
More informationPropensity score adjusted method for missing data
Graduate Theses and Dissertations Graduate College 2013 Propensity score adjusted method for missing data Minsun Kim Riddles Iowa State University Follow this and additional works at: http://lib.dr.iastate.edu/etd
More informationIn Praise of the Listwise-Deletion Method (Perhaps with Reweighting)
In Praise of the Listwise-Deletion Method (Perhaps with Reweighting) Phillip S. Kott RTI International NISS Worshop on the Analysis of Complex Survey Data With Missing Item Values October 17, 2014 1 RTI
More informationWeighted Least Squares
Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w
More informationof being selected and varying such probability across strata under optimal allocation leads to increased accuracy.
5 Sampling with Unequal Probabilities Simple random sampling and systematic sampling are schemes where every unit in the population has the same chance of being selected We will now consider unequal probability
More informationSmall Domain Estimation for a Brazilian Service Sector Survey
Proceedings 59th ISI World Statistics Congress, 5-30 August 013, Hong Kong (Session CPS003) p.334 Small Domain Estimation for a Brazilian Service Sector Survey André Neves 1, Denise Silva and Solange Correa
More informationHomoskedasticity. Var (u X) = σ 2. (23)
Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This
More informationLikelihood-based inference with missing data under missing-at-random
Likelihood-based inference with missing data under missing-at-random Jae-kwang Kim Joint work with Shu Yang Department of Statistics, Iowa State University May 4, 014 Outline 1. Introduction. Parametric
More informationA comparison of stratified simple random sampling and sampling with probability proportional to size
A comparison of stratified simple random sampling and sampling with probability proportional to size Edgar Bueno Dan Hedlin Per Gösta Andersson Department of Statistics Stockholm University Introduction
More informationStatistics 910, #5 1. Regression Methods
Statistics 910, #5 1 Overview Regression Methods 1. Idea: effects of dependence 2. Examples of estimation (in R) 3. Review of regression 4. Comparisons and relative efficiencies Idea Decomposition Well-known
More informationSampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A.
Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A. Keywords: Survey sampling, finite populations, simple random sampling, systematic
More informationSlides 12: Output Analysis for a Single Model
Slides 12: Output Analysis for a Single Model Objective: Estimate system performance via simulation. If θ is the system performance, the precision of the estimator ˆθ can be measured by: The standard error
More informationTWO-WAY CONTINGENCY TABLES UNDER CONDITIONAL HOT DECK IMPUTATION
Statistica Sinica 13(2003), 613-623 TWO-WAY CONTINGENCY TABLES UNDER CONDITIONAL HOT DECK IMPUTATION Hansheng Wang and Jun Shao Peking University and University of Wisconsin Abstract: We consider the estimation
More informationCombining multiple observational data sources to estimate causal eects
Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,
More informationOn the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models
On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Institute of Statistics and Econometrics Georg-August-University Göttingen Department of Statistics
More informationBayesian Linear Regression
Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective
More informationSTATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS
STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS David A Binder and Georgia R Roberts Methodology Branch, Statistics Canada, Ottawa, ON, Canada K1A 0T6 KEY WORDS: Design-based properties, Informative sampling,
More informationSimple Linear Regression
Simple Linear Regression Christopher Ting Christopher Ting : christophert@smu.edu.sg : 688 0364 : LKCSB 5036 January 7, 017 Web Site: http://www.mysmu.edu/faculty/christophert/ Christopher Ting QF 30 Week
More informationarxiv: v1 [stat.me] 3 Nov 2015
The Unbiasedness Approach to Linear Regression Models arxiv:5.0096v [stat.me] 3 Nov 205 P. Vellaisamy Department of Mathematics, Indian Institute of Technology Bombay, Powai, Mumbai-400076, India. Abstract
More informationSuccessive Difference Replication Variance Estimation in Two-Phase Sampling
Successive Difference Replication Variance Estimation in Two-Phase Sampling Jean D. Opsomer Colorado State University Michael White US Census Bureau F. Jay Breidt Colorado State University Yao Li Colorado
More informationA Short Course in Basic Statistics
A Short Course in Basic Statistics Ian Schindler November 5, 2017 Creative commons license share and share alike BY: C 1 Descriptive Statistics 1.1 Presenting statistical data Definition 1 A statistical
More informationREGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University
REGRESSION ITH SPATIALL MISALIGNED DATA Lisa Madsen Oregon State University David Ruppert Cornell University SPATIALL MISALIGNED DATA 10 X X X X X X X X 5 X X X X X 0 X 0 5 10 OUTLINE 1. Introduction 2.
More informationECON The Simple Regression Model
ECON 351 - The Simple Regression Model Maggie Jones 1 / 41 The Simple Regression Model Our starting point will be the simple regression model where we look at the relationship between two variables In
More informationSimple design-efficient calibration estimators for rejective and high-entropy sampling
Biometrika (202), 99,, pp. 6 C 202 Biometrika Trust Printed in Great Britain Advance Access publication on 3 July 202 Simple design-efficient calibration estimators for rejective and high-entropy sampling
More informationLinear regression with nested errors using probability-linked data
University of Wollongong Research Online Faculty of Engineering and Information Sciences - Papers: Part A Faculty of Engineering and Information Sciences 2014 Linear regression with nested errors using
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1
MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical
More informationNonparametric Small Area Estimation via M-quantile Regression using Penalized Splines
Nonparametric Small Estimation via M-quantile Regression using Penalized Splines Monica Pratesi 10 August 2008 Abstract The demand of reliable statistics for small areas, when only reduced sizes of the
More informationMaking sense of Econometrics: Basics
Making sense of Econometrics: Basics Lecture 2: Simple Regression Egypt Scholars Economic Society Happy Eid Eid present! enter classroom at http://b.socrative.com/login/student/ room name c28efb78 Outline
More information