Combining data from two independent surveys: model-assisted approach


Jae Kwang Kim, Iowa State University
January 20,
Joint work with J.N.K. Rao, Carleton University

Reference
Kim, J.K. and Rao, J.N.K. (2012). Combining data from two independent surveys: a model-assisted approach. Biometrika, in press. (Available online via Advance Access /biomet/asr063.)

Outline
1. Introduction
2. Projection estimation
3. Replication variance estimation
4. Efficient estimation: Full information
5. Simulation study
6. Concluding remarks & Discussion

1. Introduction: Two-phase sampling
(Classical) two-phase sampling
- $A_1$: first-phase sample of size $n_1$
- $A_2$: second-phase sample of size $n_2$ ($A_2 \subset A_1$)
- $x$ is observed in phase 1; both $y$ and $x$ are observed in phase 2.
- Assume that 1 is an element of $x_i$.
References: Neyman (1934), Hansen & Hurwitz (1946), Rao (1973), Kott & Stukel (1997), Binder et al. (2000), Kim et al. (2006), Hidiroglou et al. (2009).

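1. Introduction: Two-phase sampling
GREG estimator of $Y = \sum_{i=1}^{N} y_i$:
\[ \hat{Y}_G = \hat{X}_1' \hat{\beta}_2, \qquad \hat{X}_1 = \sum_{i \in A_1} w_{1i} x_i, \qquad \hat{\beta}_2 = \Big( \sum_{i \in A_2} w_{2i} x_i x_i' \Big)^{-1} \sum_{i \in A_2} w_{2i} x_i y_i . \]
Two ways of implementing the GREG estimator:
- Calibration: create a data file for $A_2$.
\[ \hat{Y}_G = \sum_{i \in A_2} w_{2G,i}\, y_i, \qquad w_{2G,i} = \hat{X}_1' \Big( \sum_{i \in A_2} w_{2i} x_i x_i' \Big)^{-1} w_{2i} x_i . \]
- Projection estimation: create a data file for $A_1$.
\[ \hat{Y}_G = \sum_{i \in A_1} w_{1i}\, \tilde{y}_i, \qquad \tilde{y}_i = x_i' \hat{\beta}_2 . \]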
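The equivalence of the calibration and projection forms of the GREG estimator can be checked numerically. The sketch below is only an illustration under stated assumptions: the population, the sample sizes, the simple random sampling at both phases, and the choice $x_i = (1, x_i)'$ are all made up for the example and do not come from the slides.

```python
import random

random.seed(1)

# Hypothetical finite population (all values are illustrative assumptions).
N = 2000
x_pop = [random.expovariate(0.5) for _ in range(N)]
y_pop = [1.0 + 0.7 * x + random.gauss(0.0, 1.0) for x in x_pop]

# Nested two-phase design: A1 is an SRS from the population, A2 an SRS from A1.
n1, n2 = 400, 80
A1 = random.sample(range(N), n1)
A2 = random.sample(A1, n2)
w1 = N / n1  # w_{1i}
w2 = N / n2  # w_{2i}

x1 = [x_pop[i] for i in A1]
x2 = [x_pop[i] for i in A2]
y2 = [y_pop[i] for i in A2]

# Weighted moments over A2 for the regressor (1, x):
# M = sum w2 x x' = [[a, b], [b, c]] and t = sum w2 x y = (t0, t1).
a = w2 * n2
b = w2 * sum(x2)
c = w2 * sum(x * x for x in x2)
t0 = w2 * sum(y2)
t1 = w2 * sum(x * y for x, y in zip(x2, y2))
det = a * c - b * b

# beta_hat_2 = M^{-1} t, using the closed-form 2x2 inverse.
beta0 = (c * t0 - b * t1) / det
beta1 = (a * t1 - b * t0) / det

# Phase-1 estimated totals X_hat_1 = sum w1 (1, x).
X1_0 = w1 * n1
X1_1 = w1 * sum(x1)

# Projection form: synthetic values on the A1 file.
y_proj = sum(w1 * (beta0 + beta1 * x) for x in x1)

# Calibration form: w_{2G,i} = X_hat_1' M^{-1} w2 (1, x_i)' on the A2 file.
g0 = (X1_0 * c - X1_1 * b) / det
g1 = (X1_1 * a - X1_0 * b) / det
w2G = [w2 * (g0 + g1 * x) for x in x2]
y_cal = sum(w * y for w, y in zip(w2G, y2))

print(y_proj, y_cal)
```

The two printed totals agree up to floating-point error, and the calibration weights reproduce the phase-1 totals of $(1, x)$.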

1. Introduction: Domain projection estimators
Calibration estimator of the domain total $Y_d = \sum_{i=1}^{N} \delta_i(d) y_i$:
\[ \hat{Y}_{Cal,d} = \sum_{i \in A_2} w_{2G,i}\, \delta_i(d)\, y_i , \]
where $\delta_i(d) = 1$ if unit $i$ belongs to domain $d$ and $\delta_i(d) = 0$ otherwise.
Note: $\hat{Y}_{Cal,d}$ uses only the part of the domain sample that falls in $A_2$, so its variance can be large when the $A_2$ domain sample is very small.

1. Introduction: Domain projection estimators
Domain projection estimator (Fuller, 2003):
\[ \hat{Y}_{p,d} = \sum_{i \in A_1} w_{i1}\, \delta_i(d)\, \tilde{y}_i . \]
Note: $\hat{Y}_{p,d}$ is based on the domain sample in $A_1$, which is much larger than the domain sample in $A_2$ used by $\hat{Y}_{Cal,d}$. Hence $\hat{Y}_{p,d}$ can be significantly more efficient, provided its relative bias is small.
Under the model $y_i = x_i' \beta + e_i$ with $E(e_i) = 0$, $\hat{Y}_{p,d}$ is model unbiased for $Y_d$. However, it is possible to construct populations for which $\hat{Y}_{p,d}$ is badly design biased (Fuller, 2003).

1. Introduction: Combining two independent surveys
- Large sample $A_1$ collecting only $x$, with weights $\{w_{i1}, i \in A_1\}$.
- Much smaller sample $A_2$, drawn independently, collecting $x$ and $y$, with weights $\{w_{i2}, i \in A_2\}$.
Example 1 (Hidiroglou, 2001): Canadian Survey of Employment, Payrolls and Hours
- $A_1$: large sample drawn from a Canadian Customs and Revenue Agency administrative data file; auxiliary variables $x$ observed.
- $A_2$: small sample from the Statistics Canada Business Register; study variables $y$ (number of hours worked by employees and summarized earnings) observed.

1. Introduction: Combining two independent surveys
Example 2 (Reiter, 2008)
- $A_2$: both self-reported health measurements, $x$, and clinical measurements from physical examinations, $y$, are observed.
- $A_1$: only $x$ is reported.
- Synthetic values $\tilde{y}_i$, $i \in A_1$, are created by first fitting a working model $E(y) = m(x, \beta)$ relating $y$ to $x$ to the data $\{(y_i, x_i), i \in A_2\}$, and then predicting the $y_i$ associated with $x_i$, $i \in A_1$.
- Only the synthetic values $\tilde{y}_i = m(x_i, \hat{\beta})$, $i \in A_1$, and the associated weights $w_{i1}$, $i \in A_1$, are released to the public.
Our focus is on producing estimators of totals and domain totals from the synthetic data file $\{(\tilde{y}_i, w_{i1}), i \in A_1\}$.

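2. Projection estimation: Estimation of Y
Projection estimator of $Y$:
\[ \hat{Y}_p = \sum_{i \in A_1} w_{i1}\, \tilde{y}_i . \]
$\hat{Y}_p$ is asymptotically design-unbiased if $\hat{\beta}$ satisfies
\[ \sum_{i \in A_2} w_{i2} \{ y_i - m(x_i, \hat{\beta}) \} = 0 . \qquad (*) \]
Note: under condition (*),
\[ \hat{Y}_p = \sum_{i \in A_1} w_{i1}\, \tilde{y}_i + \sum_{i \in A_2} w_{i2} \{ y_i - \tilde{y}_i \} = \text{prediction} + \text{bias correction} . \]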
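The "prediction + bias correction" reading of condition (*) can be illustrated with a small numerical sketch. Everything specific here is an assumption made for the example: the population, the sample sizes, simple random sampling for both surveys, and the helper name `wls_fit`. The sketch fits a linear working model with an intercept and a ratio working model on $A_2$ and evaluates the bias-correction term for each.

```python
import random

def wls_fit(x, y, w):
    """Weighted least squares for y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

random.seed(2)

# Hypothetical population and two independent simple random samples.
N = 5000
x_pop = [random.expovariate(0.5) for _ in range(N)]
y_pop = [1.0 + 0.7 * x + random.gauss(0.0, 1.0) for x in x_pop]
n1, n2 = 500, 100
A1 = random.sample(range(N), n1)
A2 = random.sample(range(N), n2)  # drawn independently of A1
w1 = [N / n1] * n1
w2 = [N / n2] * n2
x1 = [x_pop[i] for i in A1]
x2 = [x_pop[i] for i in A2]
y2 = [y_pop[i] for i in A2]

# Working model 1: linear regression with intercept, fitted on A2.
b0, b1 = wls_fit(x2, y2, w2)
# Working model 2: ratio model m(x, beta) = beta * x, solving sum w2 (y - beta x) = 0.
beta_ratio = sum(w * y for w, y in zip(w2, y2)) / sum(w * x for w, x in zip(w2, x2))

def projection(m):
    """Return (prediction term over A1, bias-correction term over A2)."""
    prediction = sum(w * m(x) for w, x in zip(w1, x1))
    correction = sum(w * (y - m(x)) for w, x, y in zip(w2, x2, y2))
    return prediction, correction

pred_lin, corr_lin = projection(lambda x: b0 + b1 * x)
pred_rat, corr_rat = projection(lambda x: beta_ratio * x)
print(pred_lin, corr_lin)
print(pred_rat, corr_rat)
```

For both working models the correction term is zero up to rounding, so the released synthetic file for $A_1$ alone reproduces the bias-corrected estimator.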

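2. Projection estimation: Estimation of Y
Theorem 1: Under some regularity conditions, if $\hat{\beta}$ satisfies condition (*), then asymptotically we can write
\[ \hat{Y}_p \cong \sum_{i \in A_1} w_{i1}\, m_0(x_i) + \sum_{i \in A_2} w_{i2} \{ y_i - m_0(x_i) \} = \hat{P}_1 + \hat{Q}_2 , \]
where $m_0(x_i) = m(x_i, \beta_0)$ and $\beta_0 = \mathrm{p\,lim}\, \hat{\beta}$ with respect to survey 2. Thus,
\[ E(\hat{Y}_p) \cong \sum_{i=1}^{N} m_0(x_i) + \sum_{i=1}^{N} \{ y_i - m_0(x_i) \} = \sum_{i=1}^{N} y_i \]
and, because the two surveys are independent,
\[ V(\hat{Y}_p) \cong V(\hat{P}_1) + V(\hat{Q}_2) . \]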

2. Projection estimation
Model-assisted approach: asymptotic unbiasedness of $\hat{Y}_p$ does not depend on the validity of the working model, but efficiency is affected.
Note: in the variance decomposition
\[ V(\hat{Y}_p) = V(\hat{P}_1) + V(\hat{Q}_2) = V_1 + V_2 , \]
$V_1$ is based on $n_1$ sample elements and $V_2$ is based on $n_2$ sample elements.
- If $n_2 \ll n_1$, then $V_1 \ll V_2$.
- If the working model is good, then the squared error terms $e_i^2 = \{ y_i - m_0(x_i) \}^2$ are small and $V_2$ will also be small.

2. Projection estimation: When is condition (*) satisfied?
If 1 is an element of $x_i$, the condition is satisfied for linear regression, $m(x_i, \beta) = x_i' \beta$, and for logistic regression, $\mathrm{logit}\{ m(x_i, \beta) \} = x_i' \beta$, when $\hat{\beta}$ is obtained from the estimating equation
\[ \sum_{i \in A_2} w_{i2}\, x_i (y_i - m_i) = 0 , \qquad m_i = m(x_i, \beta) . \]
For the ratio model, $\hat{\beta}$ is the solution of
\[ \sum_{i \in A_2} w_{i2} (y_i - m_i) = 0 . \]
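The reason an intercept is enough can be spelled out in one step. In the sketch below, the partition $x_i = (1, x_{1i}')'$ and the symbol $x_{1i}$ for the non-constant covariates are notation introduced here for illustration only; the first row of the estimating equation is then exactly condition (*).

```latex
\sum_{i \in A_2} w_{i2}\, x_i \{ y_i - m(x_i, \hat\beta) \} = 0,
\quad x_i = (1, x_{1i}')'
\;\Longrightarrow\;
\underbrace{\sum_{i \in A_2} w_{i2} \{ y_i - m(x_i, \hat\beta) \} = 0}_{\text{first row: condition } (*)},
\qquad
\sum_{i \in A_2} w_{i2}\, x_{1i} \{ y_i - m(x_i, \hat\beta) \} = 0 .
```

For the ratio model, condition (*) is itself the estimating equation, so no intercept is needed there.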

2. Projection estimation: Linearization variance estimation
Let $\hat{e}_i = y_i - \tilde{y}_i$. The linearization variance estimator of $\hat{Y}_p$ is
\[ v_L(\hat{Y}_p) = v_1(\tilde{y}_i) + v_2(\hat{e}_i) , \]
where
- $v_1(z_i) = v(\hat{Z}_1)$ is the variance estimator for survey 1, with $\hat{Z}_1 = \sum_{i \in A_1} w_{i1} z_i$;
- $v_2(z_i) = v(\hat{Z}_2)$ is the variance estimator for survey 2, with $\hat{Z}_2 = \sum_{i \in A_2} w_{i2} z_i$.
Note: $v_L(\hat{Y}_p)$ requires access to data from both surveys.
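As a rough illustration of the two-term linearization estimator, the sketch below assumes both surveys are simple random samples without replacement and uses the textbook variance estimator for an expansion total. The population, sample sizes, and the helper names `wls_fit` and `srs_total_variance` are assumptions for this example, not part of the slides.

```python
import random
from statistics import variance

def wls_fit(x, y, w):
    """Weighted least squares for y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def srs_total_variance(z, N):
    """Variance estimator of the total (N/n)*sum(z) under SRS without replacement."""
    n = len(z)
    return N * N * (1.0 - n / N) * variance(z) / n

random.seed(3)

# Hypothetical population and two independent simple random samples.
N = 5000
x_pop = [random.expovariate(0.5) for _ in range(N)]
y_pop = [1.0 + 0.7 * x + random.gauss(0.0, 1.0) for x in x_pop]
n1, n2 = 500, 100
A1 = random.sample(range(N), n1)
A2 = random.sample(range(N), n2)
x1 = [x_pop[i] for i in A1]
x2 = [x_pop[i] for i in A2]
y2 = [y_pop[i] for i in A2]

# Fit the working model on A2, then form synthetic values and residuals.
b0, b1 = wls_fit(x2, y2, [N / n2] * n2)
y_tilde_1 = [b0 + b1 * x for x in x1]                # synthetic values on A1
e_hat_2 = [y - (b0 + b1 * x) for x, y in zip(x2, y2)]  # residuals on A2

v1 = srs_total_variance(y_tilde_1, N)  # survey-1 component, v_1(y_tilde)
v2 = srs_total_variance(e_hat_2, N)    # survey-2 component, v_2(e_hat)
v_L = v1 + v2
print(v1, v2, v_L)
```

The residual vector comes from $A_2$ while the synthetic values live on $A_1$, which is why this estimator cannot be computed from the released $A_1$ file alone.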

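2. Projection estimation: Estimation of domain total $Y_d$
Projection domain estimator:
\[ \hat{Y}_{d,p} = \sum_{i \in A_1} w_{i1}\, \delta_i(d)\, \tilde{y}_i . \]
$\hat{Y}_{d,p}$ is asymptotically unbiased if either
\[ \text{Case (i):} \quad \sum_{i \in A_2} w_{i2}\, \delta_i(d) (y_i - \tilde{y}_i) = 0 \]
or
\[ \text{Case (ii):} \quad \mathrm{Cov}\{ \delta_i(d),\, y_i - m(x_i, \beta_0) \} = 0 . \]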

2. Projection estimation: Estimation of domain total $Y_d$
Case (i):
- For linear or logistic regression models, (i) is satisfied if $\delta_i(d)$ is an element of $x_i$.
- For planned domains specified in advance, augmented working models can be used.
- The survey 1 data file should provide the planned domain indicators.
Case (ii):
- If the working model is good, the relative bias of $\hat{Y}_{d,p}$ will be small.
- $\hat{Y}_{d,p}$ is asymptotically model unbiased if the model is correct.
- $\hat{Y}_{d,p}$ can be significantly design biased for some populations.
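Case (i) with an augmented working model can be sketched as follows. The example assumes a linear working model with regressor $(1, x_i, \delta_i(d))'$, simple random sampling for both surveys, and a made-up population; the helpers `solve` and `wls` are hypothetical names written for this illustration.

```python
import random

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def wls(X, y, w):
    """Solve the weighted normal equations sum w x (y - x'beta) = 0."""
    p = len(X[0])
    XtWX = [[sum(wi * xi[a] * xi[b] for wi, xi in zip(w, X)) for b in range(p)]
            for a in range(p)]
    XtWy = [sum(wi * xi[a] * yi for wi, xi, yi in zip(w, X, y)) for a in range(p)]
    return solve(XtWX, XtWy)

random.seed(4)

# Hypothetical population with a domain defined by an auxiliary variable z.
N = 5000
x_pop = [random.expovariate(0.5) for _ in range(N)]
z_pop = [random.random() for _ in range(N)]
y_pop = [1.0 + 0.7 * x + random.gauss(0.0, 1.0) for x in x_pop]
d_pop = [1.0 if z < 0.3 else 0.0 for z in z_pop]  # delta_i(d)

n1, n2 = 500, 100
A1 = random.sample(range(N), n1)
A2 = random.sample(range(N), n2)
w1, w2 = N / n1, N / n2

# Augmented regressor (1, x_i, delta_i(d)) fitted on A2.
X2 = [[1.0, x_pop[i], d_pop[i]] for i in A2]
y2 = [y_pop[i] for i in A2]
beta = wls(X2, y2, [w2] * n2)

def m(i):
    """Synthetic value for unit i under the augmented working model."""
    return beta[0] + beta[1] * x_pop[i] + beta[2] * d_pop[i]

# Case (i): weighted domain residual total over A2.
case_i = sum(w2 * d_pop[i] * (y_pop[i] - m(i)) for i in A2)
# Domain projection estimator from the A1 file.
Y_dp = sum(w1 * d_pop[i] * m(i) for i in A1)
print(case_i, Y_dp)
```

Because the domain indicator is one of the regressors, the domain residual total over $A_2$ vanishes by construction, which is exactly case (i).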

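3. Replication variance estimation: Replication variance estimation for $\hat{Y}_p$
Replication variance estimator for survey 1:
\[ v_{1,rep}(\hat{Z}_1) = \sum_{k=1}^{L_1} c_k \big( \hat{Z}_1^{(k)} - \hat{Z}_1 \big)^2 , \]
where $\hat{Z}_1^{(k)} = \sum_{i \in A_1} w_{i1}^{(k)} z_i$ and $\{ w_{i1}^{(k)}, i \in A_1 \}$, $k = 1, \dots, L_1$, are the replication weights for survey 1.
Replication variance estimator for $\hat{Y}_p$:
\[ v_{1,rep}(\hat{Y}_p) = \sum_{k=1}^{L_1} c_k \big( \hat{Y}_p^{(k)} - \hat{Y}_p \big)^2 , \]
where $\hat{Y}_p^{(k)} = \sum_{i \in A_1} w_{i1}^{(k)} \tilde{y}_i^{(k)}$ and $\{ \tilde{y}_i^{(k)}, i \in A_1 \}$ are the synthetic values for replicate $k$.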

3. Replication variance estimation: Replication variance estimation for $\hat{Y}_p$
How to create the replicated synthetic data $\{ \tilde{y}_i^{(k)}, i \in A_1 \}$?
1. Create $\{ w_{i2}^{(k)}, k = 1, \dots, L_1; i \in A_2 \}$ such that
\[ \sum_{k=1}^{L_1} c_k \big( \hat{Y}_2^{(k)} - \hat{Y}_2 \big)^2 = v_2(\hat{Y}_2) . \]
2. Compute $\hat{\beta}^{(k)}$ and $\tilde{y}_i^{(k)} = m(x_i, \hat{\beta}^{(k)})$ by solving
\[ \sum_{i \in A_2} w_{i2}^{(k)} \{ y_i - m(x_i, \beta) \}\, x_i = 0 \]
for $\hat{\beta}^{(k)}$ (linear or logistic regression).
$v_{1,rep}(\hat{Y}_p)$ is asymptotically unbiased.
The data file for sample $A_1$ should contain additional columns of $\{ \tilde{y}_i^{(k)}, i \in A_1 \}$ and the associated $\{ w_{i1}^{(k)}, i \in A_1 \}$, $k = 1, 2, \dots, L_1$.
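The two steps above can be sketched with a jackknife. This is a minimal illustration under assumptions that are not in the slides at this point: both surveys are simple random samples, finite population corrections are ignored, survey 2 uses a delete-one jackknife, survey 1 uses a random-group jackknife with the same number of replicates, and $c_k = (L_1 - 1)/L_1$. The population and the helper `wls_fit` are also made up for the example.

```python
import random
from statistics import variance

def wls_fit(x, y, w):
    """Weighted least squares for y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

random.seed(5)

# Hypothetical population and two independent simple random samples.
N = 10000
x_pop = [random.expovariate(0.5) for _ in range(N)]
y_pop = [1.0 + 0.7 * x + random.gauss(0.0, 1.0) for x in x_pop]
n1, n2 = 500, 100
L = n2                # number of replicates L_1
c_k = (L - 1) / L     # jackknife factor (assumed the same for every k)
A1 = random.sample(range(N), n1)
A2 = random.sample(range(N), n2)
x1 = [x_pop[i] for i in A1]
x2 = [x_pop[i] for i in A2]
y2 = [y_pop[i] for i in A2]

# Full-sample projection estimator.
b0, b1 = wls_fit(x2, y2, [N / n2] * n2)
Y_p = sum((N / n1) * (b0 + b1 * x) for x in x1)
Y_2 = (N / n2) * sum(y2)

# A1 is in random order, so positions k, k+L, k+2L, ... form random group k.
group = [i % L for i in range(n1)]
g = n1 // L  # group size

Y_p_reps, Y_2_reps = [], []
for k in range(L):
    # Step 1: survey-2 replicate weights (delete unit k).
    w2k = [0.0 if j == k else N / (n2 - 1) for j in range(n2)]
    # Survey-1 replicate weights (delete random group k).
    w1k = [0.0 if group[i] == k else N / (n1 - g) for i in range(n1)]
    # Step 2: refit the working model with replicate weights,
    # then recompute the synthetic values on A1.
    b0k, b1k = wls_fit(x2, y2, w2k)
    Y_p_reps.append(sum(w * (b0k + b1k * x) for w, x in zip(w1k, x1)))
    Y_2_reps.append(sum(w * y for w, y in zip(w2k, y2)))

v_rep = sum(c_k * (r - Y_p) ** 2 for r in Y_p_reps)

# Check of the step-1 requirement for the direct estimator of survey 2.
v2_rep = sum(c_k * (r - Y_2) ** 2 for r in Y_2_reps)
v2_direct = N * N * variance(y2) / n2
print(v_rep, v2_rep, v2_direct)
```

In a released file, each replicate would contribute one extra column of synthetic values and one extra column of weights, so users can compute `v_rep` without ever seeing survey 2.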

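3. Replication variance estimation: Replication variance estimation for $\hat{Y}_{d,p}$
Let $\hat{Y}_{d,p}^{(k)} = \sum_{i \in A_1} w_{i1}^{(k)}\, \delta_i(d)\, \tilde{y}_i^{(k)}$. Then
\[ v_{1,rep}(\hat{Y}_{d,p}) = \sum_{k=1}^{L_1} c_k \big( \hat{Y}_{d,p}^{(k)} - \hat{Y}_{d,p} \big)^2 . \]
This estimator is asymptotically unbiased under either case (i) or case (ii).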

4. Optimal estimator: Full information. Estimation of total Y
Three estimators for two parameters:
- Survey 1: $\hat{X}_1$ for $X$
- Survey 2: $(\hat{X}_2, \hat{Y}_2)$ for $(X, Y)$
Combine the information using generalized least squares: minimize
\[ Q(X, Y) = \begin{pmatrix} \hat{X}_1 - X \\ \hat{X}_2 - X \\ \hat{Y}_2 - Y \end{pmatrix}' V^{-1} \begin{pmatrix} \hat{X}_1 - X \\ \hat{X}_2 - X \\ \hat{Y}_2 - Y \end{pmatrix} \]
with respect to $(X, Y)$, where $V$ is the variance-covariance matrix of $(\hat{X}_1, \hat{X}_2, \hat{Y}_2)'$.

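4. Optimal estimator: Full information. Estimation of total Y
Best linear unbiased estimator based on $\hat{X}_2$, $\hat{Y}_2$ and $\hat{X}_1$:
\[ \tilde{Y}_{opt} = \hat{Y}_2 + B_{y \cdot x2} \big( \tilde{X}_{opt} - \hat{X}_2 \big), \qquad \tilde{X}_{opt} = \frac{ V_{xx2} \hat{X}_1 + V_{xx1} \hat{X}_2 }{ V_{xx1} + V_{xx2} } , \]
where $B_{y \cdot x2} = V_{yx2} / V_{xx2}$, $V_{xx1} = V(\hat{X}_1)$, $V_{xx2} = V(\hat{X}_2)$ and $V_{yx2} = \mathrm{Cov}(\hat{Y}_2, \hat{X}_2)$.
Replace the variances in $\tilde{Y}_{opt}$ by estimated variances to get $\hat{Y}_{opt}$ and $\hat{X}_{opt}$.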

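4. Optimal estimator: Full information. Estimation of total Y
$\hat{Y}_{opt}$ can be expressed as
\[ \hat{Y}_{opt} = \sum_{i \in A_2} w_{i2}^{*}\, y_i , \]
where $\{ w_{i2}^{*}, i \in A_2 \}$ are calibration weights satisfying $\sum_{i \in A_2} w_{i2}^{*} x_i = \hat{X}_{opt}$.
$\hat{Y}_{opt}$ can therefore be computed from a data file for $A_2$ that provides the weights $\{ w_{i2}^{*}, i \in A_2 \}$.
Example: simple random samples $A_1$ and $A_2$,
\[ w_{i2}^{*} = \frac{N}{n_2} + \big( \hat{X}_{opt} - \hat{X}_2 \big) \frac{ x_i - \bar{x}_2 }{ \sum_{i \in A_2} (x_i - \bar{x}_2)^2 } , \]
where $\bar{x}_2$ is the mean of $x$ for $A_2$.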
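The simple-random-sampling example can be verified numerically. The sketch below is an illustration under assumptions: a made-up population with scalar $x$, two independent simple random samples, and variance estimates that ignore the finite population correction. It computes the optimal estimates, builds the calibration weights from the formula above, and checks that they reproduce both totals.

```python
import random
from statistics import mean, variance

random.seed(6)

# Hypothetical population and two independent simple random samples.
N = 10000
x_pop = [random.expovariate(0.5) for _ in range(N)]
y_pop = [1.0 + 0.7 * x + random.gauss(0.0, 1.0) for x in x_pop]
n1, n2 = 500, 100
A1 = random.sample(range(N), n1)
A2 = random.sample(range(N), n2)
x1 = [x_pop[i] for i in A1]
x2 = [x_pop[i] for i in A2]
y2 = [y_pop[i] for i in A2]

# Direct estimators of the totals.
X1 = N * mean(x1)
X2 = N * mean(x2)
Y2 = N * mean(y2)

# Estimated variances under SRS (finite population correction ignored).
Vxx1 = N * N * variance(x1) / n1
Vxx2 = N * N * variance(x2) / n2

# B_hat = V_yx2 / V_xx2; the common factor N^2 / {n2 (n2 - 1)} cancels.
xbar2, ybar2 = mean(x2), mean(y2)
Sxx = sum((x - xbar2) ** 2 for x in x2)
Sxy = sum((x - xbar2) * (y - ybar2) for x, y in zip(x2, y2))
B = Sxy / Sxx

# Optimal estimators of X and Y.
X_opt = (Vxx2 * X1 + Vxx1 * X2) / (Vxx1 + Vxx2)
Y_opt = Y2 + B * (X_opt - X2)

# Calibration weights for the A2 data file.
w_star = [N / n2 + (X_opt - X2) * (x - xbar2) / Sxx for x in x2]

cal_x = sum(w * x for w, x in zip(w_star, x2))
cal_y = sum(w * y for w, y in zip(w_star, y2))
print(X_opt, cal_x)
print(Y_opt, cal_y)
```

Since $n_1$ is five times $n_2$ here, $\hat{X}_{opt}$ sits much closer to $\hat{X}_1$ than to $\hat{X}_2$, and the weights shift $A_2$ toward the better-estimated $x$ total.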

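4. Optimal estimator: Full information. Domain estimation
- Calibration estimator: $\hat{Y}_d = \sum_{i \in A_2} w_{i2}^{*}\, \delta_i(d)\, y_i$, computed from the data file for $A_2$ only.
- Projection estimator: $\hat{Y}_{d,p} = \sum_{i \in A_1} w_{i1}\, \delta_i(d)\, \tilde{y}_i$, computed from the data file for $A_1$.
Both $\hat{Y}_d$ and $\hat{Y}_{d,p}$ satisfy the internal consistency property:
\[ \sum_{d} \hat{Y}_d = \hat{Y}_{opt}, \qquad \sum_{d} \hat{Y}_{d,p} = \hat{Y}_p . \]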

4. Optimal estimator: Full information. Domain estimation
- $\hat{Y}_d$ is asymptotically design unbiased but can have a large variance if the domain contains few sample $A_2$ units.
- The optimal estimator $\hat{Y}_{opt,d}$ based on domain-specific variances does not satisfy internal consistency, may be unstable for small domain sample sizes, and cannot be implemented from the $A_2$ data file.

5. Simulation Study: Simulation Setup
Two artificial populations, A and B, of size $N = 10{,}000$: $\{ (y_i, x_i, z_i); i = 1, \dots, N \}$.
- Population A (regression model): $x_i \sim \chi^2(2)$, $y_i = x_i + e_i$, $e_i \sim N(0, 2)$, $z_i \sim \mathrm{Unif}(0, 1)$, with $z_i$ independent of $(x_i, y_i)$.
- Population B (ratio model): same $(x_i, z_i)$, but $y_i = 0.7 x_i + u_i$, $u_i \sim N(0, x_i)$.
- $\mathrm{corr}(y, x) = 0.71$ for both populations.
- Domain $d$: $\delta_i(d) = 1$ if $z_i < 0.3$; $\delta_i(d) = 0$ otherwise.

5. Simulation Study: Simulation Setup
- Two independent simple random samples: $n_1 = 500$, $n_2 = 100$.
- Working models: linear regression, ratio, augmented linear regression, augmented ratio.
- Relative bias: $RB(\hat{Y}) = \{ E(\hat{Y}) - Y \} / Y$.
- Relative efficiency: $RE(\hat{Y}) = \mathrm{mse}(\hat{Y}_{opt}) / \mathrm{mse}(\hat{Y})$.
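A scaled-down Monte Carlo run shows how RB and RE are computed. This is only a sketch and not the authors' simulation code: it uses the ratio-model population B as described, a regression projection estimator, an assumed 200 replications, an arbitrary seed, and variance estimates without finite population corrections. $\chi^2(2)$ draws are generated as exponential variates with mean 2, which is the same distribution.

```python
import random
from math import sqrt
from statistics import mean, variance

random.seed(7)

# Population B (ratio model): x ~ chi-square(2), y = 0.7 x + u, u ~ N(0, x).
N = 10000
x_pop = [random.expovariate(0.5) for _ in range(N)]
y_pop = [0.7 * x + random.gauss(0.0, sqrt(x)) for x in x_pop]
Y = sum(y_pop)

n1, n2 = 500, 100
R = 200  # number of Monte Carlo replications (assumed; kept small for speed)

def one_replication():
    """Draw both samples; return (regression projection, optimal) estimates."""
    A1 = random.sample(range(N), n1)
    A2 = random.sample(range(N), n2)
    x1 = [x_pop[i] for i in A1]
    x2 = [x_pop[i] for i in A2]
    y2 = [y_pop[i] for i in A2]
    xbar1, xbar2, ybar2 = mean(x1), mean(x2), mean(y2)
    Sxx = sum((x - xbar2) ** 2 for x in x2)
    Sxy = sum((x - xbar2) * (y - ybar2) for x, y in zip(x2, y2))
    b1 = Sxy / Sxx
    b0 = ybar2 - b1 * xbar2
    # Regression projection: (N/n1) * sum over A1 of (b0 + b1 x).
    Y_proj = N * (b0 + b1 * xbar1)
    # Optimal estimator with estimated variances of the two x means.
    Vxx1 = variance(x1) / n1
    Vxx2 = (Sxx / (n2 - 1)) / n2
    xbar_opt = (Vxx2 * xbar1 + Vxx1 * xbar2) / (Vxx1 + Vxx2)
    Y_opt = N * (ybar2 + b1 * (xbar_opt - xbar2))
    return Y_proj, Y_opt

results = [one_replication() for _ in range(R)]
proj = [r[0] for r in results]
opt = [r[1] for r in results]

def mse(estimates):
    """Monte Carlo mean squared error about the true total Y."""
    return mean((e - Y) ** 2 for e in estimates)

RB_proj = (mean(proj) - Y) / Y
RE_proj = mse(opt) / mse(proj)
print(RB_proj, RE_proj)
```

An RE below 1 means the estimator is less efficient than $\hat{Y}_{opt}$; with so few replications the printed values are noisy and should not be compared with the slides' tables.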

5. Simulation Study: Simulation Results
Table 1: Simulation results (point estimation)
Columns: Parameter; Estimator; Population A (RB, RE); Population B (RB, RE).
Rows for the total: Regression projection; Ratio projection; Aug. Reg. projection; Aug. Rat. projection; Optimal.
Rows for the domain: Regression projection; Ratio projection; Aug. Reg. projection; Aug. Rat. projection; Optimal; Calibration.
(The numerical entries of the table are not preserved in this transcription.)

5. Simulation Study: Conclusions from Table 1
Estimation of total Y
1. The RB of all estimators is negligible: less than 2%.
2. The regression projection estimator is almost as efficient as $\hat{Y}_{opt}$, even when the true model is the ratio model. The ratio projection estimator is considerably less efficient if the true model has a substantial intercept term, so model diagnostics should be used to identify a good working model.
3. The augmented projection estimators are similar to the corresponding projection estimators in terms of RB and RE.

5. Simulation Study: Conclusions from Table 1
Domain estimation
1. The RB of all estimators is less than 5%: the simulation setup ensures that $\delta_i(d)$ is unrelated to $r_i = y_i - m(x_i; \beta)$.
2. The regression projection estimator is considerably more efficient than the calibration estimator or the optimal estimator, because the projection estimator is based on a larger sample size.
3. The ratio projection estimator is considerably less efficient if the model has a substantial intercept term.

5. Simulation Study: Jackknife variance estimation
$L_1 = n_2 = 100$ pseudo-replicates by random group jackknife.
Table 2: Simulation results (relative biases of variance estimators)
Columns: Point estimator; Parameter; Pop. A; Pop. B.
Rows: Regression projection (Total, Domain); Ratio projection (Total, Domain); Aug. Reg. projection (Total, Domain); Aug. Rat. projection (Total, Domain).
(The numerical entries of the table are not preserved in this transcription.)
The RB of the jackknife variance estimators is small: less than 5%.

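6. Discussion: Some alternative approaches
The proposed method does not lead to the optimal estimator:
\[ \hat{Y}_{opt} = \hat{Y}_2 + \hat{B}_{y \cdot x2} \big( \hat{X}_{opt} - \hat{X}_2 \big), \qquad \hat{X}_{opt} = \frac{ \hat{V}_{xx2} \hat{X}_1 + \hat{V}_{xx1} \hat{X}_2 }{ \hat{V}_{xx1} + \hat{V}_{xx2} } . \]
To implement the optimal estimator using synthetic data, we may express
\[ \hat{Y}_{opt} = \sum_{i \in A_1^{*}} w_{i3}\, \tilde{y}_i + \sum_{i \in A_2} w_{i2} ( y_i - \tilde{y}_i ) , \]
where $\tilde{y}_i = x_i \hat{B}_{y \cdot x2}$, $A_1^{*} = A_1 \cup A_2$, and $w_{i3}$ is the sampling weight for $A_1^{*}$ satisfying $\sum_{i \in A_1^{*}} w_{i3} x_i = \hat{X}_{opt}$.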

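6. Discussion: Some alternative approaches
If $\sum_{i \in A_2} w_{i2} = \sum_{i \in A_1^{*}} w_{i3}$, then we can further express
\[ \hat{Y}_{opt} = \sum_{i \in A_1^{*}} \sum_{j \in A_2} w_{i3}\, w_{ij} ( \hat{y}_i + \hat{e}_j ) , \]
where $w_{ij} = w_{j2} / \big( \sum_{l \in A_2} w_{l2} \big)$ and $\hat{e}_j = y_j - \hat{y}_j$.
It now takes the form of fractional imputation considered in Fuller & Kim (2005).
To reduce the size of the data set, we may consider random selection of $M$ residuals to get $\hat{e}_j^{*}$ and
\[ \hat{Y}_{FI} = \sum_{i \in A_1^{*}} \sum_{j=1}^{M} w_{i3}\, w_{ij}^{*} ( \hat{y}_i + \hat{e}_j^{*} ) , \]
where $w_{ij}^{*}$ satisfies
\[ \sum_{j=1}^{M} w_{ij}^{*} ( 1, \hat{e}_j^{*} ) = \sum_{j \in A_2} w_{ij} ( 1, \hat{e}_j ) . \]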
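The step from the two-term expression to the double-sum (fractional imputation) form can be written out as a short derivation. It uses only the definitions on these slides, with $A_1^{*}$ denoting the combined sample $A_1 \cup A_2$; the dummy index $l$ is introduced here to keep it distinct from $i$ and $j$. The key facts are that $w_{ij}$ does not depend on $i$ and sums to one over $j \in A_2$.

```latex
\begin{aligned}
\sum_{i \in A_1^{*}} \sum_{j \in A_2} w_{i3}\, w_{ij} (\hat y_i + \hat e_j)
&= \sum_{i \in A_1^{*}} w_{i3}\, \hat y_i \underbrace{\sum_{j \in A_2} w_{ij}}_{=\,1}
 + \Big( \sum_{i \in A_1^{*}} w_{i3} \Big)
   \frac{\sum_{j \in A_2} w_{j2}\, \hat e_j}{\sum_{l \in A_2} w_{l2}} \\
&= \sum_{i \in A_1^{*}} w_{i3}\, \hat y_i
 + \sum_{j \in A_2} w_{j2} (y_j - \hat y_j)
 \qquad \text{since } \sum_{i \in A_1^{*}} w_{i3} = \sum_{l \in A_2} w_{l2} .
\end{aligned}
```

The last line is the two-term expression for $\hat{Y}_{opt}$, so the equal-weight-total condition is exactly what makes the fractional form valid.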

6. Discussion: Some alternative approaches
- Nested two-phase sampling: $A_2 \subset A_1$.
- Non-nested two-phase sampling: $A_1$ and $A_2$ independent.
- We can convert non-nested two-phase sampling into nested two-phase sampling, $A_2 \subset A_1^{*}$, where $A_1^{*} = A_1 \cup A_2$.
- Synthetic data can be released for $A_1^{*}$.

6. Discussion: Parametric multiple imputation
- Assume that $f(y_i \mid x_i, \theta)$ is known for fixed $\theta$ and that $A_1$ and $A_2$ are simple random samples.
- Obtain the posterior distribution of $\theta$, $p(\theta \mid y_2, x_2)$, assuming a diffuse prior on $\theta$, where $(y_2, x_2)$ denotes the data from $A_2$.
- Draw $M$ values $\theta^{(1)}, \dots, \theta^{(M)}$ from the posterior distribution.
- Draw $y_i^{(l)}$ from $f(y_i \mid x_i, \theta^{(l)})$ for $i \in A_1$ and $l = 1, \dots, M$.
- Synthetic data sets: $\{ y_i^{(l)}, i \in A_1 \}$, $l = 1, \dots, M$.
- Standard multiple imputation variance estimators do not work here.
- Reiter (2008) proposed a two-stage imputation procedure that requires $T$ synthetic data sets $\{ y_{it}^{(l)} : i \in A_1, t = 1, \dots, T \}$ to be generated for each $\theta^{(l)}$. In all, $TM$ synthetic data sets are generated.

6. Discussion: Conclusion
- The proposed method is based on deterministic imputation to generate synthetic values.
- Synthetic data, along with the replicates, are created for survey 1, and only survey 1 data are released.
- A significant efficiency gain is achieved for domain estimation.
- A stochastic imputation approach is under study.

REFERENCES
Binder, D. A., Babyak, C., Brodeur, M., Hidiroglou, M. & Jocelyn, W. (2000). Variance estimation for two-phase stratified sampling. Can. J. Statist. 28.
Fuller, W. A. (2003). Estimation for multiple phase samples. In Analysis of Survey Data, R. L. Chambers & C. J. Skinner, eds. Wiley: Chichester, England.
Fuller, W. A. & Kim, J.-K. (2005). Hot deck imputation for the response model. Survey Methodology 31.
Hansen, M. & Hurwitz, W. (1946). The problem of non-response in sample surveys. J. Am. Statist. Assoc. 41.
Hidiroglou, M. (2001). Double sampling. Survey Methodology 27.
Hidiroglou, M. A., Rao, J. N. K. & Haziza, D. (2009). Variance estimation in two-phase sampling. Australian and New Zealand Journal of Statistics 51.
Kim, J. K., Navarro, A. & Fuller, W. A. (2006). Replicate variance estimation after multi-phase stratified sampling. J. Am. Statist. Assoc. 101.
Kott, P. & Stukel, D. (1997). Can the jackknife be used with a two-phase sample? Survey Methodology 23.
Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society 97.
Rao, J. N. K. (1973). On double sampling for stratification and analytical surveys. Biometrika 60.
Reiter, J. (2008). Multiple imputation when records used for imputation are not used or disseminated for analysis. Biometrika 95.


A Unified Theory of Empirical Likelihood Confidence Intervals for Survey Data with Unequal Probabilities and Non Negligible Sampling Fractions A Unified Theory of Empirical Likelihood Confidence Intervals for Survey Data with Unequal Probabilities and Non Negligible Sampling Fractions Y.G. Berger O. De La Riva Torres Abstract We propose a new

More information

Combining Non-probability and. Probability Survey Samples Through Mass Imputation

Combining Non-probability and. Probability Survey Samples Through Mass Imputation Combining Non-probability and arxiv:1812.10694v2 [stat.me] 31 Dec 2018 Probability Survey Samples Through Mass Imputation Jae Kwang Kim Seho Park Yilin Chen Changbao Wu January 1, 2019 Abstract. This paper

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Intermediate Econometrics

Intermediate Econometrics Intermediate Econometrics Markus Haas LMU München Summer term 2011 15. Mai 2011 The Simple Linear Regression Model Considering variables x and y in a specific population (e.g., years of education and wage

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Asymptotic Normality under Two-Phase Sampling Designs

Asymptotic Normality under Two-Phase Sampling Designs Asymptotic Normality under Two-Phase Sampling Designs Jiahua Chen and J. N. K. Rao University of Waterloo and University of Carleton Abstract Large sample properties of statistical inferences in the context

More information

RESEARCH REPORT. Vanishing auxiliary variables in PPS sampling with applications in microscopy.

RESEARCH REPORT. Vanishing auxiliary variables in PPS sampling with applications in microscopy. CENTRE FOR STOCHASTIC GEOMETRY AND ADVANCED BIOIMAGING 2014 www.csgb.dk RESEARCH REPORT Ina Trolle Andersen, Ute Hahn and Eva B. Vedel Jensen Vanishing auxiliary variables in PPS sampling with applications

More information

What is Survey Weighting? Chris Skinner University of Southampton

What is Survey Weighting? Chris Skinner University of Southampton What is Survey Weighting? Chris Skinner University of Southampton 1 Outline 1. Introduction 2. (Unresolved) Issues 3. Further reading etc. 2 Sampling 3 Representation 4 out of 8 1 out of 10 4 Weights 8/4

More information

ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS

ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS Statistica Sinica 17(2007), 1047-1064 ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS Jiahua Chen and J. N. K. Rao University of British Columbia and Carleton University Abstract: Large sample properties

More information

The regression model with one fixed regressor cont d

The regression model with one fixed regressor cont d The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8

More information

Lecture 14 Simple Linear Regression

Lecture 14 Simple Linear Regression Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent

More information

Lecture 8: Information Theory and Statistics

Lecture 8: Information Theory and Statistics Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang

More information

Improvement in Estimating the Finite Population Mean Under Maximum and Minimum Values in Double Sampling Scheme

Improvement in Estimating the Finite Population Mean Under Maximum and Minimum Values in Double Sampling Scheme J. Stat. Appl. Pro. Lett. 2, No. 2, 115-121 (2015) 115 Journal of Statistics Applications & Probability Letters An International Journal http://dx.doi.org/10.12785/jsapl/020203 Improvement in Estimating

More information

Econ 2120: Section 2

Econ 2120: Section 2 Econ 2120: Section 2 Part I - Linear Predictor Loose Ends Ashesh Rambachan Fall 2018 Outline Big Picture Matrix Version of the Linear Predictor and Least Squares Fit Linear Predictor Least Squares Omitted

More information

Simple Linear Regression Analysis

Simple Linear Regression Analysis LINEAR REGRESSION ANALYSIS MODULE II Lecture - 6 Simple Linear Regression Analysis Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Prediction of values of study

More information

New estimation methodology for the Norwegian Labour Force Survey

New estimation methodology for the Norwegian Labour Force Survey Notater Documents 2018/16 Melike Oguz-Alper New estimation methodology for the Norwegian Labour Force Survey Documents 2018/16 Melike Oguz Alper New estimation methodology for the Norwegian Labour Force

More information

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

LECTURE 2 LINEAR REGRESSION MODEL AND OLS SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another

More information

Admissible Estimation of a Finite Population Total under PPS Sampling

Admissible Estimation of a Finite Population Total under PPS Sampling Research Journal of Mathematical and Statistical Sciences E-ISSN 2320-6047 Admissible Estimation of a Finite Population Total under PPS Sampling Abstract P.A. Patel 1* and Shradha Bhatt 2 1 Department

More information

Nonstationary Panels

Nonstationary Panels Nonstationary Panels Based on chapters 12.4, 12.5, and 12.6 of Baltagi, B. (2005): Econometric Analysis of Panel Data, 3rd edition. Chichester, John Wiley & Sons. June 3, 2009 Agenda 1 Spurious Regressions

More information

Plausible Values for Latent Variables Using Mplus

Plausible Values for Latent Variables Using Mplus Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can

More information

Propensity score adjusted method for missing data

Propensity score adjusted method for missing data Graduate Theses and Dissertations Graduate College 2013 Propensity score adjusted method for missing data Minsun Kim Riddles Iowa State University Follow this and additional works at: http://lib.dr.iastate.edu/etd

More information

In Praise of the Listwise-Deletion Method (Perhaps with Reweighting)

In Praise of the Listwise-Deletion Method (Perhaps with Reweighting) In Praise of the Listwise-Deletion Method (Perhaps with Reweighting) Phillip S. Kott RTI International NISS Worshop on the Analysis of Complex Survey Data With Missing Item Values October 17, 2014 1 RTI

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

of being selected and varying such probability across strata under optimal allocation leads to increased accuracy.

of being selected and varying such probability across strata under optimal allocation leads to increased accuracy. 5 Sampling with Unequal Probabilities Simple random sampling and systematic sampling are schemes where every unit in the population has the same chance of being selected We will now consider unequal probability

More information

Small Domain Estimation for a Brazilian Service Sector Survey

Small Domain Estimation for a Brazilian Service Sector Survey Proceedings 59th ISI World Statistics Congress, 5-30 August 013, Hong Kong (Session CPS003) p.334 Small Domain Estimation for a Brazilian Service Sector Survey André Neves 1, Denise Silva and Solange Correa

More information

Homoskedasticity. Var (u X) = σ 2. (23)

Homoskedasticity. Var (u X) = σ 2. (23) Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This

More information

Likelihood-based inference with missing data under missing-at-random

Likelihood-based inference with missing data under missing-at-random Likelihood-based inference with missing data under missing-at-random Jae-kwang Kim Joint work with Shu Yang Department of Statistics, Iowa State University May 4, 014 Outline 1. Introduction. Parametric

More information

A comparison of stratified simple random sampling and sampling with probability proportional to size

A comparison of stratified simple random sampling and sampling with probability proportional to size A comparison of stratified simple random sampling and sampling with probability proportional to size Edgar Bueno Dan Hedlin Per Gösta Andersson Department of Statistics Stockholm University Introduction

More information

Statistics 910, #5 1. Regression Methods

Statistics 910, #5 1. Regression Methods Statistics 910, #5 1 Overview Regression Methods 1. Idea: effects of dependence 2. Examples of estimation (in R) 3. Review of regression 4. Comparisons and relative efficiencies Idea Decomposition Well-known

More information

Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A.

Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A. Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A. Keywords: Survey sampling, finite populations, simple random sampling, systematic

More information

Slides 12: Output Analysis for a Single Model

Slides 12: Output Analysis for a Single Model Slides 12: Output Analysis for a Single Model Objective: Estimate system performance via simulation. If θ is the system performance, the precision of the estimator ˆθ can be measured by: The standard error

More information

TWO-WAY CONTINGENCY TABLES UNDER CONDITIONAL HOT DECK IMPUTATION

TWO-WAY CONTINGENCY TABLES UNDER CONDITIONAL HOT DECK IMPUTATION Statistica Sinica 13(2003), 613-623 TWO-WAY CONTINGENCY TABLES UNDER CONDITIONAL HOT DECK IMPUTATION Hansheng Wang and Jun Shao Peking University and University of Wisconsin Abstract: We consider the estimation

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Institute of Statistics and Econometrics Georg-August-University Göttingen Department of Statistics

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS

STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS David A Binder and Georgia R Roberts Methodology Branch, Statistics Canada, Ottawa, ON, Canada K1A 0T6 KEY WORDS: Design-based properties, Informative sampling,

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression Christopher Ting Christopher Ting : christophert@smu.edu.sg : 688 0364 : LKCSB 5036 January 7, 017 Web Site: http://www.mysmu.edu/faculty/christophert/ Christopher Ting QF 30 Week

More information

arxiv: v1 [stat.me] 3 Nov 2015

arxiv: v1 [stat.me] 3 Nov 2015 The Unbiasedness Approach to Linear Regression Models arxiv:5.0096v [stat.me] 3 Nov 205 P. Vellaisamy Department of Mathematics, Indian Institute of Technology Bombay, Powai, Mumbai-400076, India. Abstract

More information

Successive Difference Replication Variance Estimation in Two-Phase Sampling

Successive Difference Replication Variance Estimation in Two-Phase Sampling Successive Difference Replication Variance Estimation in Two-Phase Sampling Jean D. Opsomer Colorado State University Michael White US Census Bureau F. Jay Breidt Colorado State University Yao Li Colorado

More information

A Short Course in Basic Statistics

A Short Course in Basic Statistics A Short Course in Basic Statistics Ian Schindler November 5, 2017 Creative commons license share and share alike BY: C 1 Descriptive Statistics 1.1 Presenting statistical data Definition 1 A statistical

More information

REGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University

REGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University REGRESSION ITH SPATIALL MISALIGNED DATA Lisa Madsen Oregon State University David Ruppert Cornell University SPATIALL MISALIGNED DATA 10 X X X X X X X X 5 X X X X X 0 X 0 5 10 OUTLINE 1. Introduction 2.

More information

ECON The Simple Regression Model

ECON The Simple Regression Model ECON 351 - The Simple Regression Model Maggie Jones 1 / 41 The Simple Regression Model Our starting point will be the simple regression model where we look at the relationship between two variables In

More information

Simple design-efficient calibration estimators for rejective and high-entropy sampling

Simple design-efficient calibration estimators for rejective and high-entropy sampling Biometrika (202), 99,, pp. 6 C 202 Biometrika Trust Printed in Great Britain Advance Access publication on 3 July 202 Simple design-efficient calibration estimators for rejective and high-entropy sampling

More information

Linear regression with nested errors using probability-linked data

Linear regression with nested errors using probability-linked data University of Wollongong Research Online Faculty of Engineering and Information Sciences - Papers: Part A Faculty of Engineering and Information Sciences 2014 Linear regression with nested errors using

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical

More information

Nonparametric Small Area Estimation via M-quantile Regression using Penalized Splines

Nonparametric Small Area Estimation via M-quantile Regression using Penalized Splines Nonparametric Small Estimation via M-quantile Regression using Penalized Splines Monica Pratesi 10 August 2008 Abstract The demand of reliable statistics for small areas, when only reduced sizes of the

More information

Making sense of Econometrics: Basics

Making sense of Econometrics: Basics Making sense of Econometrics: Basics Lecture 2: Simple Regression Egypt Scholars Economic Society Happy Eid Eid present! enter classroom at http://b.socrative.com/login/student/ room name c28efb78 Outline

More information