Combining data from two independent surveys: model-assisted approach

Jae Kwang Kim, Iowa State University
January 20, 2012
Joint work with J.N.K. Rao, Carleton University

Reference
Kim, J. K. and Rao, J. N. K. (2012). Combining data from two independent surveys: a model-assisted approach. Biometrika, in press. (Available online via Advance Access, 10.1093/biomet/asr063.)

Outline
1. Introduction
2. Projection estimation
3. Replication variance estimation
4. Efficient estimation: Full information
5. Simulation study
6. Concluding remarks & Discussion

1. Introduction: Two-phase sampling
(Classical) two-phase sampling:
$A_1$: first-phase sample of size $n_1$
$A_2$: second-phase sample of size $n_2$ ($A_2 \subset A_1$)
$x$ is observed in phase 1; both $y$ and $x$ are observed in phase 2.
Assume that 1 is an element of $x_i$.
Neyman (1934), Hansen & Hurwitz (1946), Rao (1973), Kott & Stukel (1997), Binder et al. (2000), Kim et al. (2006), Hidiroglou et al. (2009).

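1. Introduction: Two-phase sampling
GREG estimator of $Y = \sum_{i=1}^N y_i$:
$$\hat{Y}_G = \hat{X}_1' \hat{\beta}_2, \qquad \hat{X}_1 = \sum_{i \in A_1} w_{1i} x_i, \qquad \hat{\beta}_2 = \Big( \sum_{i \in A_2} w_{2i} x_i x_i' \Big)^{-1} \sum_{i \in A_2} w_{2i} x_i y_i$$
Two ways of implementing the GREG estimator:
Calibration: create a data file for $A_2$.
$$\hat{Y}_G = \sum_{i \in A_2} w_{2G,i}\, y_i, \qquad w_{2G,i} = \hat{X}_1' \Big( \sum_{j \in A_2} w_{2j} x_j x_j' \Big)^{-1} w_{2i} x_i$$
Projection estimation: create a data file for $A_1$.
$$\hat{Y}_G = \sum_{i \in A_1} w_{1i}\, \tilde{y}_i, \qquad \tilde{y}_i = x_i' \hat{\beta}_2$$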
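The equivalence of the calibration and projection forms can be checked numerically. The following is only a minimal sketch under assumed conditions: the data are simulated, both phases use equal weights, and every variable name (`w2G`, `y_tilde`, and so on) is illustrative rather than taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: equal-weight samples, A2 nested in A1.
N, n1, n2 = 10_000, 500, 100
x1 = np.column_stack([np.ones(n1), rng.chisquare(2, n1)])   # phase 1: x only
idx2 = rng.choice(n1, size=n2, replace=False)               # A2 is a subset of A1
x2 = x1[idx2]
y2 = 1 + 0.7 * x2[:, 1] + rng.normal(0, np.sqrt(2), n2)     # phase 2: x and y
w1 = np.full(n1, N / n1)
w2 = np.full(n2, N / n2)

X1_hat = w1 @ x1                                  # sum over A1 of w_1i x_i
M2 = (w2[:, None] * x2).T @ x2                    # sum over A2 of w_2i x_i x_i'
beta2_hat = np.linalg.solve(M2, (w2[:, None] * x2).T @ y2)

# Calibration form: adjusted weights attached to the A2 data file.
w2G = w2 * (x2 @ np.linalg.solve(M2, X1_hat))
Y_cal = w2G @ y2

# Projection form: synthetic values attached to the A1 data file.
y_tilde = x1 @ beta2_hat
Y_proj = w1 @ y_tilde

print(Y_cal, Y_proj)   # the two forms agree up to rounding
```

The calibration weights also reproduce the phase-1 totals, that is, `w2G @ x2` equals `X1_hat`, which is the calibration property.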

1. Introduction: Domain projection estimators
Calibration estimator of the domain total $Y_d = \sum_{i=1}^N \delta_i(d) y_i$:
$$\hat{Y}_{Cal,d} = \sum_{i \in A_2} w_{2G,i}\, \delta_i(d)\, y_i$$
where $\delta_i(d) = 1$ if unit $i$ belongs to domain $d$ and $\delta_i(d) = 0$ otherwise.
Note: $\hat{Y}_{Cal,d}$ is based only on the domain sample belonging to $A_2$, so it can have a large variance if the $A_2$ domain sample is very small.

1. Introduction: Domain projection estimators
Domain projection estimator (Fuller, 2003):
$$\hat{Y}_{p,d} = \sum_{i \in A_1} w_{i1}\, \delta_i(d)\, \tilde{y}_i$$
Note: $\hat{Y}_{p,d}$ is based on the domain sample belonging to $A_1$, which is much larger than the $A_2$ domain sample used by $\hat{Y}_{Cal,d}$. Hence, $\hat{Y}_{p,d}$ could be significantly more efficient if its relative bias is small.
Under the model $y_i = x_i' \beta + e_i$ with $E(e_i) = 0$, $\hat{Y}_{p,d}$ is model unbiased for $Y_d$. However, it is possible to construct populations for which $\hat{Y}_{p,d}$ is very design biased (Fuller, 2003).

1. Introduction: Combining two independent surveys
Large sample $A_1$ collecting only $x$, with weights $\{w_{i1}, i \in A_1\}$.
Much smaller sample $A_2$, drawn independently, collecting $x$ and $y$, with weights $\{w_{i2}, i \in A_2\}$.
Example 1 (Hidiroglou, 2001): Canadian Survey of Employment, Payrolls and Hours
$A_1$: Large sample drawn from a Canadian Customs and Revenue Agency administrative data file; auxiliary variables $x$ observed.
$A_2$: Small sample from the Statistics Canada Business Register; study variables $y$ (number of hours worked by employees and summarized earnings) observed.

1. Introduction: Combining two independent surveys
Example 2 (Reiter, 2008)
$A_2$: Both self-reported health measurements, $x$, and clinical measurements from physical examinations, $y$, observed.
$A_1$: Only $x$ reported.
Synthetic values $\tilde{y}_i$, $i \in A_1$, are created by first fitting a working model $E(y) = m(x, \beta)$ relating $y$ to $x$ to the data $\{(y_i, x_i), i \in A_2\}$ and then predicting the $y_i$ associated with $x_i$, $i \in A_1$.
Only the synthetic values $\tilde{y}_i = m(x_i, \hat{\beta})$, $i \in A_1$, and the associated weights $w_{i1}$, $i \in A_1$, are released to the public.
Our focus is on producing estimators of totals and domain totals from the synthetic data file $\{(\tilde{y}_i, w_{i1}), i \in A_1\}$.

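2. Projection estimation: Estimation of $Y$
Projection estimator of $Y$:
$$\hat{Y}_p = \sum_{i \in A_1} w_{i1}\, \tilde{y}_i$$
$\hat{Y}_p$ is asymptotically design-unbiased if $\hat{\beta}$ satisfies
$$\sum_{i \in A_2} w_{i2} \{ y_i - m(x_i, \hat{\beta}) \} = 0 \qquad (*)$$
Note: Under condition (*),
$$\hat{Y}_p = \sum_{i \in A_1} w_{i1}\, \tilde{y}_i + \sum_{i \in A_2} w_{i2} \{ y_i - \tilde{y}_i \} = \text{prediction} + \text{bias correction}$$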

2. Projection estimation: Estimation of $Y$
Theorem 1: Under some regularity conditions, if $\hat{\beta}$ satisfies condition (*), we can write
$$\hat{Y}_p = \sum_{i \in A_1} w_{i1}\, m_0(x_i) + \sum_{i \in A_2} w_{i2} \{ y_i - m_0(x_i) \} = \hat{P}_1 + \hat{Q}_2$$
where $m_0(x_i) = m(x_i, \beta_0)$ and $\beta_0 = \operatorname{p\,lim} \hat{\beta}$ with respect to survey 2. Thus,
$$E(\hat{Y}_p) = \sum_{i=1}^N m_0(x_i) + \sum_{i=1}^N \{ y_i - m_0(x_i) \} = \sum_{i=1}^N y_i$$
and
$$V(\hat{Y}_p) = V(\hat{P}_1) + V(\hat{Q}_2).$$
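The two consequences of Theorem 1 follow in two short steps. The sketch below assumes, as the setup suggests but the slide does not state, that the weights of each survey give design-unbiased estimators of population totals and that the two samples are drawn independently.

$$E(\hat{P}_1) = E\Big\{ \sum_{i \in A_1} w_{i1}\, m_0(x_i) \Big\} = \sum_{i=1}^N m_0(x_i), \qquad E(\hat{Q}_2) = E\Big[ \sum_{i \in A_2} w_{i2} \{ y_i - m_0(x_i) \} \Big] = \sum_{i=1}^N \{ y_i - m_0(x_i) \}$$

Adding the two expectations cancels the $m_0(x_i)$ terms and leaves $\sum_{i=1}^N y_i = Y$, whatever the working model. For the variance,

$$V(\hat{P}_1 + \hat{Q}_2) = V(\hat{P}_1) + V(\hat{Q}_2) + 2\, \mathrm{Cov}(\hat{P}_1, \hat{Q}_2), \qquad \mathrm{Cov}(\hat{P}_1, \hat{Q}_2) = 0 \text{ by independence of } A_1 \text{ and } A_2.$$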

2. Projection estimation
Model-assisted approach: Asymptotic unbiasedness of $\hat{Y}_p$ does not depend on the validity of the working model, but efficiency is affected.
Note: In the variance decomposition
$$V(\hat{Y}_p) = V(\hat{P}_1) + V(\hat{Q}_2) = V_1 + V_2,$$
$V_1$ is based on $n_1$ sample elements and $V_2$ is based on $n_2$ sample elements.
If $n_2 \ll n_1$, then $V_1 \ll V_2$.
If the working model is good, then the squared error terms $e_i^2 = \{ y_i - m_0(x_i) \}^2$ are small and $V_2$ will also be small.

2. Projection estimation: When is condition (*) satisfied?
If 1 is an element of $x_i$, condition (*) is satisfied for linear regression, $m(x_i, \beta) = x_i' \beta$, and for logistic regression, $\operatorname{logit}\{ m(x_i, \beta) \} = x_i' \beta$, when $\hat{\beta}$ is obtained from the estimating equation
$$\sum_{i \in A_2} w_{i2}\, x_i\, (y_i - m_i) = 0$$
for the linear and logistic regression working models.
For the ratio model, $\hat{\beta}$ is the solution of
$$\sum_{i \in A_2} w_{i2}\, (y_i - m_i) = 0.$$
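Condition (*) can be verified numerically for the linear and ratio working models. The sketch below uses simulated data and hypothetical unequal weights; taking the ratio model to be $m(x_i, \beta) = x_i \beta$ is an assumption, since the slides do not write it out.

```python
import numpy as np

rng = np.random.default_rng(1)
n2 = 100
x = rng.chisquare(2, n2)
y = 1 + 0.7 * x + rng.normal(0, np.sqrt(2), n2)
w2 = rng.uniform(50, 150, n2)            # hypothetical unequal survey-2 weights

# Linear regression working model with an intercept in x_i:
# solve sum_{A2} w_i2 x_i (y_i - x_i' beta) = 0.
X = np.column_stack([np.ones(n2), x])
XtW = (w2[:, None] * X).T
beta_lin = np.linalg.solve(XtW @ X, XtW @ y)
star_lin = w2 @ (y - X @ beta_lin)       # left-hand side of condition (*)

# Ratio working model (assumed form m(x, beta) = x * beta):
# solve sum_{A2} w_i2 (y_i - x_i beta) = 0.
beta_ratio = (w2 @ y) / (w2 @ x)
star_ratio = w2 @ (y - x * beta_ratio)   # left-hand side of condition (*)

print(star_lin, star_ratio)              # both are zero up to rounding
```

For the linear model the condition holds because the intercept row of the estimating equation is exactly the weighted residual sum.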

2. Projection estimation: Linearization variance estimation
Let $\hat{e}_i = y_i - \tilde{y}_i$. Then the variance estimator of $\hat{Y}_p$ is
$$v_L(\hat{Y}_p) = v_1(\tilde{y}_i) + v_2(\hat{e}_i)$$
where
$v_1(z_i) = v(\hat{Z}_1)$ is the variance estimator for survey 1,
$v_2(z_i) = v(\hat{Z}_2)$ is the variance estimator for survey 2,
$$\hat{Z}_1 = \sum_{i \in A_1} w_{i1} z_i, \qquad \hat{Z}_2 = \sum_{i \in A_2} w_{i2} z_i.$$
Note: $v_L(\hat{Y}_p)$ requires access to data from both surveys.
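As an illustration, the linearization variance estimator can be sketched for the special case where both surveys are simple random samples without replacement. That design, the helper name `srs_var_total`, and the simulated data are all assumptions made for this example; other designs would need their own $v_1$ and $v_2$.

```python
import numpy as np

def srs_var_total(z, N):
    """Variance estimator of the total N * mean(z) under SRS without replacement."""
    z = np.asarray(z, dtype=float)
    n = z.size
    return N**2 * (1 - n / N) * z.var(ddof=1) / n

rng = np.random.default_rng(4)
N, n1, n2 = 10_000, 500, 100
x1 = rng.chisquare(2, n1)                            # survey 1: x only
x2 = rng.chisquare(2, n2)                            # survey 2: x and y
y2 = 1 + 0.7 * x2 + rng.normal(0, np.sqrt(2), n2)

# Linear working model; with equal weights, weighted LS equals ordinary LS.
slope, intercept = np.polyfit(x2, y2, 1)
y_tilde1 = intercept + slope * x1                    # synthetic values on A1
e_hat2 = y2 - (intercept + slope * x2)               # residuals on A2

Y_p = N * y_tilde1.mean()
v_L = srs_var_total(y_tilde1, N) + srs_var_total(e_hat2, N)
print(Y_p, np.sqrt(v_L))
```

The first term uses only the survey 1 file and the second only the survey 2 residuals, which is why the estimator needs both data sets.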

2. Projection estimation: Estimation of domain total $Y_d$
Projection domain estimator:
$$\hat{Y}_{d,p} = \sum_{i \in A_1} w_{i1}\, \delta_i(d)\, \tilde{y}_i$$
$\hat{Y}_{d,p}$ is asymptotically unbiased if
Case (i): $\sum_{i \in A_2} w_{i2}\, \delta_i(d)\, (y_i - \tilde{y}_i) = 0$
OR
Case (ii): $\mathrm{Cov}\{ \delta_i(d),\, y_i - m(x_i, \beta_0) \} = 0$.

2. Projection estimation: Estimation of domain total $Y_d$
Case (i): For linear or logistic regression models, (i) is satisfied if $\delta_i(d)$ is an element of $x_i$. For planned domains specified in advance, augmented working models can be used. The survey 1 data file should provide the planned domain indicators.
Case (ii): If the working model is good, then the relative bias of $\hat{Y}_{d,p}$ will be small. $\hat{Y}_{d,p}$ is asymptotically model unbiased if the model is correct. $\hat{Y}_{d,p}$ can be significantly design biased for some populations.
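Case (i) can be illustrated by comparing a plain working model with one augmented by the domain indicator. This is a sketch on simulated data; the domain effect of 0.5 and the unequal weights are hypothetical choices made only so that the plain model misses something.

```python
import numpy as np

rng = np.random.default_rng(5)
n2 = 100
x = rng.chisquare(2, n2)
delta = (rng.uniform(size=n2) < 0.3).astype(float)   # domain indicator delta_i(d)
y = 1 + 0.7 * x + 0.5 * delta + rng.normal(0, np.sqrt(2), n2)
w2 = rng.uniform(50, 150, n2)                        # hypothetical weights

def weighted_resid(X):
    """Residuals y - X beta_hat, with beta_hat from weighted least squares."""
    XtW = (w2[:, None] * X).T
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return y - X @ beta

X_plain = np.column_stack([np.ones(n2), x])          # x_i = (1, x_i)
X_aug = np.column_stack([np.ones(n2), x, delta])     # x_i = (1, x_i, delta_i(d))

# Left-hand side of Case (i): sum_{A2} w_i2 delta_i(d) (y_i - y_tilde_i)
case_i_plain = w2 @ (delta * weighted_resid(X_plain))
case_i_aug = w2 @ (delta * weighted_resid(X_aug))
print(case_i_plain, case_i_aug)   # only the augmented model gives zero
```

With the indicator in $x_i$, the estimating equation itself forces the weighted domain residual sum to zero, so no modelling assumption is needed for the domain.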

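3. Replication variance estimation: Replication variance estimation for $\hat{Y}_p$
Replication variance estimator for survey 1:
$$v_{1,rep}(\hat{Z}_1) = \sum_{k=1}^{L_1} c_k \big( \hat{Z}_1^{(k)} - \hat{Z}_1 \big)^2$$
where $\hat{Z}_1^{(k)} = \sum_{i \in A_1} w_{i1}^{(k)} z_i$ and $\{ w_{i1}^{(k)}, i \in A_1 \}$, $k = 1, \ldots, L_1$, are the replication weights for survey 1.
Replication variance estimator for $\hat{Y}_p$:
$$v_{1,rep}(\hat{Y}_p) = \sum_{k=1}^{L_1} c_k \big( \hat{Y}_p^{(k)} - \hat{Y}_p \big)^2$$
where $\hat{Y}_p^{(k)} = \sum_{i \in A_1} w_{i1}^{(k)}\, \tilde{y}_i^{(k)}$ and $\{ \tilde{y}_i^{(k)}, i \in A_1 \}$ are the synthetic values for replicate $k$.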

3. Replication variance estimation: Replication variance estimation for $\hat{Y}_p$
How to create the replicated synthetic data $\{ \tilde{y}_i^{(k)}, i \in A_1 \}$?
1. Create $\{ w_{i2}^{(k)}, k = 1, \ldots, L_1; i \in A_2 \}$ such that
$$\sum_{k=1}^{L_1} c_k \big( \hat{Y}_2^{(k)} - \hat{Y}_2 \big)^2 = v_2(\hat{Y}_2)$$
2. Compute $\hat{\beta}^{(k)}$ and $\tilde{y}_i^{(k)} = m(x_i, \hat{\beta}^{(k)})$ by solving
$$\sum_{i \in A_2} w_{i2}^{(k)} \{ y_i - m(x_i, \beta) \}\, x_i = 0$$
for $\hat{\beta}^{(k)}$ (linear or logistic regression).
$v_{1,rep}(\hat{Y}_p)$ is asymptotically unbiased.
The data file for sample $A_1$ should contain additional columns of $\{ \tilde{y}_i^{(k)}, i \in A_1 \}$ and the associated $\{ w_{i1}^{(k)}, i \in A_1 \}$, $k = 1, 2, \ldots, L_1$.
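The two steps above can be sketched for simple random samples with a delete-a-group jackknife. The grouping scheme, the factor $c_k = (L_1 - 1)/L_1$, the neglect of finite population corrections, and all names here are assumptions of this example, not the construction used in the slides.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n1, n2, L = 10_000, 500, 100, 100

x1 = rng.chisquare(2, n1)
x2 = rng.chisquare(2, n2)
y2 = 1 + 0.7 * x2 + rng.normal(0, np.sqrt(2), n2)
w1 = np.full(n1, N / n1)
w2 = np.full(n2, N / n2)

def fit_wls(w, x, y):
    """Solve sum w_i x_i (y_i - x_i' beta) = 0 with x_i = (1, x_i)."""
    X = np.column_stack([np.ones_like(x), x])
    XtW = (w[:, None] * X).T
    return np.linalg.solve(XtW @ X, XtW @ y)

def predict(beta, x):
    return beta[0] + beta[1] * x

# Random equal-size groups: 5 units per group in A1, 1 unit per group in A2.
g1 = rng.permutation(n1) % L
g2 = rng.permutation(n2) % L
c = (L - 1) / L

Y_p = w1 @ predict(fit_wls(w2, x2, y2), x1)
Y_2 = w2 @ y2

Y_p_rep = np.empty(L)
Y_2_rep = np.empty(L)
for k in range(L):
    # Replicate k drops group k in both samples and rescales the rest.
    w1k = np.where(g1 == k, 0.0, w1 * L / (L - 1))
    w2k = np.where(g2 == k, 0.0, w2 * L / (L - 1))
    beta_k = fit_wls(w2k, x2, y2)            # step 2: replicate beta
    y_tilde_k = predict(beta_k, x1)          # replicate synthetic values
    Y_p_rep[k] = w1k @ y_tilde_k
    Y_2_rep[k] = w2k @ y2

v_rep_Yp = c * np.sum((Y_p_rep - Y_p) ** 2)
v_rep_Y2 = c * np.sum((Y_2_rep - Y_2) ** 2)  # step 1 check against v_2(Y_2)
print(np.sqrt(v_rep_Yp), np.sqrt(v_rep_Y2))
```

In this setup `v_rep_Y2` reproduces the usual with-replacement variance estimator $N^2 s_y^2 / n_2$ exactly, which is the requirement in step 1. In a released file, each `y_tilde_k` and `w1k` would be stored as an extra column for sample $A_1$.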

3. Replication variance estimation: Replication variance estimation for $\hat{Y}_{d,p}$
Let $\hat{Y}_{d,p}^{(k)} = \sum_{i \in A_1} w_{i1}^{(k)}\, \delta_i(d)\, \tilde{y}_i^{(k)}$. Then
$$v_{1,rep}(\hat{Y}_{d,p}) = \sum_{k=1}^{L_1} c_k \big( \hat{Y}_{d,p}^{(k)} - \hat{Y}_{d,p} \big)^2$$
Asymptotically unbiased under either case (i) or case (ii).

4. Optimal estimator: Full information. Estimation of total $Y$
Three estimators for two parameters:
Survey 1: $\hat{X}_1$ for $X$
Survey 2: $(\hat{X}_2, \hat{Y}_2)$ for $(X, Y)$
Combine the information using generalized least squares: minimize
$$Q(X, Y) = \begin{pmatrix} \hat{X}_1 - X \\ \hat{X}_2 - X \\ \hat{Y}_2 - Y \end{pmatrix}' V^{-1} \begin{pmatrix} \hat{X}_1 - X \\ \hat{X}_2 - X \\ \hat{Y}_2 - Y \end{pmatrix}$$
with respect to $(X, Y)$, where $V$ is the variance-covariance matrix of $(\hat{X}_1, \hat{X}_2, \hat{Y}_2)'$.

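4. Optimal estimator: Full information. Estimation of total $Y$
Best linear unbiased estimator based on $\hat{X}_2$, $\hat{Y}_2$ and $\hat{X}_1$:
$$\tilde{Y}_{opt} = \hat{Y}_2 + B_{y \cdot x2} \big( \tilde{X}_{opt} - \hat{X}_2 \big), \qquad \tilde{X}_{opt} = \frac{V_{xx2} \hat{X}_1 + V_{xx1} \hat{X}_2}{V_{xx1} + V_{xx2}}$$
where $B_{y \cdot x2} = V_{yx2} / V_{xx2}$, $V_{xx1} = V(\hat{X}_1)$, $V_{xx2} = V(\hat{X}_2)$, and $V_{yx2} = \mathrm{Cov}(\hat{Y}_2, \hat{X}_2)$.
Replace the variances in $\tilde{Y}_{opt}$ by estimated variances to get $\hat{Y}_{opt}$ and $\hat{X}_{opt}$.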

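4. Optimal estimator: Full information. Estimation of total $Y$
$\hat{Y}_{opt}$ can be expressed as
$$\hat{Y}_{opt} = \sum_{i \in A_2} w_{i2}^* y_i$$
where $\{ w_{i2}^*, i \in A_2 \}$ are calibration weights: $\sum_{i \in A_2} w_{i2}^* x_i = \hat{X}_{opt}$.
$\hat{Y}_{opt}$ can be computed from the data file for $A_2$ providing the weights $\{ w_{i2}^*, i \in A_2 \}$.
Example: Simple random samples $A_1$ and $A_2$
$$w_{i2}^* = \frac{N}{n_2} + \big( \hat{X}_{opt} - \hat{X}_2 \big) \frac{x_i - \bar{x}_2}{\sum_{i \in A_2} (x_i - \bar{x}_2)^2}$$
where $\bar{x}_2$ is the mean of $x$ for $A_2$.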
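For the simple random sampling example, the optimal estimator and its calibration weights can be computed directly. This is a sketch on simulated data; the variance estimators include a finite population correction, which is an assumption that cancels in $\hat{B}_{y \cdot x2}$ and only slightly changes $\hat{X}_{opt}$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n1, n2 = 10_000, 500, 100
x1 = rng.chisquare(2, n1)                            # survey 1: x only
x2 = rng.chisquare(2, n2)                            # survey 2: x and y
y2 = 1 + 0.7 * x2 + rng.normal(0, np.sqrt(2), n2)

X1_hat, X2_hat, Y2_hat = N * x1.mean(), N * x2.mean(), N * y2.mean()

def srs_var(n, s2):
    """Estimated (co)variance of an expansion total under SRS without replacement."""
    return N**2 * (1 - n / N) * s2 / n

V_xx1 = srs_var(n1, x1.var(ddof=1))
V_xx2 = srs_var(n2, x2.var(ddof=1))
V_yx2 = srs_var(n2, np.cov(x2, y2, ddof=1)[0, 1])

# Precision-weighted combination of the two estimators of X.
X_opt = (V_xx2 * X1_hat + V_xx1 * X2_hat) / (V_xx1 + V_xx2)
B = V_yx2 / V_xx2
Y_opt = Y2_hat + B * (X_opt - X2_hat)

# Calibration weights for the A2 file.
d = x2 - x2.mean()
w_star = N / n2 + (X_opt - X2_hat) * d / np.sum(d**2)

print(Y_opt, w_star @ y2)    # the two values agree
print(X_opt, w_star @ x2)    # calibration constraint holds
```

Because $V_{xx2}$ is larger than $V_{xx1}$ when $n_2 \ll n_1$, $\hat{X}_{opt}$ sits close to the more precise $\hat{X}_1$.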

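4. Optimal estimator: Full information. Domain estimation
Calibration estimator: $\hat{Y}_d = \sum_{i \in A_2} w_{i2}^*\, \delta_i(d)\, y_i$, computed from the data file for $A_2$ only.
Projection estimator: $\hat{Y}_{d,p} = \sum_{i \in A_1} w_{i1}\, \delta_i(d)\, \tilde{y}_i$, computed from the data file for $A_1$.
Both $\hat{Y}_d$ and $\hat{Y}_{d,p}$ satisfy the internal consistency property:
$$\sum_d \hat{Y}_d = \hat{Y}_{opt}, \qquad \sum_d \hat{Y}_{d,p} = \hat{Y}_p$$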

4. Optimal estimator: Full information. Domain estimation
$\hat{Y}_d$ is asymptotically design unbiased but can have a large variance if the domain contains few sample $A_2$ units.
The optimal estimator $\hat{Y}_{opt,d}$ based on domain-specific variances does not satisfy internal consistency, may not be stable for small domain sample sizes, and cannot be implemented from the $A_2$ data file.

5. Simulation study: Simulation setup
Two artificial populations A and B of size $N = 10{,}000$: $\{ (y_i, x_i, z_i);\ i = 1, \ldots, N \}$
Population A (regression model): $x_i \sim \chi^2(2)$, $y_i = 1 + 0.7 x_i + e_i$, $e_i \sim N(0, 2)$, $z_i \sim \mathrm{Unif}(0, 1)$, with $z_i$ independent of $(x_i, y_i)$.
Population B (ratio model): same $(x_i, z_i)$ but $y_i = 0.7 x_i + u_i$, $u_i \sim N(0, x_i)$.
$\mathrm{corr}(y, x) = 0.71$ for both populations.
Domain $d$: $\delta_i(d) = 1$ if $z_i < 0.3$; $\delta_i(d) = 0$ otherwise.

5. Simulation study: Simulation setup
Two independent simple random samples: $n_1 = 500$, $n_2 = 100$
Working models: linear regression, ratio, augmented linear regression, augmented ratio
Relative bias: $RB(\hat{Y}) = \{ E(\hat{Y}) - Y \} / Y$
Relative efficiency: $RE(\hat{Y}) = \mathrm{mse}(\hat{Y}_{opt}) / \mathrm{mse}(\hat{Y})$
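The simulation setup can be reproduced roughly as follows. This is a sketch, not the authors' code: the seed is arbitrary, the second argument of $N(0, \cdot)$ is read as a variance, and only the regression projection estimator is shown for a single pair of samples.

```python
import numpy as np

rng = np.random.default_rng(2012)
N, n1, n2 = 10_000, 500, 100

# Finite populations A and B share the same (x_i, z_i).
x = rng.chisquare(2, N)
z = rng.uniform(0, 1, N)
y_A = 1 + 0.7 * x + rng.normal(0, np.sqrt(2), N)   # e_i with variance 2 (assumed reading)
y_B = 0.7 * x + rng.normal(0, np.sqrt(x))          # u_i with variance x_i (assumed reading)
delta = (z < 0.3).astype(float)                    # domain indicator

# One pair of independent simple random samples.
idx1 = rng.choice(N, n1, replace=False)
idx2 = rng.choice(N, n2, replace=False)

def regression_projection(y_pop):
    """Regression projection estimates of the total and the domain total."""
    slope, intercept = np.polyfit(x[idx2], y_pop[idx2], 1)
    y_tilde = intercept + slope * x[idx1]
    return N / n1 * y_tilde.sum(), N / n1 * (delta[idx1] * y_tilde).sum()

total_A, domain_A = regression_projection(y_A)
print(total_A, y_A.sum())
print(domain_A, (delta * y_A).sum())
print(np.corrcoef(x, y_A)[0, 1], np.corrcoef(x, y_B)[0, 1])
```

Repeating the sampling step many times and averaging the errors would give Monte Carlo versions of RB and RE as defined above.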

5. Simulation study: Simulation results
Table 1: Simulation results (point estimation)

Parameter  Estimator               Population A     Population B
                                   RB      RE       RB      RE
Total      Regression projection    0.00   0.98      0.00   0.97
           Ratio projection         0.00   0.58      0.00   0.99
           Aug. Reg. projection     0.00   0.97      0.00   0.97
           Aug. Rat. projection     0.01   0.55      0.00   0.98
           Optimal                  0.00   1.00      0.00   1.00
Domain     Regression projection    0.00   1.96      0.01   2.01
           Ratio projection         0.01   1.22      0.01   2.05
           Aug. Reg. projection     0.00   1.05      0.00   0.98
           Aug. Rat. projection     0.00   0.64      0.00   0.96
           Optimal                 -0.01   1.00     -0.02   1.00
           Calibration              0.00   0.45      0.00   0.53

5. Simulation study: Conclusions from Table 1
Estimation of total $Y$
1. RB of all estimators is negligible: less than 2%.
2. The regression projection estimator is almost as efficient as $\hat{Y}_{opt}$ even when the true model is the ratio model. The ratio projection estimator is considerably less efficient if the true model has a substantial intercept term, so model diagnostics should be used to identify a good working model.
3. The augmented projection estimators are similar to the corresponding projection estimators in terms of RB and RE.

5. Simulation study: Conclusions from Table 1
Domain estimation
1. RB of all estimators is less than 5%: the simulation setup ensures that $\delta_i(d)$ is unrelated to $r_i = y_i - m(x_i; \beta)$.
2. The regression projection estimator is considerably more efficient than the calibration estimator or the optimal estimator, because the projection estimator is based on a larger sample size.
3. The ratio projection estimator is considerably less efficient if the model has a substantial intercept term.

5. Simulation study: Jackknife variance estimation
$L_1 = n_2 = 100$ pseudo replicates by random group jackknife
Table 2: Simulation results (relative biases of variance estimators)

Point estimator        Parameter   Pop. A    Pop. B
Regression projection  Total       -0.013     0.024
                       Domain      -0.030     0.006
Ratio projection       Total        0.032     0.000
                       Domain      -0.001    -0.017
Aug. Reg. projection   Total        0.033     0.040
                       Domain       0.022     0.050
Aug. Rat. projection   Total        0.059     0.030
                       Domain       0.064     0.061

RB of the jackknife variance estimators is small: at most about 6% in absolute value.

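6. Discussion: Some alternative approaches
The proposed method does not lead to the optimal estimator:
$$\hat{Y}_{opt} = \hat{Y}_2 + \hat{B}_{y \cdot x2} \big( \hat{X}_{opt} - \hat{X}_2 \big), \qquad \hat{X}_{opt} = \frac{\hat{V}_{xx2} \hat{X}_1 + \hat{V}_{xx1} \hat{X}_2}{\hat{V}_{xx1} + \hat{V}_{xx2}}$$
To implement the optimal estimator using synthetic data, we may express
$$\hat{Y}_{opt} = \sum_{i \in A_1^*} w_{i3}\, \tilde{y}_i + \sum_{i \in A_2} w_{i2}\, (y_i - \tilde{y}_i)$$
where $\tilde{y}_i = x_i' \hat{B}_{y \cdot x2}$, $A_1^* = A_1 \cup A_2$, and $w_{i3}$ is the sampling weight for $A_1^*$ satisfying $\sum_{i \in A_1^*} w_{i3} x_i = \hat{X}_{opt}$.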

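6. Discussion: Some alternative approaches
If $\sum_{i \in A_2} w_{i2} = \sum_{i \in A_1^*} w_{i3}$, then we can further express
$$\hat{Y}_{opt} = \sum_{i \in A_1^*} \sum_{j \in A_2} w_{i3}\, w_{ij}\, (\hat{y}_i + \hat{e}_j)$$
where $w_{ij} = w_{j2} / \big( \sum_{i \in A_2} w_{i2} \big)$ and $\hat{e}_j = y_j - \hat{y}_j$. It now takes the form of fractional imputation considered in Fuller & Kim (2005).
To reduce the size of the data set, we may consider random selection of $M$ residuals to get $\hat{e}_j^*$ and
$$\hat{Y}_{FI} = \sum_{i \in A_1^*} \sum_{j=1}^M w_{i3}\, w_{ij}^*\, (\hat{y}_i + \hat{e}_j^*),$$
where $w_{ij}^*$ satisfies
$$\sum_{j=1}^M w_{ij}^* \big( 1, \hat{e}_j^* \big) = \sum_{j \in A_2} w_{ij}\, (1, \hat{e}_j).$$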
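The step from the two-term form to the fully fractionally imputed double sum is an algebraic identity that holds once the two weight sums agree. The sketch below checks it on placeholder numbers; the predicted values, residuals and weights are randomly generated stand-ins, not output from a fitted model.

```python
import numpy as np

rng = np.random.default_rng(6)
n_star, n2 = 600, 100                      # hypothetical sizes of A_1^* and A_2

y_hat1 = rng.normal(2.4, 1.4, n_star)      # stand-ins for \hat y_i, i in A_1^*
e_hat2 = rng.normal(0.0, 1.4, n2)          # stand-ins for \hat e_j, j in A_2
w2 = rng.uniform(50, 150, n2)              # w_j2
w3 = rng.uniform(10, 30, n_star)           # w_i3
w3 *= w2.sum() / w3.sum()                  # enforce sum w_i3 = sum w_i2

w_frac = w2 / w2.sum()                     # w_ij = w_j2 / sum w_i2 (same for every i)

# Double-sum (fractional imputation) form: every residual attached to every i.
imputed = y_hat1[:, None] + e_hat2[None, :]            # shape (n_star, n2)
Y_fi_full = np.sum(w3[:, None] * w_frac[None, :] * imputed)

# Two-term form: prediction part plus weighted residual part.
Y_two_term = w3 @ y_hat1 + w2 @ e_hat2

print(Y_fi_full, Y_two_term)               # equal up to rounding
```

The full form stores `n_star * n2` imputed values, which is what motivates selecting only $M$ residuals per unit.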

6. Discussion: Some alternative approaches
Nested two-phase sampling: $A_2 \subset A_1$
Non-nested two-phase sampling: $A_1$, $A_2$ independent
We can convert non-nested two-phase sampling into nested two-phase sampling, $A_2 \subset A_1^*$, where $A_1^* = A_1 \cup A_2$.
Synthetic data can be released for $A_1^*$.

6. Discussion: Parametric multiple imputation
Assume that $f(y_i \mid x_i, \theta)$ is known for fixed $\theta$ and that $A_1$ and $A_2$ are simple random samples.
Obtain the posterior distribution of $\theta$, $p(\theta \mid y_2, x_2)$, assuming a diffuse prior on $\theta$, where $(y_2, x_2)$ is the data from $A_2$.
Draw $M$ values $\theta^{(1)}, \ldots, \theta^{(M)}$ from the posterior distribution.
Draw $y_i^{(l)}$ from $f(y_i \mid x_i, \theta^{(l)})$ for $i \in A_1$ and $l = 1, \ldots, M$.
Synthetic data sets: $\{ y_i^{(l)}, i \in A_1 \}$, $l = 1, \ldots, M$.
Standard multiple imputation variance estimators do not work here. Reiter (2008) proposed a two-stage imputation procedure requiring $T$ synthetic data sets $\{ y_{it}^{(l)} : i \in A_1, t = 1, \ldots, T \}$ to be generated for each $\theta^{(l)}$. In all, $TM$ synthetic data sets are generated.
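The single-stage drawing scheme described above can be sketched for one concrete choice of $f$. The normal linear model and the prior $p(\beta, \sigma^2) \propto 1/\sigma^2$ are assumptions made for this example; the slides leave $f$ and the diffuse prior unspecified. The code generates the $M$ synthetic data sets only and does not address variance estimation.

```python
import numpy as np

rng = np.random.default_rng(7)
n1, n2, M = 500, 100, 5
x1 = rng.chisquare(2, n1)                            # A1: x only
x2 = rng.chisquare(2, n2)                            # A2: x and y
y2 = 1 + 0.7 * x2 + rng.normal(0, np.sqrt(2), n2)

X1 = np.column_stack([np.ones(n1), x1])
X2 = np.column_stack([np.ones(n2), x2])
p = X2.shape[1]

# Least squares summaries that define the posterior under the assumed prior.
XtX_inv = np.linalg.inv(X2.T @ X2)
beta_hat = XtX_inv @ X2.T @ y2
s2 = np.sum((y2 - X2 @ beta_hat) ** 2) / (n2 - p)

synthetic = np.empty((M, n1))
for l in range(M):
    # theta^(l) = (beta, sigma2): sigma2 ~ (n2 - p) s2 / chi2(n2 - p),
    # then beta | sigma2 ~ N(beta_hat, sigma2 (X'X)^{-1}).
    sigma2 = (n2 - p) * s2 / rng.chisquare(n2 - p)
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    # y_i^(l) drawn from f(y | x_i, theta^(l)) for every i in A1.
    synthetic[l] = X1 @ beta + rng.normal(0, np.sqrt(sigma2), n1)

print(synthetic.shape)   # M synthetic data sets for A1
```

Reiter's two-stage procedure would repeat the last line of the loop $T$ times for each drawn parameter value.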

6. Discussion: Conclusion
The proposed method is based on deterministic imputation to generate synthetic values.
Synthetic data along with the replicates are created for survey 1, and only the survey 1 data are released.
A significant efficiency gain is achieved for domain estimation.
A stochastic imputation approach is under study.

REFERENCES
Binder, D. A., Babyak, C., Brodeur, M., Hidiroglou, M. & Jocelyn, W. (2000). Variance estimation for two-phase stratified sampling. Can. J. Statist. 28, 751-764.
Fuller, W. A. (2003). Estimation for multiple phase samples. In Analysis of Survey Data, R. L. Chambers & C. J. Skinner, eds. Wiley: Chichester, England.
Fuller, W. A. & Kim, J.-K. (2005). Hot deck imputation for the response model. Survey Methodology 31, 139-149.
Hansen, M. & Hurwitz, W. (1946). The problem of non-response in sample surveys. J. Am. Statist. Assoc. 41, 517-529.
Hidiroglou, M. (2001). Double sampling. Survey Methodol. 27, 143-54.
Hidiroglou, M. A., Rao, J. N. K. & Haziza, D. (2009). Variance estimation in two-phase sampling. Australian and New Zealand Journal of Statistics 51, 127-141.
Kim, J. K., Navarro, A. & Fuller, W. A. (2006). Replicate variance estimation after multi-phase stratified sampling. J. Am. Statist. Assoc. 101, 312-320.
Kott, P. & Stukel, D. (1997). Can the jackknife be used with a two-phase sample? Survey Methodology 23, 81-89.
Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society 97, 558-606.
Rao, J. N. K. (1973). On double sampling for stratification and analytical surveys. Biometrika 60, 125-33.
Reiter, J. (2008). Multiple imputation when records used for imputation are not used or disseminated for analysis. Biometrika 95, 933-46.