6. Fractional Imputation in Survey Sampling
1 Introduction

Consider a finite population of N units identified by a set of indices U = {1, 2, ..., N} with N known. Associated with each unit i in the population are study variables z_i = (x_i, y_i), where y_i = (y_{i1}, ..., y_{ip}) is the vector of study variables subject to missingness and x_i is the vector of auxiliary variables that are always observed. We are interested in estimating η, defined as the (unique) solution to the population estimating equation

    Σ_{i=1}^N U(η; z_i) = 0.

Examples of η include:

1. Population mean: U(η; x, y) = y − η
2. Population proportion of Y less than q: U(η; x, y) = I(y < q) − η
3. Population p-th quantile: U(η; x, y) = I(y < η) − p
4. Population regression coefficient: U(η; x, y) = (y − xη)x
5. Domain mean: U(η; x, y) = (y − η)d(x)

Let A denote the set of indices for the units in a sample selected by a probability sampling mechanism with sample size n. Under complete response, a consistent estimator of η is obtained by solving

    Σ_{i∈A} w_i U(η; z_i) = 0,    (1)

where w_i = 1/π_i is the inverse of the first-order inclusion probability. Under some regularity conditions, we can establish that the solution η̂_n to (1) converges in probability to η and is asymptotically normally distributed.
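As a concrete illustration of (1), the sketch below (pure Python; the design weights and data are hypothetical) solves the weighted estimating equation by bisection, assuming Σ_{i∈A} w_i U(η; z_i) is monotone decreasing in η on the bracketing interval.

```python
# Sketch: solving the design-weighted estimating equation (1) by bisection.
# The weights and data below are hypothetical; w_i = 1/pi_i are design weights.

def solve_ee(U, w, y, lo, hi, tol=1e-10):
    """Find eta with sum_i w_i * U(eta; y_i) = 0, assuming the weighted
    sum is monotone decreasing in eta on [lo, hi]."""
    def g(eta):
        return sum(wi * U(eta, yi) for wi, yi in zip(w, y))
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

w = [2.0, 2.0, 4.0, 4.0]   # design weights 1/pi_i (hypothetical)
y = [1.0, 3.0, 2.0, 6.0]

# Population mean: U(eta; y) = y - eta.
eta_mean = solve_ee(lambda eta, yi: yi - eta, w, y, lo=0.0, hi=10.0)

# Population proportion of Y less than q = 3: U(eta; y) = I(y < q) - eta.
eta_prop = solve_ee(lambda eta, yi: float(yi < 3.0) - eta, w, y, lo=0.0, hi=1.0)
```

For U(η; x, y) = y − η the root has the closed form Σ_{i∈A} w_i y_i / Σ_{i∈A} w_i (the Hájek estimator), which the bisection reproduces.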
Things to consider

- Complex sampling: unequal probabilities of selection, multi-stage sampling.
- General-purpose estimation: we do not know which parameter η the data analyst will use at the time of imputation.
- Multivariate missingness with an arbitrary missing pattern.
- We cannot use a large M. (We do not want to create a huge data file.)

Two concepts of missing at random (MAR) (for simplicity, assume p = 1):

- Population missing at random (PMAR): MAR holds at the population level,

      f(y | x) = f(y | x, δ),

  that is, Y ⊥ δ | x.

- Sample missing at random (SMAR): MAR holds at the sample level,

      f(y | x, I = 1) = f(y | x, I = 1, δ),

  that is, Y ⊥ δ | (x, I = 1), where I_i = 1 if unit i ∈ A and I_i = 0 otherwise.

If the sampling design is such that P(I = 1 | x, y) = P(I = 1 | x), which is often called a noninformative sampling design, then PMAR implies SMAR. For an informative sampling design, PMAR does not necessarily imply SMAR.

Under PMAR, an imputed value y*_i of a missing y_i satisfies

    E{y*_i | x_i, δ_i = 0} = E{y_i | x_i, δ_i = 0},    (2)

while, under SMAR, it satisfies

    E{y*_i | x_i, I_i = 1, δ_i = 0} = E{y_i | x_i, I_i = 1, δ_i = 0}.    (3)

Roughly speaking, fractional imputation is based on the PMAR assumption, while multiple imputation (which will be covered next week) is based on the SMAR assumption.
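The gap between PMAR and SMAR under informative sampling can be illustrated by a small Monte Carlo sketch (pure Python; the population model, response mechanism, and selection probabilities are all hypothetical). Response is independent of y given x, so PMAR holds, but selection depends on y for nonrespondents, so the distribution of y among sampled nonrespondents is tilted and SMAR fails.

```python
import random

# Monte Carlo sketch: PMAR holds in the population, but under an informative
# design (selection depends on y and delta) the y-distribution of sampled
# nonrespondents differs from that of population nonrespondents.

rng = random.Random(1)
N = 100_000
pop = []
for _ in range(N):
    x = 1.0 if rng.random() < 0.5 else 0.0
    y = x + rng.gauss(0.0, 1.0)
    delta = rng.random() < 0.5                          # response independent of y: PMAR
    p_sel = 0.5 if delta else (0.8 if y > x else 0.2)   # informative selection
    I = rng.random() < p_sel
    pop.append((x, y, delta, I))

pop_nonresp = [y for (x, y, d, I) in pop if x == 0.0 and not d]         # delta = 0
smp_nonresp = [y for (x, y, d, I) in pop if x == 0.0 and not d and I]   # and I = 1

mean_pop = sum(pop_nonresp) / len(pop_nonresp)   # near 0 under the model
mean_smp = sum(smp_nonresp) / len(smp_nonresp)   # shifted upward by selection
```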
2 Parametric fractional imputation

We assume that the finite population at hand is a realization from an infinite population, called the superpopulation. In the superpopulation model, we often postulate a parametric distribution f(y | x; θ), which is known up to the parameter θ with parameter space Ω. The parametric model has joint density

    f(y | x; θ) = f_1(y_1 | x; θ_1) f_2(y_2 | x, y_1; θ_2) ⋯ f_p(y_p | x, y_1, ..., y_{p−1}; θ_p),    (4)

where θ_k is the parameter in the conditional distribution of y_k given x and y_1, ..., y_{k−1}. For each y_i = (y_{1i}, ..., y_{pi}), we have δ_i = (δ_{1i}, ..., δ_{pi}), where δ_{ki} = 1 if y_{ki} is observed and δ_{ki} = 0 otherwise. For example, for p = 3 there are 2³ = 8 possible missing patterns:

    A = A_{111} ∪ A_{110} ∪ ⋯ ∪ A_{000},

where, for example, A_{100} is the set of sample indices with δ_{1i} = 1, δ_{2i} = 0, and δ_{3i} = 0. For i ∈ A_{100}, we need to create imputed values for y_{2i} and y_{3i} from f(y_{2i}, y_{3i} | x_i, y_{1i}). Without loss of generality, we can express y_i = (y_{obs,i}, y_{mis,i}), where y_{obs,i} and y_{mis,i} are the observed and missing parts of y_i, respectively.

Three steps for PFI under complex sampling:

1. Compute the pseudo maximum likelihood estimator of θ using the EM-by-PFI method with a sufficiently large imputation size M (say, M = 1,000).
2. Select m (say, m = 10) imputed values from the set of M imputed values.
3. Construct the final fractional weights for the m imputed values.

The first step is called the Fully Efficient Fractional Imputation (FEFI) step, the second the Sampling step, and the third the Weighting step.

Step 1 (FEFI step): The pseudo maximum likelihood estimator of θ is computed by the following EM algorithm.
1. [I-step]: Set t = 0. Obtain M imputed values of y_{mis,i} generated from a proposal distribution h(y_{mis,i} | x_i, y_{obs,i}). One simple choice of h(·) is

       h(y_{mis,i} | x_i, y_{obs,i}) = f(y_{mis,i} | x_i, y_{obs,i}; θ̂^(0)),    (5)

   where θ̂^(0) is the initial estimator of θ obtained from the available respondents. Generating samples from (5) may require an MCMC method or the SIR (sampling importance resampling) method; see Appendix B for an illustration of SIR. Let w*_{ij(0)} = 1/M be the initial fractional weight for y^{(j)}_{mis,i}.

2. [M-step]: Update the parameter θ̂^(t+1) by solving the imputed score equation

       θ̂^(t+1): solution to Σ_{i∈A} w_i Σ_{j=1}^M w*_{ij(t)} S(θ; x_i, y*_{ij}) = 0,

   where y*_{ij} = (y_{obs,i}, y^{(j)}_{mis,i}) and S(θ; x, y) = ∂ log f(y | x; θ)/∂θ is the score function of θ.

3. [W-step]: Set t = t + 1. Using the current parameter estimate θ̂^(t), compute the fractional weights

       w*_{ij(t)} ∝ f(y_{obs,i}, y^{(j)}_{mis,i} | x_i; θ̂^(t)) / h(y^{(j)}_{mis,i} | x_i, y_{obs,i})

   with Σ_{j=1}^M w*_{ij(t)} = 1. For t = 0, we set w*_{ij(t)} = 1/M.

4. Check whether w*_{ij(t)} > 1/m for some j = 1, ..., M. If yes, update the proposal distribution with θ̂^(0) replaced by θ̂^(t) and go to [I-step]. If no, go to [M-step]. Stop when θ̂^(t) meets the convergence criterion.

[I-step] is the imputation step, [W-step] is the weighting step, and [M-step] is the maximization step. Note that the imputed values are not changed in the EM iteration; only the fractional weights are updated.

Step 2 (Sampling step): For each i, we have M possible imputed values z^{(j)}_i = (x_i, y_{obs,i}, y^{(j)}_{mis,i}) with their fractional weights w*_{ij}, where w*_{ij} is computed from
the EM algorithm after convergence. For each i, we treat z*_i = {z^{(j)}_i ; j = 1, 2, ..., M} as a weighted finite population (with weights w*_{ij}) and use an unequal-probability sampling method to select a sample of size m from z*_i using w*_{ij} as the selection probability. (We can use PPS sampling or systematic πps sampling to obtain an imputed data set of size m.) Let z̃^{(1)}_i, ..., z̃^{(m)}_i be the m elements sampled from z*_i by the PPS sampling. That is,

    Pr(z̃^{(k)}_i = z^{(j)}_i) = w*_{ij},  j = 1, ..., M; k = 1, ..., m.

The initial fractional weights for the final m imputed values are given by w*_{ij0} = 1/m.

Step 3 (Weighting step): Modify the initial fractional weights w*_{ij0} = 1/m to satisfy the calibration constraint

    Σ_{i∈A} w_i Σ_{j=1}^m w*_{ij} S(θ̂; x_i, ỹ*_{ij}) = 0    (6)

with Σ_{j=1}^m w*_{ij} = 1, where θ̂ is the pseudo MLE of θ computed from the FEFI step and ỹ*_{ij} = (y_{obs,i}, ỹ^{(j)}_{mis,i}); that is, ỹ*_{ij} is the j-th imputed version of y_i selected by the PPS sampling in Step 2. A solution to this calibration problem is

    w*_{ij} = w*_{ij0} − { Σ_{i∈A} w_i S̄_{Ii0} }′ T̂^{−1} w*_{ij0} (S*_{ij} − S̄_{Ii0}),

where

    S*_{ij} = S(θ̂; x_i, ỹ*_{ij}),
    S̄_{Ii0} = Σ_{j=1}^m w*_{ij0} S*_{ij},
    T̂ = Σ_{i∈A} w_i Σ_{j=1}^m w*_{ij0} (S*_{ij} − S̄_{Ii0})(S*_{ij} − S̄_{Ii0})′.

Once the final fractional weights are computed, the PFI estimator of η is obtained by solving

    Σ_{i∈A} w_i Σ_{j=1}^m w*_{ij} U(η; x_i, ỹ*_{ij}) = 0.    (7)
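A minimal sketch of the Weighting step for a scalar score (pure Python; the design weights and score values are hypothetical, and T̂ is computed with the initial weights w*_{ij0} = 1/m): starting from equal weights, the closed-form adjustment makes the imputed score equation (6) hold exactly while each unit's fractional weights still sum to one.

```python
# Sketch of the Weighting Step for a scalar score S.
# S[i][j] holds S(theta_hat; x_i, y*_{ij}); all values below are hypothetical.

def calibrate_weights(w, S, m):
    w0 = 1.0 / m
    Sbar = [sum(w0 * Sij for Sij in Si) for Si in S]          # within-unit mean
    T = sum(wi * sum(w0 * (Sij - Sb) ** 2 for Sij in Si)
            for wi, Si, Sb in zip(w, S, Sbar))                # scalar T-hat
    lam = sum(wi * Sb for wi, Sb in zip(w, Sbar)) / T
    # regression-type adjustment; row sums stay equal to one
    return [[w0 - lam * w0 * (Sij - Sb) for Sij in Si]
            for Si, Sb in zip(S, Sbar)]

w = [3.0, 5.0]                                  # design weights (hypothetical)
S = [[0.4, -0.1, 0.3], [-0.2, 0.5, 0.1]]        # score values (hypothetical)
wstar = calibrate_weights(w, S, m=3)

# Left-hand side of the calibration constraint (6); zero after adjustment.
lhs = sum(wi * sum(wij * Sij for wij, Sij in zip(Wi, Si))
          for wi, Wi, Si in zip(w, wstar, S))
```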
Note that the fractionally imputed estimating equation (7) is an approximation to the expected estimating equation

    Σ_{i∈A} w_i E{U(η; x_i, y_{obs,i}, Y_{mis,i}) | x_i, y_{obs,i}; θ̂} = 0.

For variance estimation, we can use replication methods (such as the jackknife or the bootstrap). Details are given in Appendix C.

3 Nonparametric approach: Fractional hot deck imputation for multivariate continuous variables

We do not want to make a parametric model assumption about f(y_1, ..., y_p | x). However, some assumption on the joint distribution of (y_1, ..., y_p) is needed in order to preserve the correlation structure between the items. This is easy if the data are categorical.

Example (SRS of size n = 10):

    ID   Weight   x      y_1        y_2
    1    0.10     x_1    y_{1,1}    y_{2,1}
    2    0.10     x_2    y_{1,2}    M
    3    0.10     x_3    M          y_{2,3}
    4    0.10     x_4    y_{1,4}    y_{2,4}
    5    0.10     x_5    y_{1,5}    y_{2,5}
    6    0.10     x_6    y_{1,6}    y_{2,6}
    7    0.10     x_7    M          y_{2,7}
    8    0.10     x_8    M          M
    9    0.10     x_9    y_{1,9}    y_{2,9}
    10   0.10     x_10   y_{1,10}   y_{2,10}

    M: missing

Fractional imputation idea: If both y_1 and y_2 are categorical, then fractional imputation is easy to apply. There are only a finite number of possible values, and the imputed values are exactly these possible values.
The fractional weights are the conditional probabilities of the possible values given the observations. One can use the EM-by-weighting method of Ibrahim (1990) to compute the fractional weights.

Example (y_1, y_2: dichotomous, taking values 0 or 1). Each unit with missing items is expanded into one row per possible value:

    ID   Weight           x      y_1        y_2
    1    0.10             x_1    y_{1,1}    y_{2,1}
    2    0.10 w*_{2,1}    x_2    y_{1,2}    0
    2    0.10 w*_{2,2}    x_2    y_{1,2}    1
    3    0.10 w*_{3,1}    x_3    0          y_{2,3}
    3    0.10 w*_{3,2}    x_3    1          y_{2,3}
    4    0.10             x_4    y_{1,4}    y_{2,4}
    5    0.10             x_5    y_{1,5}    y_{2,5}
    6    0.10             x_6    y_{1,6}    y_{2,6}
    7    0.10 w*_{7,1}    x_7    0          y_{2,7}
    7    0.10 w*_{7,2}    x_7    1          y_{2,7}
    8    0.10 w*_{8,1}    x_8    0          0
    8    0.10 w*_{8,2}    x_8    0          1
    8    0.10 w*_{8,3}    x_8    1          0
    8    0.10 w*_{8,4}    x_8    1          1
    9    0.10             x_9    y_{1,9}    y_{2,9}
    10   0.10             x_10   y_{1,10}   y_{2,10}

The fractional weights are the conditional probabilities of the imputed values given the observations. For example,

    w*_{2,1} = P̂(y_2 = 0 | x = x_2, y_1 = y_{1,2}),
    w*_{3,1} = P̂(y_1 = 0 | x = x_3, y_2 = y_{2,3}),
    w*_{7,1} = P̂(y_1 = 0 | x = x_7, y_2 = y_{2,7}),
    w*_{8,1} = P̂(y_1 = 0, y_2 = 0 | x = x_8).

The conditional probabilities are computed from the joint probabilities.

M-step: Update the joint probability π_{bc|a} = P(y_1 = b, y_2 = c | x = a) by

    π̂_{bc|a} = [ Σ_{i=1}^n Σ_{j=1}^{M_i} w_i w*_{ij} I(x_i = a, y^{(j)}_{1i} = b, y^{(j)}_{2i} = c) ] / [ Σ_{i=1}^n w_i I(x_i = a) ].
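The EM-by-weighting cycle above can be sketched as follows (pure Python; the covariate x is suppressed for brevity, i.e., a single x-category, and the data are hypothetical). The E-step expands each incomplete unit over its compatible cells with fractional weights proportional to the current cell probabilities, and the M-step recomputes the weighted joint probabilities.

```python
# Sketch of EM-by-weighting for two dichotomous items, single x-category.
# Each unit is a (y1, y2) pair with None marking a missing item; data and
# design weights are hypothetical.

def em_cells(data, w, iters=50):
    pi = {(b, c): 0.25 for b in (0, 1) for c in (0, 1)}   # initial joint probs
    for _ in range(iters):
        counts = {cell: 0.0 for cell in pi}
        for (y1, y2), wi in zip(data, w):
            # cells compatible with the observed part of this unit
            cells = [(b, c) for b in (0, 1) for c in (0, 1)
                     if (y1 is None or b == y1) and (y2 is None or c == y2)]
            tot = sum(pi[cell] for cell in cells)
            for cell in cells:
                counts[cell] += wi * pi[cell] / tot       # fractional weight
        total = sum(counts.values())
        pi = {cell: counts[cell] / total for cell in counts}
    return pi

data = [(1, 1), (1, None), (None, 0), (0, 0), (1, 0), (None, None)]
w = [1.0] * len(data)
pi = em_cells(data, w)
```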
For continuous y, let us consider an approximation using a categorical transformation. For simplicity, let Y = (Y_1, Y_2, Y_3) be the study variables that have missingness.

1. Preliminary step: For each item k, create a transformation of Y_k into Ỹ_k, a discrete version of Y_k. The values of Ỹ_k serve as imputation cells for Y_k. If Y_k is missing, then Ỹ_k is also missing. Let M_k be the number of cells for item Y_k. The maximum number of cells for p = 3 is then G = M_1 × M_2 × M_3.

2. FEFI step: Two-stage imputation is used. For each i in the sample, ỹ_i is decomposed into ỹ_i = (ỹ_{obs,i}, ỹ_{mis,i}). In the stage 1 imputation, we impute the imputation cells; in the stage 2 imputation, we impute the missing observations within imputation cells. To perform the two-stage imputation, we first compute the estimated joint probability π_{ijk} = Pr(Ỹ_1 = i, Ỹ_2 = j, Ỹ_3 = k) using the EM algorithm (or other estimation methods).

   (a) Stage 1 imputation: For each i, identify all possible values of ỹ_{mis,i}. Let G_i be the number of possible values of ỹ_{mis,i}. In the stage 1 FEFI method, we create G_i imputed values of ỹ_{mis,i}, where the fractional weight corresponding to ỹ_{mis,i}(g) is

           w*_{ig(1)} = π(ỹ_{obs,i}, ỹ_{mis,i}(g)) / Σ_{g′} π(ỹ_{obs,i}, ỹ_{mis,i}(g′)),    (8)

       where ỹ_{mis,i}(g) is the g-th realization of ỹ_{mis,i}, the missing part of Ỹ for unit i.

   (b) Stage 2 imputation: For the g-th imputed cell from stage 1, we identify the donor set among the respondents of the missing item whose cells match those of unit i, and impute the missing values within the same cell. For example, suppose we observe y_{1i} and y_{3i} but y_{2i} is not observed. In this case, the donor set for unit i is

           D_i = {j ∈ A; δ_{2j} = 1, ỹ_{1j} = ỹ_{1i}, ỹ_{3j} = ỹ_{3i}, ỹ_{2j} = ỹ^{(g)}_{2i}}.

       The within-cell fractional
weight for donor j is then

    w*_{ij(2)} = w_j / Σ_{j∈D_i} w_j.

The final fractional weight for donor j of unit i is

    w*_{ij} = w*_{ig(1)} w*_{ij(2)}.    (9)

Note that Σ_j w*_{ij} = 1. Note also that, for the fractional weight w*_{ij}, the imputed value for y_i = (y_{obs,i}, y_{mis,i}) is y*_{ij} = (y_{obs,i}, y_{mis(i),j}), where y_{mis(i),j} is the value of the missing item(s) of unit i taken from donor j.

3. Sampling step: From the two-stage FEFI data, we use PPS sampling (with rejective sampling) and calibration weighting to obtain an approximation. That is, for each i, from the set {(w*_{ij}, y*_{ij}); j ∈ A}, we perform systematic PPS sampling of size m using w*_{ij} in (9) as the size measure. Let y*_{i1}, ..., y*_{im} be the m imputed values from the PPS selection. The initial fractional weight assigned to y*_{ij} is given by w*_{ij0} = 1/m.

4. Weighting step: The fractional weights are further adjusted to match the marginal probabilities π_{i++}, π_{+j+}, and π_{++k}. That is, we may require

    Σ_{i∈A} w_i { δ_i I(ỹ_i = g) + (1 − δ_i) Σ_{j=1}^m w*_{ij} I(ỹ*_{ij} = g) } = π̂_g,  g = 1, ..., G.

A raking ratio estimation method can be used to achieve these calibration constraints.

For variance estimation, we can use a replication method, repeating [Step 2] to [Step 4] to obtain replicated fractional weights.
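The systematic PPS selection used in the Sampling step can be sketched as follows (pure Python; the size measures are hypothetical fractional weights summing to one, and the rejective-sampling refinement is omitted). Each selected imputed value then receives the initial weight w*_{ij0} = 1/m.

```python
import random

# Sketch of systematic PPS selection of m imputed values, using the
# fractional weights as size measures. Weights below are hypothetical.

def systematic_pps(weights, m, u=None):
    """Select m indices; index i is expected to appear m * weights[i] times.
    weights must be normalized to sum to one."""
    if u is None:
        u = random.random() / m                  # random start in [0, 1/m)
    points = [u + k / m for k in range(m)]
    picks, idx, cum = [], 0, weights[0]
    for p in points:
        while p >= cum and idx < len(weights) - 1:
            idx += 1
            cum += weights[idx]
        picks.append(idx)
    return picks

# With a fixed start, counts equal m * weights exactly here.
picks = systematic_pps([0.5, 0.3, 0.2], m=10, u=0.05)
```

A weight larger than 1/m forces its index to be selected more than once, which is why the FEFI step checks for w*_{ij} > 1/m before sampling.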
Appendix

A. Lemma 6.1

Lemma 6.1: If either (2) or (3) holds, the imputed estimator of Y = Σ_{i=1}^N y_i of the form

    Ŷ_I = Σ_{i∈A} w_i {δ_i y_i + (1 − δ_i) y*_i}

is unbiased for Y in the sense that E(Ŷ_I − Y) = 0.

Proof. We first introduce an extended definition of δ_i, where δ_i = 1 if unit i responds when sampled and δ_i = 0 otherwise. With this extended definition, δ_i is defined throughout the finite population. Fay (1992), Shao and Steel (1999), and Kim and Rao (2009) also used this extended definition.

We first show that (3) implies unbiasedness. Let Ŷ_n = Σ_{i∈A} w_i y_i be an unbiased estimator of Y. Note that

    Ŷ_I − Ŷ_n = Σ_{i=1}^N I_i w_i (1 − δ_i)(y*_i − y_i).    (10)

Thus, writing I = (I_1, I_2, ..., I_N) and δ = (δ_1, ..., δ_N),

    E{Ŷ_I − Ŷ_n | I, δ} = Σ_{i=1}^N I_i w_i (1 − δ_i) E{y*_i − y_i | I_i = 1, δ_i = 0} = 0.    (11)

Since we can write

    E{Ŷ_I − Y} = E{Ŷ_I − Ŷ_n} + E{Ŷ_n − Y},    (12)

the first term is zero by (11) and the second term is zero by the design-unbiasedness of Ŷ_n.

Finally, we show that condition (2) also implies unbiasedness. From (10), by taking the expectation with respect to the sampling design, we have

    E{Ŷ_I − Ŷ_n | δ, Y} = Σ_{i=1}^N (1 − δ_i)(y*_i − y_i)

and so

    E{Ŷ_I − Ŷ_n | δ} = Σ_{i=1}^N (1 − δ_i) E(y*_i − y_i | δ_i = 0) = 0.

Therefore, the unbiasedness of the imputed estimator also follows, as the first term of (12) is zero by (2).
B. SIR algorithm in the I-step

To discuss the I-step in Step 1, we give an illustration for p = 2; the extension to p > 2 is straightforward. Note that the joint density can be written

    f(y_1, y_2 | x) = f_1(y_1 | x; θ_1) f_2(y_2 | x, y_1; θ_2).

The sample is partitioned into four sets, A_{11}, A_{10}, A_{01}, A_{00}, according to the missing patterns. We first obtain the initial parameter estimate θ̂^(0) = (θ̂_{1(0)}, θ̂_{2(0)}) using the available respondents. That is, we use the observations in A_{11} ∪ A_{10} to estimate θ_1 and the observations in A_{11} to estimate θ_2.

Now we want to generate M imputed values from (5). In the case of p = 2, the proposal distribution can be written as

    h(y_{mis,i} | x_i, y_{obs,i}) =
        f_2(y_{2i} | x_i, y_{1i}; θ̂_{2(0)})   if i ∈ A_{10},
        f(y_{1i} | x_i, y_{2i}; θ̂^(0))        if i ∈ A_{01},
        f(y_{1i}, y_{2i} | x_i; θ̂^(0))        if i ∈ A_{00},

where

    f(y_{1i} | x_i, y_{2i}; θ̂^(0)) = f_1(y_{1i} | x_i; θ̂_{1(0)}) f_2(y_{2i} | x_i, y_{1i}; θ̂_{2(0)}) / ∫ f_1(y_{1i} | x_i; θ̂_{1(0)}) f_2(y_{2i} | x_i, y_{1i}; θ̂_{2(0)}) dy_{1i}.    (13)

Except for some special cases, such as normal f_1 and normal f_2, the conditional distribution in (13) is not of known form. Thus, some computational tool (such as the Metropolis-Hastings algorithm) is needed to generate samples from (13) for i ∈ A_{01}. We introduce the SIR (sampling importance resampling) algorithm as an alternative computational tool for generating imputed values from (13). SIR consists of the following steps:

1. Generate B (say, B = 100) samples y*_{1i} from f_1(y_{1i} | x_i; θ̂_{1(0)}).
2. Select a PPS sample of size one from the B elements y*_{1i} with size measure f_2(y_{2i} | x_i, y*_{1i}; θ̂_{2(0)}).
3. Repeat Step 1 and Step 2 independently M times to obtain M imputed values.

Once we obtain the M imputed values of y_{1i}, we can use

    ĥ(y_{mis,i} | x_i, y_{obs,i}) ∝ f_1(y_{1i} | x_i; θ̂_{1(0)}) f_2(y_{2i} | x_i, y_{1i}; θ̂_{2(0)})
as an estimator of the proposal density in (5). Since the fractional weights are normalized so that Σ_{j=1}^M w*_{ij} = 1, we do not need to compute the normalizing constant (the denominator) of the conditional density in (13).

C. Replication variance estimation

For variance estimation, we use a replication variance method. Let Ŷ_n = Σ_{i∈A} w_i y_i be the complete-sample estimator of Y under complete response, and let

    V̂_rep = Σ_{k=1}^n c_k (Ŷ_n^(k) − Ŷ_n)²

be a replication variance estimator, with Ŷ_n^(k) = Σ_{i∈A} w_i^(k) y_i.

To discuss variance estimation for the PFI method presented in Section 2, recall that PFI consists of three steps: (1) FEFI step, (2) Sampling step, (3) Weighting step. We mimic these procedures for each replicate but want to avoid regenerating the imputed values. The proposed variance estimation employs similar steps but uses the replication weights w_i^(k) instead of the original weights w_i.

1. FEFI step: Compute the replicate θ̂^(k) of θ̂ by applying the same EM algorithm with w_i replaced by w_i^(k).

2. Sampling step: We use the same imputed data for every replicate. The replicated fractional weights for the final m imputed values are given by

       w*^(k)_{ij0} ∝ f(y_{obs,i}, ỹ^{(j)}_{mis,i} | x_i; θ̂^(k)) / f(y_{obs,i}, ỹ^{(j)}_{mis,i} | x_i; θ̂)    (14)

   with Σ_{j=1}^m w*^(k)_{ij0} = 1.

3. Weighting step: Modify the initial fractional weights w*^(k)_{ij0} in (14) to satisfy the calibration constraint

       Σ_{i∈A} w_i^(k) Σ_{j=1}^m w*^(k)_{ij} S(θ̂^(k); x_i, ỹ*_{ij}) = 0

   with Σ_{j=1}^m w*^(k)_{ij} = 1, where θ̂^(k) is the pseudo MLE of θ computed from the FEFI step using the replication weights.
Once the final replicated fractional weights are computed, the variance estimator of η̂_PFI obtained from (7) is

    V̂_PFI = Σ_{k=1}^n c_k (η̂_PFI^(k) − η̂_PFI)²,

where η̂_PFI^(k) is computed by solving

    Σ_{i∈A} w_i^(k) Σ_{j=1}^m w*^(k)_{ij} U(η; x_i, ỹ*_{ij}) = 0.
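As a minimal illustration of the replication idea, not the full PFI recipe, the sketch below applies the delete-one jackknife with c_k = (n − 1)/n to the complete-response sample mean under simple random sampling (hypothetical data). In the PFI case, η̂^(k) would instead be re-solved from the replicated fractionally imputed estimating equation for each k.

```python
# Sketch: delete-one jackknife variance for the sample mean under SRS.
# Replicate k drops unit k; c_k = (n - 1) / n. Data are hypothetical.

def jackknife_variance(y):
    """Return sum_k c_k * (theta_hat^(k) - theta_hat)^2 for the sample mean."""
    n = len(y)
    theta = sum(y) / n
    reps = [sum(v for i, v in enumerate(y) if i != k) / (n - 1)
            for k in range(n)]
    c = (n - 1) / n
    return sum(c * (r - theta) ** 2 for r in reps)

y = [2.0, 4.0, 6.0, 8.0]
v = jackknife_variance(y)
```

For the sample mean, this jackknife reproduces the textbook variance estimate s²/n exactly.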
More information2 Naïve Methods. 2.1 Complete or available case analysis
2 Naïve Methods Before discussing methods for taking account of missingness when the missingness pattern can be assumed to be MAR in the next three chapters, we review some simple methods for handling
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7
MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is
More informationPreliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference
1 / 172 Bootstrap inference Francisco Cribari-Neto Departamento de Estatística Universidade Federal de Pernambuco Recife / PE, Brazil email: cribari@gmail.com October 2014 2 / 172 Unpaid advertisement
More informationLinear Methods for Prediction
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationLast lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton
EM Algorithm Last lecture 1/35 General optimization problems Newton Raphson Fisher scoring Quasi Newton Nonlinear regression models Gauss-Newton Generalized linear models Iteratively reweighted least squares
More informationWeb-based Supplementary Materials for Multilevel Latent Class Models with Dirichlet Mixing Distribution
Biometrics 000, 1 20 DOI: 000 000 0000 Web-based Supplementary Materials for Multilevel Latent Class Models with Dirichlet Mixing Distribution Chong-Zhi Di and Karen Bandeen-Roche *email: cdi@fhcrc.org
More informationEmpirical Likelihood Inference for Two-Sample Problems
Empirical Likelihood Inference for Two-Sample Problems by Ying Yan A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Statistics
More informationIntroduction An approximated EM algorithm Simulation studies Discussion
1 / 33 An Approximated Expectation-Maximization Algorithm for Analysis of Data with Missing Values Gong Tang Department of Biostatistics, GSPH University of Pittsburgh NISS Workshop on Nonignorable Nonresponse
More informationThe Expectation-Maximization Algorithm
1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable
More informationStatement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.
MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss
More informationJong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris
Jackknife Variance Estimation for Two Samples after Imputation under Two-Phase Sampling Jong-Min Kim* and Jon E. Anderson jongmink@mrs.umn.edu Statistics Discipline Division of Science and Mathematics
More informationPOLI 8501 Introduction to Maximum Likelihood Estimation
POLI 8501 Introduction to Maximum Likelihood Estimation Maximum Likelihood Intuition Consider a model that looks like this: Y i N(µ, σ 2 ) So: E(Y ) = µ V ar(y ) = σ 2 Suppose you have some data on Y,
More informationStatistical Analysis of List Experiments
Statistical Analysis of List Experiments Graeme Blair Kosuke Imai Princeton University December 17, 2010 Blair and Imai (Princeton) List Experiments Political Methodology Seminar 1 / 32 Motivation Surveys
More informationEmpirical Likelihood Methods for Sample Survey Data: An Overview
AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use
More informationBayesian Nonparametric Rasch Modeling: Methods and Software
Bayesian Nonparametric Rasch Modeling: Methods and Software George Karabatsos University of Illinois-Chicago Keynote talk Friday May 2, 2014 (9:15-10am) Ohio River Valley Objective Measurement Seminar
More informationReview and continuation from last week Properties of MLEs
Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that
More informationPart 6: Multivariate Normal and Linear Models
Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of
More informationRobustness to Parametric Assumptions in Missing Data Models
Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice
More informationFractional hot deck imputation
Biometrika (2004), 91, 3, pp. 559 578 2004 Biometrika Trust Printed in Great Britain Fractional hot deck imputation BY JAE KWANG KM Department of Applied Statistics, Yonsei University, Seoul, 120-749,
More informationEconometrics I, Estimation
Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the
More informationMixtures of Rasch Models
Mixtures of Rasch Models Hannah Frick, Friedrich Leisch, Achim Zeileis, Carolin Strobl http://www.uibk.ac.at/statistics/ Introduction Rasch model for measuring latent traits Model assumption: Item parameters
More informationWeighting in survey analysis under informative sampling
Jae Kwang Kim and Chris J. Skinner Weighting in survey analysis under informative sampling Article (Accepted version) (Refereed) Original citation: Kim, Jae Kwang and Skinner, Chris J. (2013) Weighting
More informationBasic math for biology
Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood
More informationA Comparative Study of Imputation Methods for Estimation of Missing Values of Per Capita Expenditure in Central Java
IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS A Comparative Study of Imputation Methods for Estimation of Missing Values of Per Capita Expenditure in Central Java To cite this
More informationLet us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided
Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or
More informationChapter 2. Section Section 2.9. J. Kim (ISU) Chapter 2 1 / 26. Design-optimal estimator under stratified random sampling
Chapter 2 Section 2.4 - Section 2.9 J. Kim (ISU) Chapter 2 1 / 26 2.4 Regression and stratification Design-optimal estimator under stratified random sampling where (Ŝxxh, Ŝxyh) ˆβ opt = ( x st, ȳ st )
More informationANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW
SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved
More informationEmpirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design
1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary
More information7 Sensitivity Analysis
7 Sensitivity Analysis A recurrent theme underlying methodology for analysis in the presence of missing data is the need to make assumptions that cannot be verified based on the observed data. If the assumption
More informationOptimal Auxiliary Variable Assisted Two-Phase Sampling Designs
MASTER S THESIS Optimal Auxiliary Variable Assisted Two-Phase Sampling Designs HENRIK IMBERG Department of Mathematical Sciences Division of Mathematical Statistics CHALMERS UNIVERSITY OF TECHNOLOGY UNIVERSITY
More informationMultiple Imputation for Missing Values Through Conditional Semiparametric Odds Ratio Models
Multiple Imputation for Missing Values Through Conditional Semiparametric Odds Ratio Models Hui Xie Assistant Professor Division of Epidemiology & Biostatistics UIC This is a joint work with Drs. Hua Yun
More informationStatistical Estimation
Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from
More information. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)
Supplementary information S7 Testing for association at imputed SPs puted SPs Score tests A Score Test needs calculations of the observed data score and information matrix only under the null hypothesis,
More informationHypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3
Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest
More informationModel Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao
Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley
More informationChapter 4. Replication Variance Estimation. J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28
Chapter 4 Replication Variance Estimation J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28 Jackknife Variance Estimation Create a new sample by deleting one observation n 1 n n ( x (k) x) 2 = x (k) = n
More informationEmpirical Likelihood Methods
Handbook of Statistics, Volume 29 Sample Surveys: Theory, Methods and Inference Empirical Likelihood Methods J.N.K. Rao and Changbao Wu (February 14, 2008, Final Version) 1 Likelihood-based Approaches
More informationMissing Data: Theory & Methods. Introduction
STAT 992/BMI 826 University of Wisconsin-Madison Missing Data: Theory & Methods Chapter 1 Introduction Lu Mao lmao@biostat.wisc.edu 1-1 Introduction Statistical analysis with missing data is a rich and
More information