6. Fractional Imputation in Survey Sampling


1 Introduction

Consider a finite population of $N$ units identified by a set of indices $U = \{1, 2, \ldots, N\}$ with $N$ known. Associated with each unit $i$ in the population are study variables $z_i = (x_i, y_i)$, where $y_i = (y_{i1}, \ldots, y_{ip})$ is the vector of study variables subject to missingness and $x_i$ is the vector of auxiliary variables that are always observed. We are interested in estimating $\eta$, defined as the (unique) solution to the population estimating equation $\sum_{i=1}^N U(\eta; z_i) = 0$. Examples of $\eta$ include:

1. Population mean: $U(\eta; x, y) = y - \eta$
2. Population proportion of $Y$ less than $q$: $U(\eta; x, y) = I(y < q) - \eta$
3. Population $p$-th quantile: $U(\eta; x, y) = I(y < \eta) - p$
4. Population regression coefficient: $U(\eta; x, y) = (y - x\eta)x$
5. Domain mean: $U(\eta; x, y) = (y - \eta)d(x)$

Let $A$ denote the set of indices for the units in a sample selected by a probability sampling mechanism with sample size $n$. Under complete response, a consistent estimator of $\eta$ is obtained by solving
$$\sum_{i \in A} w_i U(\eta; z_i) = 0, \qquad (1)$$
where $w_i = 1/\pi_i$ is the inverse of the first-order inclusion probability. Under some regularity conditions, we can establish that the solution $\hat{\eta}_n$ to (1) converges in probability to $\eta$ and is asymptotically normally distributed.
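As an illustration of the weighted estimating equation (1), here is a minimal Python sketch (the function name `solve_weighted_ee` and the simulated data are ours, not from the text). It solves $\sum_{i \in A} w_i U(\eta; z_i) = 0$ by bisection for the mean and quantile examples above, assuming the weighted estimating function is monotone decreasing in $\eta$.

```python
import numpy as np

def solve_weighted_ee(U, w, y, lo, hi, tol=1e-10):
    """Solve sum_{i in A} w_i U(eta; y_i) = 0 for a scalar eta by bisection,
    assuming the weighted estimating function is decreasing in eta."""
    g = lambda eta: np.sum(w * U(eta, y))
    a, b = lo, hi
    while b - a > tol:
        mid = 0.5 * (a + b)
        a, b = (mid, b) if g(mid) > 0 else (a, mid)
    return 0.5 * (a + b)

rng = np.random.default_rng(0)
y = rng.normal(10.0, 2.0, size=200)
w = 1.0 / rng.uniform(0.2, 0.8, size=200)      # w_i = 1 / pi_i

# population mean: U(eta; y) = y - eta, so the root is the Hajek mean
mean_hat = solve_weighted_ee(lambda eta, y: y - eta, w, y, y.min(), y.max())

# p-th quantile: U(eta; y) = I(y < eta) - p; negate it so g is decreasing
p = 0.5
quant_hat = solve_weighted_ee(lambda eta, y: p - (y < eta), w, y, y.min(), y.max())
print(mean_hat, quant_hat)
```

For the mean, the root coincides with the closed-form weighted mean $\sum_{i \in A} w_i y_i / \sum_{i \in A} w_i$; for the quantile, the estimating function is a step function, so the bisection returns a weighted sample quantile.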

Things to consider:

- Complex sampling: unequal probability of selection, multi-stage sampling.
- General-purpose estimation: we do not know which parameter $\eta$ will be used by the data analyst at the time of imputation.
- Multivariate missingness with an arbitrary missing pattern.
- We cannot use a large $M$. (We do not want to create a huge data file.)

Two concepts of missing at random (MAR). (For simplicity, assume $p = 1$.)

Population missing at random (PMAR): MAR holds at the population level,
$$f(y \mid x) = f(y \mid x, \delta);$$
that is, $Y \perp \delta \mid x$.

Sample missing at random (SMAR): MAR holds at the sample level,
$$f(y \mid x, I = 1) = f(y \mid x, I = 1, \delta);$$
that is, $Y \perp \delta \mid (x, I = 1)$, where $I_i = 1$ if unit $i \in A$ and $I_i = 0$ otherwise.

If the sampling design is such that $P(I = 1 \mid x, y) = P(I = 1 \mid x)$, which is often called a noninformative sampling design, then PMAR implies SMAR. For an informative sampling design, PMAR does not necessarily imply SMAR.

Under PMAR, an imputed value $y_i^*$ of a missing $y_i$ satisfies
$$E\{y_i^* \mid x_i, \delta_i = 0\} = E\{y_i \mid x_i, \delta_i = 0\}, \qquad (2)$$
while, under SMAR, it satisfies
$$E\{y_i^* \mid x_i, I_i = 1, \delta_i = 0\} = E\{y_i \mid x_i, I_i = 1, \delta_i = 0\}. \qquad (3)$$

Roughly speaking, fractional imputation is based on the PMAR assumption, while multiple imputation (which will be covered next week) is based on the SMAR assumption.

2 Parametric fractional imputation

We assume that the finite population at hand is a realization from an infinite population, called the superpopulation. In the superpopulation model, we often postulate a parametric distribution $f(y \mid x; \theta)$, which is known up to the parameter $\theta$ with parameter space $\Omega$. The parametric model has a joint density
$$f(y \mid x; \theta) = f_1(y_1 \mid x; \theta_1) f_2(y_2 \mid x, y_1; \theta_2) \cdots f_p(y_p \mid x, y_1, \ldots, y_{p-1}; \theta_p), \qquad (4)$$
where $\theta_k$ is the parameter in the conditional distribution of $y_k$ given $x$ and $y_1, \ldots, y_{k-1}$.

For each $y_i = (y_{1i}, \ldots, y_{pi})$, we have $\delta_i = (\delta_{1i}, \ldots, \delta_{pi})$, where $\delta_{ki} = 1$ if $y_{ki}$ is observed and $\delta_{ki} = 0$ otherwise. For example, with $p = 3$ there are $8 = 2^3$ possible missing patterns: $A = A_{111} \cup A_{110} \cup \cdots \cup A_{000}$, where, for example, $A_{100}$ is the set of sample indices with $\delta_{1i} = 1$, $\delta_{2i} = 0$, and $\delta_{3i} = 0$. For $i \in A_{100}$, we need to create imputed values for $y_{2i}$ and $y_{3i}$ from $f(y_{2i}, y_{3i} \mid x_i, y_{1i})$. Without loss of generality, we can express $y_i = (y_{obs,i}, y_{mis,i})$, where $y_{obs,i}$ and $y_{mis,i}$ are the observed and missing parts of $y_i$, respectively.

Three steps for PFI under complex sampling:

1. Compute the pseudo maximum likelihood estimator of $\theta$ using the EM-by-PFI method with a sufficiently large imputation size $M$ (say, $M = 1{,}000$).
2. Select $m$ (say, $m = 10$) imputed values from the set of $M$ imputed values.
3. Construct the final fractional weights for the $m$ imputed data points.

The first step is called the Fully Efficient Fractional Imputation (FEFI) step, the second is the Sampling step, and the third is the Weighting step.

Step 1 (FEFI step): The pseudo maximum likelihood estimator of $\theta$ is computed by the following EM algorithm.

1. [I-step]: Set $t = 0$. Obtain $M$ imputed values of $y_{mis,i}$ generated from a proposal distribution $h(y_{mis,i} \mid x_i, y_{obs,i})$. One simple choice of $h(\cdot)$ is
$$h(y_{mis,i} \mid x_i, y_{obs,i}) = f(y_{mis,i} \mid x_i, y_{obs,i}; \hat{\theta}^{(0)}), \qquad (5)$$
where $\hat{\theta}^{(0)}$ is the initial estimator of $\theta$ obtained from the available respondents. Generating samples from (5) may require an MCMC method or the SIR (Sampling Importance Resampling) method; see Appendix B for an illustration of the SIR method. Let $w_{ij(0)}^* = 1/M$ be the initial fractional weight for $y_{mis,i}^{(j)}$.

2. [M-step]: Update the parameter $\hat{\theta}^{(t+1)}$ by solving the imputed score equation
$$\hat{\theta}^{(t+1)}: \text{ solution to } \sum_{i \in A} w_i \sum_{j=1}^M w_{ij(t)}^* S(\theta; x_i, y_{ij}^*) = 0,$$
where $y_{ij}^* = (y_{obs,i}, y_{mis,i}^{(j)})$ and $S(\theta; x, y) = \partial \log f(y \mid x; \theta) / \partial \theta$ is the score function of $\theta$.

3. [W-step]: Set $t = t + 1$. Using the current value of the parameter estimate $\hat{\theta}^{(t)}$, compute the fractional weights
$$w_{ij(t)}^* \propto \frac{f(y_{obs,i}, y_{mis,i}^{(j)} \mid x_i; \hat{\theta}^{(t)})}{h(y_{mis,i}^{(j)} \mid x_i, y_{obs,i})}$$
subject to $\sum_{j=1}^M w_{ij(t)}^* = 1$. For $t = 0$, we set $w_{ij(t)}^* = 1/M$.

4. Check whether $w_{ij(t)}^* > 1/m$ for some $j = 1, \ldots, M$. If yes, update the proposal distribution with $\hat{\theta}^{(0)}$ replaced by $\hat{\theta}^{(t)}$ and go to the [I-step]. If no, go to the [M-step]. Stop when $\hat{\theta}^{(t)}$ meets the convergence criterion.

The [I-step] is the imputation step, the [W-step] is the weighting step, and the [M-step] is the maximization step. Note that the imputed values are not changed during the EM iteration; only the fractional weights are updated.
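To make the I/W/M cycle concrete, the following sketch applies the idea to a toy normal regression $y \mid x \sim N(\beta_0 + \beta_1 x, \sigma^2)$ with $y$ missing for some units. The MCAR response mechanism, SRS weights, and function names are simplifying assumptions of ours, not part of the text. The imputed values are drawn once from $h = f(\cdot \,; \hat{\theta}^{(0)})$ and never regenerated; only the fractional weights change across iterations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 300, 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
delta = rng.uniform(size=n) < 0.7           # response indicator (MCAR for brevity)
w = np.ones(n)                              # design weights (SRS)

def wmle(x, y, wt):
    """Weighted normal-regression MLE of (beta0, beta1, sigma2)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.solve(X.T @ (X * wt[:, None]), X.T @ (y * wt))
    sigma2 = np.sum(wt * (y - X @ beta) ** 2) / np.sum(wt)
    return beta, sigma2

def logf(yv, xv, beta, s2):                 # log f(y | x; theta), up to a constant
    return -0.5 * np.log(s2) - 0.5 * (yv - (beta[0] + beta[1] * xv)) ** 2 / s2

# I-step: fit h from the respondents and draw M values per nonrespondent
beta0, s20 = wmle(x[delta], y[delta], w[delta])
mis = np.where(~delta)[0]
yimp = (beta0[0] + beta0[1] * x[mis])[:, None] \
       + np.sqrt(s20) * rng.normal(size=(len(mis), M))

beta_t, s2_t = beta0, s20
for _ in range(30):
    # W-step: fractional weights prop. to f(y*_ij | x_i; theta_t) / h(y*_ij | x_i)
    logw = logf(yimp, x[mis][:, None], beta_t, s2_t) \
         - logf(yimp, x[mis][:, None], beta0, s20)
    fw = np.exp(logw - logw.max(axis=1, keepdims=True))
    fw /= fw.sum(axis=1, keepdims=True)
    # M-step: solve the fractionally imputed score equation (a weighted MLE)
    xa = np.concatenate([x[delta], np.repeat(x[mis], M)])
    ya = np.concatenate([y[delta], yimp.ravel()])
    wa = np.concatenate([w[delta], (w[mis][:, None] * fw).ravel()])
    beta_t, s2_t = wmle(xa, ya, wa)

print(beta_t, s2_t)
```

Because $y$ is the only variable with missingness and the model conditions on $x$, the fixed point of this iteration is essentially the respondent-only MLE; the point of the sketch is the mechanics of reweighting rather than an efficiency gain.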

Step 2 (Sampling step): For each $i$, we have $M$ possible imputed values $z_i^{(j)} = (x_i, y_{obs,i}, y_{mis,i}^{(j)})$ with their fractional weights $w_{ij}^*$, where $w_{ij}^*$ is computed from the EM algorithm after convergence. For each $i$, we treat $z_i = \{z_i^{(j)}; j = 1, 2, \ldots, M\}$ as a weighted finite population (with weights $w_{ij}^*$) and use an unequal probability sampling method to select a sample of size $m$ from $z_i$, using $w_{ij}^*$ as the selection probability. (We can use PPS sampling or systematic $\pi$ps sampling to obtain an imputed data set of size $m$.) Let $\tilde{z}_i^{(1)}, \ldots, \tilde{z}_i^{(m)}$ be the $m$ elements sampled from $z_i$ by the PPS sampling. That is,
$$\Pr(\tilde{z}_i^{(k)} = z_i^{(j)}) = w_{ij}^*, \quad j = 1, \ldots, M; \; k = 1, \ldots, m.$$
The fractional weights for the final $m$ imputed values are initially set to $w_{ij0}^* = 1/m$.

Step 3 (Weighting step): Modify the initial fractional weights $w_{ij0}^* = 1/m$ to satisfy the calibration constraint
$$\sum_{i \in A} w_i \sum_{j=1}^m w_{ij}^* S(\hat{\theta}; x_i, \tilde{y}_{ij}^*) = 0 \qquad (6)$$
with $\sum_{j=1}^m w_{ij}^* = 1$, where $\hat{\theta}$ is the pseudo MLE of $\theta$ computed from the FEFI step and $\tilde{y}_{ij}^* = (y_{obs,i}, \tilde{y}_{mis,i}^{(j)})$; that is, $\tilde{y}_{ij}^*$ is the $j$-th imputed version of $y_i$ selected by the PPS sampling in Step 2. A solution to this calibration problem is
$$w_{ij}^* = w_{ij0}^* - \Big\{ \sum_{i \in A} w_i \bar{S}_{i0}^* \Big\}' \hat{T}^{-1} w_{ij0}^* ( S_{ij}^* - \bar{S}_{i0}^* ),$$
where
$$S_{ij}^* = S(\hat{\theta}; x_i, \tilde{y}_{ij}^*), \quad \bar{S}_{i0}^* = \sum_{j=1}^m w_{ij0}^* S_{ij}^*, \quad \hat{T} = \sum_{i \in A} w_i \sum_{j=1}^m w_{ij0}^* ( S_{ij}^* - \bar{S}_{i0}^* )( S_{ij}^* - \bar{S}_{i0}^* )'.$$
Once the final fractional weights are computed, the PFI estimator of $\eta$ is obtained by solving
$$\sum_{i \in A} w_i \sum_{j=1}^m w_{ij}^* U(\eta; x_i, \tilde{y}_{ij}^*) = 0. \qquad (7)$$
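The closed-form regression-type solution of the weighting step can be checked numerically. The sketch below uses synthetic design weights and scores (not data from the text): it builds the calibrated fractional weights and verifies that they sum to one within each unit and satisfy the calibration constraint. Note that calibrated weights of this regression form are not guaranteed to be nonnegative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p = 50, 10, 2                          # units, imputations per unit, dim of score
w = rng.uniform(1.0, 3.0, size=n)            # design weights w_i
S = rng.normal(size=(n, m, p))               # S*_ij = S(theta_hat; x_i, y*_ij)
fw0 = np.full((n, m), 1.0 / m)               # initial fractional weights 1/m

Sbar = np.einsum('ij,ijp->ip', fw0, S)       # S-bar*_i0 = sum_j fw0_ij S*_ij
dS = S - Sbar[:, None, :]                    # S*_ij - S-bar*_i0
total = np.einsum('i,ip->p', w, Sbar)        # sum_i w_i S-bar*_i0
That = np.einsum('i,ij,ijp,ijq->pq', w, fw0, dS, dS)

lam = np.linalg.solve(That, total)
fw = fw0 - fw0 * (dS @ lam)                  # calibrated fractional weights

# check: sum_j fw_ij = 1 and sum_i w_i sum_j fw_ij S*_ij = 0
print(fw.sum(axis=1).min(), fw.sum(axis=1).max())
print(np.einsum('i,ij,ijp->p', w, fw, S))
```

The second check works because $\sum_j w_{ij0}^* (S_{ij}^* - \bar{S}_{i0}^*) = 0$ within each unit, so the correction term leaves the within-unit totals at one while removing exactly $\hat{T}^{-1}$-scaled score imbalance overall.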

Note that the fractionally imputed estimating equation above is an approximation to the expected estimating equation
$$\sum_{i \in A} w_i E\{U(\eta; x_i, y_{obs,i}, Y_{mis,i}) \mid x_i, y_{obs,i}; \hat{\theta}\} = 0.$$
For variance estimation, we can use replication methods (such as the jackknife or the bootstrap). Details are given in Appendix C.

3 Nonparametric approach: fractional hot deck imputation for multivariate continuous variables

We do not want to make parametric model assumptions about $f(y_1, \ldots, y_p \mid x)$. However, some assumption on the joint distribution of $(y_1, \ldots, y_p)$ is needed in order to preserve the correlation structure between the items. This is easy if the data are categorical.

Example (SRS of size $n = 10$; M denotes a missing value):

ID | Weight | $x$ | $y_1$ | $y_2$
1 | 0.10 | $x_1$ | $y_{1,1}$ | $y_{2,1}$
2 | 0.10 | $x_2$ | $y_{1,2}$ | M
3 | 0.10 | $x_3$ | M | $y_{2,3}$
4 | 0.10 | $x_4$ | $y_{1,4}$ | $y_{2,4}$
5 | 0.10 | $x_5$ | $y_{1,5}$ | $y_{2,5}$
6 | 0.10 | $x_6$ | $y_{1,6}$ | $y_{2,6}$
7 | 0.10 | $x_7$ | M | $y_{2,7}$
8 | 0.10 | $x_8$ | M | M
9 | 0.10 | $x_9$ | $y_{1,9}$ | $y_{2,9}$
10 | 0.10 | $x_{10}$ | $y_{1,10}$ | $y_{2,10}$

Fractional imputation idea: if both $y_1$ and $y_2$ are categorical, then fractional imputation is easy to apply. There are only a finite number of possible values, and the imputed values are exactly those possible values.

The fractional weights are the conditional probabilities of the possible values given the observations. We can use the EM-by-weighting method of Ibrahim (1990) to compute the fractional weights.

Example ($y_1, y_2$ dichotomous, taking values 0 or 1; the table shows the fractionally imputed data set):

ID | Weight | $x$ | $y_1$ | $y_2$
1 | 0.10 | $x_1$ | $y_{1,1}$ | $y_{2,1}$
2 | 0.10 $w_{2,1}^*$ | $x_2$ | $y_{1,2}$ | 0
2 | 0.10 $w_{2,2}^*$ | $x_2$ | $y_{1,2}$ | 1
3 | 0.10 $w_{3,1}^*$ | $x_3$ | 0 | $y_{2,3}$
3 | 0.10 $w_{3,2}^*$ | $x_3$ | 1 | $y_{2,3}$
4 | 0.10 | $x_4$ | $y_{1,4}$ | $y_{2,4}$
5 | 0.10 | $x_5$ | $y_{1,5}$ | $y_{2,5}$
6 | 0.10 | $x_6$ | $y_{1,6}$ | $y_{2,6}$
7 | 0.10 $w_{7,1}^*$ | $x_7$ | 0 | $y_{2,7}$
7 | 0.10 $w_{7,2}^*$ | $x_7$ | 1 | $y_{2,7}$
8 | 0.10 $w_{8,1}^*$ | $x_8$ | 0 | 0
8 | 0.10 $w_{8,2}^*$ | $x_8$ | 0 | 1
8 | 0.10 $w_{8,3}^*$ | $x_8$ | 1 | 0
8 | 0.10 $w_{8,4}^*$ | $x_8$ | 1 | 1
9 | 0.10 | $x_9$ | $y_{1,9}$ | $y_{2,9}$
10 | 0.10 | $x_{10}$ | $y_{1,10}$ | $y_{2,10}$

Fractional weights are the conditional probabilities of the imputed values given the observations. For example,
$$w_{2,1}^* = \hat{P}(y_2 = 0 \mid x = x_2, y_1 = y_{1,2}),$$
$$w_{3,1}^* = \hat{P}(y_1 = 0 \mid x = x_3, y_2 = y_{2,3}),$$
$$w_{7,1}^* = \hat{P}(y_1 = 0 \mid x = x_7, y_2 = y_{2,7}),$$
and $w_{8,1}^* = \hat{P}(y_1 = 0, y_2 = 0 \mid x = x_8)$. The conditional probabilities are computed from the joint probabilities.

M-step: update the joint probability $\pi_{bc|a} = P(y_1 = b, y_2 = c \mid x = a)$ by
$$\hat{\pi}_{bc|a} = \frac{\sum_{i=1}^n \sum_{j=1}^{M_i} w_i w_{ij}^* I(x_i = a, y_{1i}^{(j)} = b, y_{2i}^{(j)} = c)}{\sum_{i=1}^n w_i I(x_i = a)}.$$
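A minimal sketch of the EM-by-weighting idea for two dichotomous items. Dropping $x$ and assuming an MCAR response mechanism (both simplifications of ours, for brevity): the E-step assigns fractional weights equal to conditional probabilities under the current joint probabilities, and the M-step recomputes the weighted cell proportions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
pi_true = np.array([[0.3, 0.2],      # P(y1=b, y2=c), b, c in {0, 1}
                    [0.1, 0.4]])
cell = rng.choice(4, size=n, p=pi_true.ravel())
y1, y2 = cell // 2, cell % 2
d1 = rng.uniform(size=n) < 0.8       # delta_1i (MCAR for brevity)
d2 = rng.uniform(size=n) < 0.8       # delta_2i
w = np.ones(n)                       # design weights

pi = np.full((2, 2), 0.25)           # starting values for the joint probabilities
for _ in range(100):
    num = np.zeros((2, 2))
    for i in range(n):
        if d1[i] and d2[i]:          # both observed: full weight on the observed cell
            num[y1[i], y2[i]] += w[i]
        elif d1[i]:                  # y2 missing: fractional weights P-hat(y2=c | y1)
            num[y1[i], :] += w[i] * pi[y1[i], :] / pi[y1[i], :].sum()
        elif d2[i]:                  # y1 missing: fractional weights P-hat(y1=b | y2)
            num[:, y2[i]] += w[i] * pi[:, y2[i]] / pi[:, y2[i]].sum()
        else:                        # both missing: fractional weights P-hat(y1, y2)
            num += w[i] * pi
    pi = num / num.sum()

print(pi)
```

Each nonrespondent contributes fractionally to every cell compatible with its observed items, exactly as in the table above; the loop is the EM iteration with the M-step formula applied without the $x = a$ stratification.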

For continuous $y$, let us consider an approximation using a categorical transformation. For simplicity, let $Y = (Y_1, Y_2, Y_3)$ be the study variables that have missingness.

1. Preliminary step: For each item $k$, create a transformation of $Y_k$ into $\tilde{Y}_k$, a discrete version of $Y_k$. The values of $\tilde{Y}_k$ serve as imputation cells for $Y_k$. If $Y_k$ is missing, then $\tilde{Y}_k$ is also missing. Let $M_k$ be the number of cells for item $Y_k$. The maximum number of cells for $p = 3$ is then $G = M_1 M_2 M_3$.

2. FEFI step: Two-stage imputation is used. For each $i$ in the sample, $\tilde{y}_i$ is decomposed into $\tilde{y}_i = (\tilde{y}_{obs,i}, \tilde{y}_{mis,i})$. In the stage 1 imputation, we impute the imputation cells. In the stage 2 imputation, we impute the missing observations within imputation cells. To perform the two-stage imputation, we first compute the estimated joint probability $\pi_{ijk} = \Pr(\tilde{y}_1 = i, \tilde{y}_2 = j, \tilde{y}_3 = k)$ using the EM algorithm (or other estimation methods).

(a) Stage 1 imputation: For each $i$, identify all possible values of $\tilde{y}_{mis,i}$. Let $G_i$ be the number of possible values of $\tilde{y}_{mis,i}$. In the stage 1 FEFI method, we create $G_i$ imputed values of $\tilde{y}_{mis,i}$, with the fractional weight corresponding to $\tilde{y}_{mis,i}(g)$ given by
$$w_{ig(1)}^* = \frac{\pi(\tilde{y}_{obs,i}, \tilde{y}_{mis,i}(g))}{\sum_{g'} \pi(\tilde{y}_{obs,i}, \tilde{y}_{mis,i}(g'))}, \qquad (8)$$
where $\tilde{y}_{mis,i}(g)$ is the $g$-th realization of $\tilde{y}_{mis,i}$, the missing part of $\tilde{Y}$ for unit $i$.

(b) Stage 2 imputation: For each $g$-th imputed cell in the stage 1 imputation, we identify the donor set among the respondents to $Y_{mis(i)}$ whose observed items fall in the same cells. For example, suppose we observe $y_{1i}$ and $y_{3i}$ but $y_{2i}$ is not observed. In this case, the donor set for unit $i$ is
$$D_i = \{j \in A;\; \delta_{1j} = \delta_{2j} = \delta_{3j} = 1,\; \tilde{y}_{1j} = \tilde{y}_{1i},\; \tilde{y}_{3j} = \tilde{y}_{3i}\}.$$

The within-cell fractional weight for donor $j$ is then
$$w_{ij(2)}^* = \frac{w_j}{\sum_{j' \in D_i} w_{j'}}.$$
The final fractional weight for donor $j$ in unit $i$ is
$$w_{ij}^* = w_{ig(1)}^* w_{ij(2)}^*. \qquad (9)$$
Note that $\sum_j w_{ij}^* = 1$. Note also that, for the fractional weight $w_{ij}^*$, the imputed value for $y_i = (y_{obs,i}, y_{mis,i})$ is $y_{ij}^* = (y_{obs,i}, y_{mis(i),j})$, where $y_{mis(i),j}$ is the value of the variable $y_{mis(i)}$ for donor $j$.

3. Sampling step: From the two-stage FEFI data, we use PPS sampling (with rejective sampling) and calibration weighting to obtain an approximation. That is, for each $i$, from the set $\{(w_{ij}^*, y_{ij}^*); j \in A\}$, we perform a systematic PPS sampling of size $m$ using $w_{ij}^*$ in (9) as the size measure. Let $y_{i1}^*, \ldots, y_{im}^*$ be the $m$ imputed values from the PPS selection. The initial fractional weight assigned to $y_{ij}^*$ is given by $w_{ij0}^* = 1/m$.

4. Weighting step: The fractional weights are further adjusted to match the marginal probabilities $\pi_{i++}$, $\pi_{+j+}$, and $\pi_{++k}$. That is, we may require
$$\sum_{i \in A} w_i \Big\{ \delta_i I(\tilde{y}_i = g) + (1 - \delta_i) \sum_{j=1}^m w_{ij}^* I(\tilde{y}_{ij}^* = g) \Big\} = \hat{\pi}_g, \quad g = 1, \ldots, G.$$
In this case, a raking ratio estimation method can be used to achieve these calibration constraints.

For variance estimation, we can use the replication method, repeating [Step 2]-[Step 4] to obtain replicated fractional weights.
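The donor-set construction and within-cell weights can be sketched for the simplest case: one item ($y_2$) subject to missingness, with imputation cells formed from the always-observed item ($y_1$). The quartile cells, MCAR response, and simulated data are our simplifying assumptions; with every in-cell respondent used as a donor, this is the FEFI version of the hot deck.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
y1 = rng.normal(size=n)
y2 = 0.8 * y1 + rng.normal(scale=0.6, size=n)     # E(y2) = 0 by construction
w = rng.uniform(1.0, 2.0, size=n)                 # design weights
d2 = rng.uniform(size=n) < 0.7                    # y2 response indicator (MCAR)

# preliminary step: discretize the observed item into quartile cells
cells = np.digitize(y1, np.quantile(y1, [0.25, 0.5, 0.75]))

# FEFI hot deck: every respondent in the same cell is a donor,
# with within-cell fractional weight w_j / (sum of donor weights)
num = den = 0.0
for i in range(n):
    if d2[i]:
        num += w[i] * y2[i]
    else:
        D = (cells == cells[i]) & d2              # donor set D_i
        fw = w[D] / w[D].sum()                    # w*_{ij(2)}
        num += w[i] * np.sum(fw * y2[D])
    den += w[i]
eta_hat = num / den
print(eta_hat)
```

Because the fractional weights sum to one within each nonrespondent, the denominator is unchanged from the complete-response estimator; only the numerator replaces each missing $y_2$ by its weighted donor average.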

Appendix

A. Lemma 6.1

Lemma 6.1: If either (2) or (3) holds, the imputed estimator of $Y = \sum_{i=1}^N y_i$ of the form $\hat{Y}_I = \sum_{i \in A} w_i \{\delta_i y_i + (1 - \delta_i) y_i^*\}$ is unbiased for $Y$ in the sense that $E(\hat{Y}_I - Y) = 0$.

Proof. We first introduce an extended definition of $\delta_i$, where $\delta_i = 1$ if unit $i$ responds when sampled and $\delta_i = 0$ otherwise. With this extended definition, $\delta_i$ is defined throughout the finite population. Fay (1992), Shao and Steel (1999), and Kim and Rao (2009) also used this extended definition.

We first show that (3) implies unbiasedness. Let $\hat{Y}_n = \sum_{i \in A} w_i y_i$ be an unbiased estimator of $Y$. Note that
$$\hat{Y}_I - \hat{Y}_n = \sum_{i=1}^N I_i w_i (1 - \delta_i)(y_i^* - y_i). \qquad (10)$$
Thus, writing $I = (I_1, I_2, \ldots, I_N)$ and $\delta = (\delta_1, \ldots, \delta_N)$,
$$E\{\hat{Y}_I - \hat{Y}_n \mid I, \delta\} = \sum_{i=1}^N I_i w_i (1 - \delta_i) E\{y_i^* - y_i \mid I_i = 1, \delta_i = 0\} = 0. \qquad (11)$$
Since we can write
$$E\{\hat{Y}_I - Y\} = E\{\hat{Y}_I - \hat{Y}_n\} + E\{\hat{Y}_n - Y\}, \qquad (12)$$
the first term is zero by (11) and the second term is zero by the design-unbiasedness of $\hat{Y}_n$.

Finally, we show that condition (2) also implies unbiasedness. From (10), taking the expectation with respect to the sampling design, we have
$$E\{\hat{Y}_I - \hat{Y}_n \mid \delta, \mathcal{Y}\} = \sum_{i=1}^N (1 - \delta_i)(y_i^* - y_i)$$
and so
$$E\{\hat{Y}_I - \hat{Y}_n \mid \delta\} = \sum_{i=1}^N (1 - \delta_i) E(y_i^* - y_i \mid \delta_i = 0) = 0.$$
Therefore, the unbiasedness of the imputed estimator also follows, as the first term of (12) can be shown to be zero from (2).

B. SIR algorithm in the I-step

To discuss the I-step in Step 1, we give an illustration for $p = 2$; the extension to $p > 2$ is straightforward. Note that the joint density can be written
$$f(y_1, y_2 \mid x) = f_1(y_1 \mid x; \theta_1) f_2(y_2 \mid x, y_1; \theta_2).$$
The sample is partitioned into four sets, $A_{11}, A_{10}, A_{01}, A_{00}$, according to the missing patterns. We first obtain the initial parameter estimate $\hat{\theta}^{(0)} = (\hat{\theta}_{1(0)}, \hat{\theta}_{2(0)})$ using the available respondents. That is, we use the observations in $A_{11} \cup A_{10}$ to estimate $\theta_1$ and the observations in $A_{11}$ to estimate $\theta_2$.

Now we want to generate $M$ imputed values from (5). In the case of $p = 2$, the proposal distribution can be written as
$$h(y_{mis,i} \mid x_i, y_{obs,i}) = \begin{cases} f_2(y_{2i} \mid x_i, y_{1i}; \hat{\theta}_{2(0)}) & \text{if } i \in A_{10} \\ f(y_{1i} \mid x_i, y_{2i}; \hat{\theta}^{(0)}) & \text{if } i \in A_{01} \\ f(y_{1i}, y_{2i} \mid x_i; \hat{\theta}^{(0)}) & \text{if } i \in A_{00}, \end{cases}$$
where
$$f(y_{1i} \mid x_i, y_{2i}; \hat{\theta}^{(0)}) = \frac{f_1(y_{1i} \mid x_i; \hat{\theta}_{1(0)}) f_2(y_{2i} \mid x_i, y_{1i}; \hat{\theta}_{2(0)})}{\int f_1(y_{1i} \mid x_i; \hat{\theta}_{1(0)}) f_2(y_{2i} \mid x_i, y_{1i}; \hat{\theta}_{2(0)}) \, dy_{1i}}. \qquad (13)$$
Except for some special cases, such as normal $f_1$ and normal $f_2$, the conditional distribution in (13) is not of known form. Thus, some computational tool (such as the Metropolis-Hastings algorithm) is needed to generate samples from (13) for $i \in A_{01}$.

We introduce the SIR (Sampling Importance Resampling) algorithm as an alternative computational tool for generating imputed values from (13). SIR consists of the following steps:

1. Generate $B$ (say, $B = 100$) samples $y_{1i}^*$ from $f_1(y_{1i} \mid x_i; \hat{\theta}_{1(0)})$.
2. Select a PPS sample of size one from the $B$ elements of $y_{1i}^*$ with size measure $f_2(y_{2i} \mid x_i, y_{1i}^*; \hat{\theta}_{2(0)})$.
3. Repeat Steps 1 and 2 independently $M$ times to obtain $M$ imputed values.

Once we obtain the $M$ imputed values of $y_{1i}$, we can use
$$\hat{h}(y_{mis,i} \mid x_i, y_{obs,i}) \propto f_1(y_{1i} \mid x_i; \hat{\theta}_{1(0)}) f_2(y_{2i} \mid x_i, y_{1i}; \hat{\theta}_{2(0)})$$
as an estimator of the proposal density in (5). Since the fractional weights satisfy $\sum_{j=1}^M w_{ij}^* = 1$, we do not need to compute the normalizing constant in (13).
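The three SIR steps can be demonstrated in a case where the target (13) is available in closed form. With $f_1 = N(0, 1)$ and $f_2(y_2 \mid y_1) = N(y_1, 1)$ (our toy choices, standing in for the fitted models), observing $y_2 = 1.5$ makes the target conditional $N(0.75, 0.5)$, so the SIR output can be checked directly. SIR with finite $B$ is approximate, with error shrinking as $B$ grows.

```python
import numpy as np

rng = np.random.default_rng(4)
B, M = 100, 5000
y2_obs = 1.5

def f2_density(y2, y1):
    # f2(y2 | y1): here N(y1, 1), a stand-in for the fitted conditional model
    return np.exp(-0.5 * (y2 - y1) ** 2)

draws = np.empty(M)
for k in range(M):
    cand = rng.normal(size=B)                    # step 1: B draws from f1(y1)
    prob = f2_density(y2_obs, cand)
    prob /= prob.sum()
    draws[k] = cand[rng.choice(B, p=prob)]       # step 2: PPS sample of size one

# target (13) is N(0.75, 0.5) for this normal-normal pair
print(draws.mean(), draws.var())
```

Repeating the two steps independently $M$ times (step 3) gives the $M$ imputed values; the importance weights are discarded after resampling, which is why the method pairs naturally with the self-normalized fractional weights of the W-step.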

C. Replication variance estimation

For variance estimation, we use a replication variance method. Let $\hat{Y}_n = \sum_{i \in A} w_i y_i$ be the complete-sample estimator of $Y$ under complete response, and let
$$\hat{V}_{rep} = \sum_{k=1}^n c_k (\hat{Y}_n^{(k)} - \hat{Y}_n)^2$$
be a replication variance estimator with $\hat{Y}_n^{(k)} = \sum_{i \in A} w_i^{(k)} y_i$.

To discuss variance estimation for the PFI method presented in Section 2, recall that the PFI method consists of three steps: (1) the FEFI step, (2) the Sampling step, and (3) the Weighting step. We mimic these procedures for each replication but avoid regenerating the imputed values. The proposed variance estimation employs similar steps, using the replication weights $w_i^{(k)}$ instead of the original weights $w_i$.

1. FEFI step: Compute the replicate $\hat{\theta}^{(k)}$ of $\hat{\theta}$ by applying the same EM algorithm with $w_i$ replaced by $w_i^{(k)}$.

2. Sampling step: We use the same imputed data for each replication. The replicates of the initial fractional weights for the final $m$ imputed values are given by
$$w_{ij0}^{*(k)} \propto \frac{f(y_{obs,i}, \tilde{y}_{mis,i}^{(j)} \mid x_i; \hat{\theta}^{(k)})}{f(y_{obs,i}, \tilde{y}_{mis,i}^{(j)} \mid x_i; \hat{\theta})} \qquad (14)$$
with $\sum_{j=1}^m w_{ij0}^{*(k)} = 1$.

3. Weighting step: Modify the initial fractional weights $w_{ij0}^{*(k)}$ in (14) to satisfy the calibration constraint
$$\sum_{i \in A} w_i^{(k)} \sum_{j=1}^m w_{ij}^{*(k)} S(\hat{\theta}^{(k)}; x_i, \tilde{y}_{ij}^*) = 0$$
with $\sum_{j=1}^m w_{ij}^{*(k)} = 1$, where $\hat{\theta}^{(k)}$ is the pseudo MLE of $\theta$ computed from the FEFI step using the replication weights.

Once the final replicated fractional weights are computed, the variance estimator of $\hat{\eta}_{PFI}$ obtained from (7) is
$$\hat{V}_{PFI} = \sum_{k=1}^n c_k \left( \hat{\eta}_{PFI}^{(k)} - \hat{\eta}_{PFI} \right)^2,$$
where $\hat{\eta}_{PFI}^{(k)}$ is computed from
$$\sum_{i \in A} w_i^{(k)} \sum_{j=1}^m w_{ij}^{*(k)} U(\eta; x_i, \tilde{y}_{ij}^*) = 0.$$
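In the complete-response special case, the replication machinery above reduces to a standard jackknife. The sketch below (SRS, delete-one replicate weights, $c_k = (n-1)/n$, all our simplifying assumptions) checks that the replication variance estimator of the sample mean reproduces the familiar $s^2/n$; in the PFI case the replicate fractional weights from Steps 1-3 would also be recomputed before forming $\hat{\eta}_{PFI}^{(k)}$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
y = rng.normal(5.0, 2.0, size=n)
w = np.ones(n)                       # SRS, so w_i is constant

eta_hat = np.sum(w * y) / np.sum(w)  # eta = population mean, U = y - eta

# delete-one jackknife: the k-th replicate weight sets w_k = 0
eta_k = np.empty(n)
for k in range(n):
    wk = w.copy()
    wk[k] = 0.0
    eta_k[k] = np.sum(wk * y) / np.sum(wk)

c_k = (n - 1) / n                    # jackknife replication factor
V_rep = c_k * np.sum((eta_k - eta_hat) ** 2)

# for the sample mean this equals the textbook estimator s^2 / n exactly
print(V_rep, y.var(ddof=1) / n)
```

The exact agreement follows from $\bar{y}_{(k)} - \bar{y} = (\bar{y} - y_k)/(n-1)$, so the jackknife sum collapses to $\sum_k (y_k - \bar{y})^2 / \{n(n-1)\} = s^2/n$.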


More information

arxiv: v2 [math.st] 20 Jun 2014

arxiv: v2 [math.st] 20 Jun 2014 A solution in small area estimation problems Andrius Čiginas and Tomas Rudys Vilnius University Institute of Mathematics and Informatics, LT-08663 Vilnius, Lithuania arxiv:1306.2814v2 [math.st] 20 Jun

More information

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap inference for the finite population total under complex sampling designs Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.

More information

5. Fractional Hot deck Imputation

5. Fractional Hot deck Imputation 5. Fractioal Hot deck Imputatio Itroductio Suppose that we are iterested i estimatig θ EY or eve θ 2 P ry < c where y fy x where x is always observed ad y is subject to missigess. Assume MAR i the sese

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

Topics and Papers for Spring 14 RIT

Topics and Papers for Spring 14 RIT Eric Slud Feb. 3, 204 Topics and Papers for Spring 4 RIT The general topic of the RIT is inference for parameters of interest, such as population means or nonlinearregression coefficients, in the presence

More information

A measurement error model approach to small area estimation

A measurement error model approach to small area estimation A measurement error model approach to small area estimation Jae-kwang Kim 1 Spring, 2015 1 Joint work with Seunghwan Park and Seoyoung Kim Ouline Introduction Basic Theory Application to Korean LFS Discussion

More information

Graybill Conference Poster Session Introductions

Graybill Conference Poster Session Introductions Graybill Conference Poster Session Introductions 2013 Graybill Conference in Modern Survey Statistics Colorado State University Fort Collins, CO June 10, 2013 Small Area Estimation with Incomplete Auxiliary

More information

Small area estimation with missing data using a multivariate linear random effects model

Small area estimation with missing data using a multivariate linear random effects model Department of Mathematics Small area estimation with missing data using a multivariate linear random effects model Innocent Ngaruye, Dietrich von Rosen and Martin Singull LiTH-MAT-R--2017/07--SE Department

More information

2 Naïve Methods. 2.1 Complete or available case analysis

2 Naïve Methods. 2.1 Complete or available case analysis 2 Naïve Methods Before discussing methods for taking account of missingness when the missingness pattern can be assumed to be MAR in the next three chapters, we review some simple methods for handling

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference

Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference 1 / 172 Bootstrap inference Francisco Cribari-Neto Departamento de Estatística Universidade Federal de Pernambuco Recife / PE, Brazil email: cribari@gmail.com October 2014 2 / 172 Unpaid advertisement

More information

Linear Methods for Prediction

Linear Methods for Prediction This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Last lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton

Last lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton EM Algorithm Last lecture 1/35 General optimization problems Newton Raphson Fisher scoring Quasi Newton Nonlinear regression models Gauss-Newton Generalized linear models Iteratively reweighted least squares

More information

Web-based Supplementary Materials for Multilevel Latent Class Models with Dirichlet Mixing Distribution

Web-based Supplementary Materials for Multilevel Latent Class Models with Dirichlet Mixing Distribution Biometrics 000, 1 20 DOI: 000 000 0000 Web-based Supplementary Materials for Multilevel Latent Class Models with Dirichlet Mixing Distribution Chong-Zhi Di and Karen Bandeen-Roche *email: cdi@fhcrc.org

More information

Empirical Likelihood Inference for Two-Sample Problems

Empirical Likelihood Inference for Two-Sample Problems Empirical Likelihood Inference for Two-Sample Problems by Ying Yan A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Statistics

More information

Introduction An approximated EM algorithm Simulation studies Discussion

Introduction An approximated EM algorithm Simulation studies Discussion 1 / 33 An Approximated Expectation-Maximization Algorithm for Analysis of Data with Missing Values Gong Tang Department of Biostatistics, GSPH University of Pittsburgh NISS Workshop on Nonignorable Nonresponse

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:. MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss

More information

Jong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris

Jong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris Jackknife Variance Estimation for Two Samples after Imputation under Two-Phase Sampling Jong-Min Kim* and Jon E. Anderson jongmink@mrs.umn.edu Statistics Discipline Division of Science and Mathematics

More information

POLI 8501 Introduction to Maximum Likelihood Estimation

POLI 8501 Introduction to Maximum Likelihood Estimation POLI 8501 Introduction to Maximum Likelihood Estimation Maximum Likelihood Intuition Consider a model that looks like this: Y i N(µ, σ 2 ) So: E(Y ) = µ V ar(y ) = σ 2 Suppose you have some data on Y,

More information

Statistical Analysis of List Experiments

Statistical Analysis of List Experiments Statistical Analysis of List Experiments Graeme Blair Kosuke Imai Princeton University December 17, 2010 Blair and Imai (Princeton) List Experiments Political Methodology Seminar 1 / 32 Motivation Surveys

More information

Empirical Likelihood Methods for Sample Survey Data: An Overview

Empirical Likelihood Methods for Sample Survey Data: An Overview AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use

More information

Bayesian Nonparametric Rasch Modeling: Methods and Software

Bayesian Nonparametric Rasch Modeling: Methods and Software Bayesian Nonparametric Rasch Modeling: Methods and Software George Karabatsos University of Illinois-Chicago Keynote talk Friday May 2, 2014 (9:15-10am) Ohio River Valley Objective Measurement Seminar

More information

Review and continuation from last week Properties of MLEs

Review and continuation from last week Properties of MLEs Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Robustness to Parametric Assumptions in Missing Data Models

Robustness to Parametric Assumptions in Missing Data Models Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice

More information

Fractional hot deck imputation

Fractional hot deck imputation Biometrika (2004), 91, 3, pp. 559 578 2004 Biometrika Trust Printed in Great Britain Fractional hot deck imputation BY JAE KWANG KM Department of Applied Statistics, Yonsei University, Seoul, 120-749,

More information

Econometrics I, Estimation

Econometrics I, Estimation Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the

More information

Mixtures of Rasch Models

Mixtures of Rasch Models Mixtures of Rasch Models Hannah Frick, Friedrich Leisch, Achim Zeileis, Carolin Strobl http://www.uibk.ac.at/statistics/ Introduction Rasch model for measuring latent traits Model assumption: Item parameters

More information

Weighting in survey analysis under informative sampling

Weighting in survey analysis under informative sampling Jae Kwang Kim and Chris J. Skinner Weighting in survey analysis under informative sampling Article (Accepted version) (Refereed) Original citation: Kim, Jae Kwang and Skinner, Chris J. (2013) Weighting

More information

Basic math for biology

Basic math for biology Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood

More information

A Comparative Study of Imputation Methods for Estimation of Missing Values of Per Capita Expenditure in Central Java

A Comparative Study of Imputation Methods for Estimation of Missing Values of Per Capita Expenditure in Central Java IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS A Comparative Study of Imputation Methods for Estimation of Missing Values of Per Capita Expenditure in Central Java To cite this

More information

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or

More information

Chapter 2. Section Section 2.9. J. Kim (ISU) Chapter 2 1 / 26. Design-optimal estimator under stratified random sampling

Chapter 2. Section Section 2.9. J. Kim (ISU) Chapter 2 1 / 26. Design-optimal estimator under stratified random sampling Chapter 2 Section 2.4 - Section 2.9 J. Kim (ISU) Chapter 2 1 / 26 2.4 Regression and stratification Design-optimal estimator under stratified random sampling where (Ŝxxh, Ŝxyh) ˆβ opt = ( x st, ȳ st )

More information

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved

More information

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design 1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary

More information

7 Sensitivity Analysis

7 Sensitivity Analysis 7 Sensitivity Analysis A recurrent theme underlying methodology for analysis in the presence of missing data is the need to make assumptions that cannot be verified based on the observed data. If the assumption

More information

Optimal Auxiliary Variable Assisted Two-Phase Sampling Designs

Optimal Auxiliary Variable Assisted Two-Phase Sampling Designs MASTER S THESIS Optimal Auxiliary Variable Assisted Two-Phase Sampling Designs HENRIK IMBERG Department of Mathematical Sciences Division of Mathematical Statistics CHALMERS UNIVERSITY OF TECHNOLOGY UNIVERSITY

More information

Multiple Imputation for Missing Values Through Conditional Semiparametric Odds Ratio Models

Multiple Imputation for Missing Values Through Conditional Semiparametric Odds Ratio Models Multiple Imputation for Missing Values Through Conditional Semiparametric Odds Ratio Models Hui Xie Assistant Professor Division of Epidemiology & Biostatistics UIC This is a joint work with Drs. Hua Yun

More information

Statistical Estimation

Statistical Estimation Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from

More information

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q) Supplementary information S7 Testing for association at imputed SPs puted SPs Score tests A Score Test needs calculations of the observed data score and information matrix only under the null hypothesis,

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley

More information

Chapter 4. Replication Variance Estimation. J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28

Chapter 4. Replication Variance Estimation. J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28 Chapter 4 Replication Variance Estimation J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28 Jackknife Variance Estimation Create a new sample by deleting one observation n 1 n n ( x (k) x) 2 = x (k) = n

More information

Empirical Likelihood Methods

Empirical Likelihood Methods Handbook of Statistics, Volume 29 Sample Surveys: Theory, Methods and Inference Empirical Likelihood Methods J.N.K. Rao and Changbao Wu (February 14, 2008, Final Version) 1 Likelihood-based Approaches

More information

Missing Data: Theory & Methods. Introduction

Missing Data: Theory & Methods. Introduction STAT 992/BMI 826 University of Wisconsin-Madison Missing Data: Theory & Methods Chapter 1 Introduction Lu Mao lmao@biostat.wisc.edu 1-1 Introduction Statistical analysis with missing data is a rich and

More information