Statistical Methods for Handling Missing Data


1 Statistical Methods for Handling Missing Data Jae-Kwang Kim Department of Statistics, Iowa State University July 5th, 2014

2 Outline Textbook : Statistical Methods for handling incomplete data by Kim and Shao (2013) Part 1: Basic Theory (Chapter 2-3) Part 2: Imputation (Chapter 4) Part 3: Propensity score approach (Chapter 5) Part 4: Nonignorable missing (Chapter 6) Jae-Kwang Kim (ISU) July 5th, / 181

3 Statistical Methods for Handling Missing Data Part 1: Basic Theory Jae-Kwang Kim Department of Statistics, Iowa State University

4 1 Introduction Definitions for likelihood theory
The likelihood function of θ is defined as L(θ) = f(y; θ), where f(y; θ) is the joint pdf of y.
Let θ̂ be the maximum likelihood estimator (MLE) of θ_0, i.e., L(θ̂) = max_{θ∈Θ} L(θ).
A parametric family of densities, P = {f(y; θ); θ ∈ Θ}, is identifiable if, for all y, f(y; θ_1) ≠ f(y; θ_2) for every θ_1 ≠ θ_2.

5 1 Introduction - Fisher information Definition
1. Score function: S(θ) = ∂ log L(θ)/∂θ
2. Fisher information (curvature of the log-likelihood): I(θ) = −∂² log L(θ)/∂θ∂θᵀ = −∂S(θ)/∂θᵀ
3. Observed (Fisher) information: I(θ̂_n), where θ̂_n is the MLE.
4. Expected (Fisher) information: 𝓘(θ) = E_θ{I(θ)}
The observed information is always positive. The observed information applies to a single dataset.
The expected information is meaningful as a function of θ across the admissible values of θ; it is an average quantity over all possible datasets.
I(θ̂) = 𝓘(θ̂) for the exponential family.

6 1 Introduction - Fisher information Lemma 1. Properties of score functions [Theorem 2.3 of KS] Under regularity conditions allowing the exchange of the order of integration and differentiation, E θ {S(θ)} = 0 and V θ {S(θ)} = I(θ). Jae-Kwang Kim (ISU) Part 1 6 / 181

7 1 Introduction - Fisher information Remark
Under some regularity conditions, the MLE θ̂ converges in probability to the true parameter θ_0. Thus, we can apply a Taylor linearization to S(θ̂) = 0 to get
θ̂ − θ_0 ≅ {𝓘(θ_0)}⁻¹ S(θ_0).
Here, we use the fact that I(θ) = −∂S(θ)/∂θᵀ converges in probability to 𝓘(θ). Thus, the (asymptotic) variance of the MLE is
V(θ̂) ≅ {𝓘(θ_0)}⁻¹ V{S(θ_0)} {𝓘(θ_0)}⁻¹ = {𝓘(θ_0)}⁻¹,
where the last equality follows from Lemma 1.

8 2 Observed Likelihood Basic Setup
Let y = (y_1, ..., y_p) be a p-dimensional random vector with probability density function f(y; θ) whose dominating measure is µ.
Let δ_ij be the response indicator of y_ij, with δ_ij = 1 if y_ij is observed and δ_ij = 0 otherwise.
δ_i = (δ_i1, ..., δ_ip): p-dimensional random vector with density P(δ | y), assuming P(δ | y) = P(δ | y; φ) for some φ.
Let (y_i,obs, y_i,mis) be the observed part and missing part of y_i, respectively.
Let R(y_obs, δ) = {y; y_obs(y_i, δ_i) = y_i,obs, i = 1, ..., n} be the set of all possible values of y with the same realized value of y_obs for given δ, where y_obs(y_i, δ_i) is a function that gives the value of y_ij for δ_ij = 1.

9 2 Observed Likelihood Definition: Observed likelihood
Under the above setup, the observed likelihood of (θ, φ) is
L_obs(θ, φ) = ∫_{R(y_obs, δ)} f(y; θ) P(δ | y; φ) dµ(y).
Under the IID setup, the observed likelihood is
L_obs(θ, φ) = ∏_{i=1}^n [ ∫ f(y_i; θ) P(δ_i | y_i; φ) dµ(y_i,mis) ],
where it is understood that, if y_i = y_i,obs and y_i,mis is empty, then there is nothing to integrate out.
In the special case of scalar y, the observed likelihood is
L_obs(θ, φ) = ∏_{δ_i=1} [ f(y_i; θ) π(y_i; φ) ] ∏_{δ_i=0} [ ∫ f(y; θ) {1 − π(y; φ)} dy ],
where π(y; φ) = P(δ = 1 | y; φ).

10 2 Observed Likelihood Example 1 [Example 2.3 of KS]
Let t_1, t_2, ..., t_n be an IID sample from a distribution with density f_θ(t) = θ e^{−θt} I(t > 0). Instead of observing t_i, we observe (y_i, δ_i), where
y_i = t_i if δ_i = 1 and y_i = c if δ_i = 0, with δ_i = 1 if t_i ≤ c and δ_i = 0 if t_i > c,
where c is a known censoring time. The observed likelihood for θ can be derived as
L_obs(θ) = ∏_{i=1}^n {f_θ(y_i)}^{δ_i} {P(t_i > c)}^{1−δ_i} = θ^{∑_{i=1}^n δ_i} exp(−θ ∑_{i=1}^n y_i).
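The closed form of L_obs(θ) makes this example easy to check numerically. Below is a minimal Python sketch (not from the slides; the simulation settings are illustrative) that generates censored exponential data and maximizes the observed log-likelihood; the numerical maximizer agrees with the closed-form MLE θ̂ = ∑δ_i / ∑y_i used later in Part 1.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n, theta0, c = 1000, 2.0, 0.5          # illustrative true rate and known censoring time
t = rng.exponential(scale=1/theta0, size=n)
delta = (t <= c).astype(float)          # response indicator
y = np.where(delta == 1, t, c)          # observed value: t_i or c

def neg_loglik(theta):
    # log L_obs(theta) = (sum delta_i) log(theta) - theta * sum y_i
    return -(delta.sum() * np.log(theta) - theta * y.sum())

theta_hat = minimize_scalar(neg_loglik, bounds=(1e-6, 50), method="bounded").x
print(theta_hat, delta.sum() / y.sum())  # numerical MLE agrees with the closed form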

11 2 Observed Likelihood Definition: Missing At Random (MAR)
P(δ | y) is the density of the conditional distribution of δ given y. Let y_obs = y_obs(y, δ), where y_i,obs = y_i if δ_i = 1 and y_i,obs is missing if δ_i = 0.
The response mechanism is MAR if
P(δ | y_1) = P(δ | y_2)   {or P(δ | y) = P(δ | y_obs)}
for all y_1 and y_2 satisfying y_obs(y_1, δ) = y_obs(y_2, δ).
MAR: the response mechanism P(δ | y) depends on y only through y_obs.
Let y = (y_obs, y_mis). By Bayes' theorem,
P(y_mis | y_obs, δ) = {P(δ | y_mis, y_obs) / P(δ | y_obs)} P(y_mis | y_obs).
MAR: P(y_mis | y_obs, δ) = P(y_mis | y_obs). That is, y_mis ⊥ δ | y_obs.
MAR: the conditional independence of δ and y_mis given y_obs.

12 2 Observed Likelihood Remark
MCAR (Missing Completely At Random): P(δ | y) does not depend on y.
MAR (Missing At Random): P(δ | y) = P(δ | y_obs)
NMAR (Not Missing At Random): P(δ | y) ≠ P(δ | y_obs)
Thus, MCAR is a special case of MAR.

13 2 Observed Likelihood Theorem 1: Likelihood factorization (Rubin, 1976) [Theorem 2.4 of KS] P φ (δ y) is the joint density of δ given y and f θ (y) is the joint density of y. Under conditions 1 the parameters θ and φ are distinct and 2 MAR condition holds, the observed likelihood can be written as L obs (θ, φ) = L 1(θ)L 2(φ), and the MLE of θ can be obtained by maximizing L 1(θ). Thus, we do not have to specify the model for response mechanism. The response mechanism is called ignorable if the above likelihood factorization holds. Jae-Kwang Kim (ISU) Part 1 13 / 181

14 2 Observed Likelihood Example 2 [Example 2.4 of KS]
Bivariate data (x_i, y_i) with pdf f(x, y) = f_1(y | x) f_2(x).
x_i is always observed and y_i is subject to missingness.
Assume that the response status variable δ_i of y_i satisfies
P(δ_i = 1 | x_i, y_i) = Λ_1(φ_0 + φ_1 x_i + φ_2 y_i)
for some function Λ_1(·) of known form.
Let θ be the parameter of interest in the regression model f_1(y | x; θ). Let α be the parameter in the marginal distribution of x, denoted by f_2(x_i; α). Define Λ_0(x) = 1 − Λ_1(x).
Three parameters: θ, the parameter of interest; α and φ, nuisance parameters.

15 2 Observed Likelihood Example 2 (Cont'd)
Observed likelihood
L_obs(θ, α, φ) = ∏_{δ_i=1} f_1(y_i | x_i; θ) f_2(x_i; α) Λ_1(φ_0 + φ_1 x_i + φ_2 y_i) × ∏_{δ_i=0} ∫ f_1(y | x_i; θ) f_2(x_i; α) Λ_0(φ_0 + φ_1 x_i + φ_2 y) dy
= L_1(θ, φ) × L_2(α),
where L_2(α) = ∏_{i=1}^n f_2(x_i; α).
Thus, we can safely ignore the marginal distribution of x if x is completely observed.

16 2 Observed Likelihood Example 2 (Cont'd)
If φ_2 = 0, then MAR holds and L_1(θ, φ) = L_1a(θ) L_1b(φ), where
L_1a(θ) = ∏_{δ_i=1} f_1(y_i | x_i; θ)
and
L_1b(φ) = ∏_{δ_i=1} Λ_1(φ_0 + φ_1 x_i) ∏_{δ_i=0} Λ_0(φ_0 + φ_1 x_i).
Thus, under MAR, the MLE of θ can be obtained by maximizing L_1a(θ), which is obtained by ignoring the missing part of the data.

17 2 Observed Likelihood Example 2 (Cont'd)
Instead of y_i subject to missingness, if x_i is subject to missingness, then the observed likelihood becomes
L_obs(θ, φ, α) = ∏_{δ_i=1} f_1(y_i | x_i; θ) f_2(x_i; α) Λ_1(φ_0 + φ_1 x_i + φ_2 y_i) × ∏_{δ_i=0} ∫ f_1(y_i | x; θ) f_2(x; α) Λ_0(φ_0 + φ_1 x + φ_2 y_i) dx,
which no longer factorizes into L_1(θ, φ) L_2(α).
If φ_1 = 0, then L_obs(θ, α, φ) = L_1(θ, α) L_2(φ) and MAR holds.
Although we are not interested in the marginal distribution of x, we still need to specify the model for the marginal distribution of x.

18 3 Mean Score Approach
The observed likelihood is the marginal density of (y_obs, δ):
L_obs(η) = ∫_{R(y_obs, δ)} f(y; θ) P(δ | y; φ) dµ(y) = ∫ f(y; θ) P(δ | y; φ) dµ(y_mis),
where y_mis is the missing part of y and η = (θ, φ).
Observed score equation: S_obs(η) ≡ ∂ log L_obs(η)/∂η = 0
Computing the observed score function can be computationally challenging because the observed likelihood is in integral form.

19 3 Mean Score Approach Theorem 2: Mean Score Theorem (Fisher, 1922) [Theorem 2.5 of KS]
Under some regularity conditions, the observed score function equals the mean score function. That is, S_obs(η) = S̄(η), where
S̄(η) = E{S_com(η) | y_obs, δ},
S_com(η) = ∂ log f(y, δ; η)/∂η,  f(y, δ; η) = f(y; θ) P(δ | y; φ).
The mean score function is computed by taking the conditional expectation of the complete-sample score function given the observations.
The mean score function is easier to compute than the observed score function.

20 3 Mean Score Approach Proof of Theorem 2
Since L_obs(η) = f(y, δ; η) / f(y, δ | y_obs, δ; η), we have
∂ ln L_obs(η)/∂η = ∂ ln f(y, δ; η)/∂η − ∂ ln f(y, δ | y_obs, δ; η)/∂η.
Taking the conditional expectation of the above equation over the conditional distribution of (y, δ) given (y_obs, δ), we have
∂ ln L_obs(η)/∂η = E{∂ ln L_obs(η)/∂η | y_obs, δ} = E{S_com(η) | y_obs, δ} − E{∂ ln f(y, δ | y_obs, δ; η)/∂η | y_obs, δ}.
Here, the first equality holds because L_obs(η) is a function of (y_obs, δ) only. The last term is equal to zero by Lemma 1, which states that the expected value of the score function is zero; the reference distribution in this case is the conditional distribution of (y, δ) given (y_obs, δ).

21 3 Mean Score Approach Example 3 [Example 2.6 of KS]
1. Suppose that the study variable y follows a normal distribution with mean x′β and variance σ². The score equations for β and σ² under complete response are
S_1(β, σ²) = ∑_{i=1}^n (y_i − x_i′β) x_i / σ² = 0
and
S_2(β, σ²) = −n/(2σ²) + ∑_{i=1}^n (y_i − x_i′β)² / (2σ⁴) = 0.
2. Assume that the y_i are observed only for the first r elements and the MAR assumption holds. In this case, the mean score functions reduce to
S̄_1(β, σ²) = ∑_{i=1}^r (y_i − x_i′β) x_i / σ²
and
S̄_2(β, σ²) = −n/(2σ²) + ∑_{i=1}^r (y_i − x_i′β)² / (2σ⁴) + (n − r)/(2σ²).

22 3 Mean Score Approach Example 3 (Cont'd)
3. The maximum likelihood estimator obtained by solving the mean score equations is
β̂ = ( ∑_{i=1}^r x_i x_i′ )⁻¹ ∑_{i=1}^r x_i y_i  and  σ̂² = (1/r) ∑_{i=1}^r (y_i − x_i′β̂)².
Thus, the resulting estimators can also be obtained by simply ignoring the missing part of the sample, which is consistent with the result in Example 2 (for φ_2 = 0).
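As a quick illustration of the result in Example 3, the following Python sketch (illustrative and not from the slides; MCAR response is used as a special case of MAR) computes β̂ and σ̂² from the respondents only.

import numpy as np

rng = np.random.default_rng(1)
n = 500
x = np.column_stack([np.ones(n), rng.normal(size=n)])    # design matrix with intercept
beta_true = np.array([1.0, 2.0])
y = x @ beta_true + rng.normal(size=n)
delta = rng.binomial(1, 0.7, size=n).astype(bool)         # MCAR response indicator

xr, yr = x[delta], y[delta]                               # respondents only
beta_hat = np.linalg.solve(xr.T @ xr, xr.T @ yr)          # (sum x_i x_i')^{-1} sum x_i y_i
sigma2_hat = np.mean((yr - xr @ beta_hat) ** 2)           # divides by r, as on the slide
print(beta_hat, sigma2_hat)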

23 3 Mean Score Approach Discussion of Example 3
We are interested in estimating θ for the conditional density f(y | x; θ). Under MAR, the observed likelihood for θ is
L_obs(θ) = ∏_{i=1}^r f(y_i | x_i; θ) ∏_{i=r+1}^n ∫ f(y | x_i; θ) dµ(y) = ∏_{i=1}^r f(y_i | x_i; θ).
The same conclusion follows from the mean score theorem. Under MAR, the mean score function is
S̄(θ) = ∑_{i=1}^r S(θ; x_i, y_i) + ∑_{i=r+1}^n E{S(θ; x_i, Y) | x_i} = ∑_{i=1}^r S(θ; x_i, y_i),
where S(θ; x, y) is the score function for θ and the second equality follows from Lemma 1.

24 3 Mean Score Approach Example 4 [Example 2.5 of KS]
1. Suppose that the study variable y is distributed as a Bernoulli random variable with probability of success p_i, where
p_i = p_i(β) = exp(x_i′β) / {1 + exp(x_i′β)}
for some unknown parameter β, and x_i is the vector of covariates in the logistic regression model for y_i. We assume that 1 is in the column space of x_i.
2. Under complete response, the score function for β is
S_1(β) = ∑_{i=1}^n {y_i − p_i(β)} x_i.

25 3 Mean Score Approach Example 4 (Cont'd)
3. Let δ_i be the response indicator for y_i with distribution Bernoulli(π_i), where
π_i = exp(x_i′φ_0 + y_i φ_1) / {1 + exp(x_i′φ_0 + y_i φ_1)}.
We assume that x_i is always observed, but y_i is missing if δ_i = 0.
4. Under missing data, the mean score function for β is
S̄_1(β, φ) = ∑_{δ_i=1} {y_i − p_i(β)} x_i + ∑_{δ_i=0} ∑_{y=0}^1 w_i(y; β, φ) {y − p_i(β)} x_i,
where w_i(y; β, φ) is the conditional probability of y_i = y given x_i and δ_i = 0:
w_i(y; β, φ) = P_β(y_i = y | x_i) P_φ(δ_i = 0 | y_i = y, x_i) / ∑_{z=0}^1 P_β(y_i = z | x_i) P_φ(δ_i = 0 | y_i = z, x_i).
Thus, S̄_1(β, φ) is also a function of φ.

26 3 Mean Score Approach Example 4 (Cont'd)
5. If the response mechanism is MAR so that φ_1 = 0, then
w_i(y; β, φ) = P_β(y_i = y | x_i) / ∑_{z=0}^1 P_β(y_i = z | x_i) = P_β(y_i = y | x_i),
and so
S̄_1(β, φ) = ∑_{δ_i=1} {y_i − p_i(β)} x_i,
the complete-case score function, which is free of φ.
6. If MAR does not hold, then (β̂, φ̂) can be obtained by solving S̄_1(β, φ) = 0 and S̄_2(β, φ) = 0 jointly, where
S̄_2(β, φ) = ∑_{δ_i=1} {δ_i − π(φ; x_i, y_i)} (x_i′, y_i)′ + ∑_{δ_i=0} ∑_{y=0}^1 w_i(y; β, φ) {δ_i − π(φ; x_i, y)} (x_i′, y)′.

27 3 Mean Score Approach Discussion of Example 4
We may not have a unique solution to S̄(η) = 0, where S̄(η) = [S̄_1(β, φ), S̄_2(β, φ)], when MAR does not hold, because of the non-identifiability problem associated with nonignorable missingness.
To avoid this problem, a reduced model is often used for the response mechanism:
Pr(δ = 1 | x, y) = Pr(δ = 1 | u, y), where x = (u, z).
The reduced response model introduces a smaller set of parameters, so the identifiability problem can be resolved. (More discussion will be given in the Part 4 lecture.)
Computing the solution of S̄(η) = 0 is also difficult. The EM algorithm, which will be presented soon, is a useful computational tool.

28 4 Observed information Definition
1. Observed score function: S_obs(η) = ∂ log L_obs(η)/∂η
2. Fisher information from the observed likelihood: I_obs(η) = −∂² log L_obs(η)/∂η∂ηᵀ
3. Expected (Fisher) information from the observed likelihood: 𝓘_obs(η) = E_η{I_obs(η)}.
Lemma 2 [Theorem 2.6 of KS]
Under regularity conditions,
E{S_obs(η)} = 0 and V{S_obs(η)} = 𝓘_obs(η),
where 𝓘_obs(η) = E_η{I_obs(η)} is the expected information from the observed likelihood.

29 4 Observed information
Under missing data, the MLE η̂ is the solution to S̄(η) = 0. Under some regularity conditions, η̂ converges in probability to η_0 and has asymptotic variance {𝓘_obs(η_0)}⁻¹, with
𝓘_obs(η) = E{−∂S_obs(η)/∂ηᵀ} = E{S_obs(η)^{⊗2}} = E{S̄(η)^{⊗2}},  where B^{⊗2} = BBᵀ.
For variance estimation of η̂, one may use {I_obs(η̂)}⁻¹.
IID setup: The empirical (Fisher) information estimator of the variance of η̂ is
{Ĥ(η̂)}⁻¹ = { ∑_{i=1}^n S̄_i(η̂)^{⊗2} }⁻¹, where S̄_i(η) = E{S_i(η) | y_i,obs, δ_i} (Redner and Walker, 1984).
In general, I_obs(η̂) is preferred to 𝓘_obs(η̂) for variance estimation of η̂.

30 4 Observed information Return to Example 1
Observed score function: S_obs(θ) = ∑_{i=1}^n δ_i / θ − ∑_{i=1}^n y_i
MLE for θ: θ̂ = ∑_{i=1}^n δ_i / ∑_{i=1}^n y_i
Fisher information: I_obs(θ) = ∑_{i=1}^n δ_i / θ²
Expected information: 𝓘_obs(θ) = ∑_{i=1}^n (1 − e^{−θc}) / θ² = n(1 − e^{−θc}) / θ².
Which one do you prefer?
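A small numerical check of this comparison (a sketch with illustrative settings, not from the slides): both information quantities are evaluated at the MLE and inverted to give variance estimates.

import numpy as np

rng = np.random.default_rng(2)
n, theta0, c = 1000, 2.0, 0.5
t = rng.exponential(1/theta0, n)
delta = (t <= c).astype(float)
y = np.where(delta == 1, t, c)

theta_hat = delta.sum() / y.sum()                          # MLE from the slide
I_obs = delta.sum() / theta_hat**2                         # observed information at theta_hat
EI_obs = n * (1 - np.exp(-theta_hat * c)) / theta_hat**2   # expected information at theta_hat
print(theta_hat, 1/I_obs, 1/EI_obs)                        # the two variance estimates are close here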

31 4 Observed information Motivation
L_com(η) = f(y, δ; η): complete-sample likelihood with no missing data
Fisher information associated with L_com(η): I_com(η) = −∂S_com(η)/∂ηᵀ = −∂² log L_com(η)/∂η∂ηᵀ
L_obs(η): the observed likelihood
Fisher information associated with L_obs(η): I_obs(η) = −∂S_obs(η)/∂ηᵀ = −∂² log L_obs(η)/∂η∂ηᵀ
How can I_obs(η) be expressed in terms of I_com(η) and S_com(η)?

32 4 Observed information Theorem 3 (Louis, 1982; Oakes, 1999) [Theorem 2.7 of KS]
Under regularity conditions allowing the exchange of the order of integration and differentiation,
I_obs(η) = E{I_com(η) | y_obs, δ} − [ E{S_com(η)^{⊗2} | y_obs, δ} − S̄(η)^{⊗2} ]
= E{I_com(η) | y_obs, δ} − V{S_com(η) | y_obs, δ},
where S̄(η) = E{S_com(η) | y_obs, δ}.

33 Proof of Theorem 3
By Theorem 2, the observed information associated with L_obs(η) can be expressed as I_obs(η) = −∂S̄(η)/∂ηᵀ, where S̄(η) = E{S_com(η) | y_obs, δ; η}. Thus, we have
−∂S̄(η)/∂ηᵀ = −∂/∂ηᵀ ∫ S_com(η; y) f(y, δ | y_obs, δ; η) dµ(y)
= −∫ {∂S_com(η; y)/∂ηᵀ} f(y, δ | y_obs, δ; η) dµ(y) − ∫ S_com(η; y) {∂f(y, δ | y_obs, δ; η)/∂ηᵀ} dµ(y)
= E{−∂S_com(η)/∂ηᵀ | y_obs, δ} − ∫ S_com(η; y) {∂ log f(y, δ | y_obs, δ; η)/∂ηᵀ} f(y, δ | y_obs, δ; η) dµ(y).
The first term is equal to E{I_com(η) | y_obs, δ} and the second term is equal to
E{S_com(η) S_mis(η)ᵀ | y_obs, δ} = E[{S̄(η) + S_mis(η)} S_mis(η)ᵀ | y_obs, δ] = E{S_mis(η) S_mis(η)ᵀ | y_obs, δ},
because E{S̄(η) S_mis(η)ᵀ | y_obs, δ} = 0.

34 4. Observed information
S_mis(η): the score function associated with the conditional density f(y, δ | y_obs, δ).
Expected missing information: 𝓘_mis(η) = E{−∂S_mis(η)/∂ηᵀ}, satisfying 𝓘_mis(η) = E{S_mis(η)^{⊗2}}.
Missing information principle (Orchard and Woodbury, 1972):
𝓘_mis(η) = 𝓘_com(η) − 𝓘_obs(η),
where 𝓘_com(η) = E{−∂S_com(η)/∂ηᵀ} is the expected information from the complete-sample likelihood.
An alternative expression of the missing information principle is
V{S_mis(η)} = V{S_com(η)} − V{S̄(η)}.
Note that V{S_com(η)} = 𝓘_com(η) and V{S_obs(η)} = 𝓘_obs(η).

35 4. Observed information Example 5
1. Consider the following bivariate normal distribution:
(y_1i, y_2i)′ ~ N( (µ_1, µ_2)′, [σ_11 σ_12; σ_12 σ_22] ),  i = 1, 2, ..., n.
Assume for simplicity that σ_11, σ_12 and σ_22 are known constants, and let µ = (µ_1, µ_2)′ be the parameter of interest.
2. The complete-sample score function for µ is
S_com(µ) = ∑_{i=1}^n S_com^(i)(µ) = ∑_{i=1}^n [σ_11 σ_12; σ_12 σ_22]⁻¹ (y_1i − µ_1, y_2i − µ_2)′.
The information matrix of µ based on the complete sample is
I_com(µ) = n [σ_11 σ_12; σ_12 σ_22]⁻¹.

36 4. Observed information Example 5 (Cont'd)
3. Suppose that there are some missing values in y_1i and y_2i, and the original sample is partitioned into four sets:
H = both y_1 and y_2 respond; K = only y_1 is observed; L = only y_2 is observed; M = both y_1 and y_2 are missing.
Let n_H, n_K, n_L, n_M represent the sizes of H, K, L, M, respectively.
4. Assume that the response mechanism does not depend on the value of (y_1, y_2), so it is MAR. In this case, the observed score function of µ based on a single observation in set K is
E{S_com^(i)(µ) | y_1i, i ∈ K} = [σ_11 σ_12; σ_12 σ_22]⁻¹ (y_1i − µ_1, E(y_2i | y_1i) − µ_2)′ = (σ_11⁻¹ (y_1i − µ_1), 0)′.

37 4. Observed information Example 5 (Cont'd)
5. Similarly, we have
E{S_com^(i)(µ) | y_2i, i ∈ L} = (0, σ_22⁻¹ (y_2i − µ_2))′.
6. Therefore, the observed information matrix of µ is
I_obs(µ) = n_H [σ_11 σ_12; σ_12 σ_22]⁻¹ + n_K [σ_11⁻¹ 0; 0 0] + n_L [0 0; 0 σ_22⁻¹],
and the asymptotic variance of the MLE of µ can be obtained as the inverse of I_obs(µ).
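The observed information in Example 5 is easy to assemble numerically. A minimal sketch, assuming illustrative values for the known covariance matrix and the pattern counts (n_H, n_K, n_L, n_M):

import numpy as np

# counts of the four response patterns and the known covariance matrix (illustrative values)
n_H, n_K, n_L, n_M = 120, 40, 30, 10
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

Sigma_inv = np.linalg.inv(Sigma)
I_obs = (n_H * Sigma_inv
         + n_K * np.diag([1 / Sigma[0, 0], 0.0])    # only y1 observed
         + n_L * np.diag([0.0, 1 / Sigma[1, 1]]))   # only y2 observed
V_mu_hat = np.linalg.inv(I_obs)                     # asymptotic variance of the MLE of mu
print(V_mu_hat)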

38 5. EM algorithm
Interested in finding η̂ that maximizes L_obs(η).
The MLE can be obtained by solving S_obs(η) = 0, which is equivalent to solving S̄(η) = 0 by Theorem 2.
Computing the solution of S̄(η) = 0 can be challenging because it often involves computing I_obs(η) = −∂S̄(η)/∂ηᵀ in order to apply the Newton method:
η̂^(t+1) = η̂^(t) + {I_obs(η̂^(t))}⁻¹ S̄(η̂^(t)).
We may rely on Louis' formula (Theorem 3) to compute I_obs(η).
The EM algorithm provides an alternative method of solving S̄(η) = 0 by writing S̄(η) = E{S_com(η) | y_obs, δ; η} and using the following iterative method:
η̂^(t+1) ← solve E{S_com(η) | y_obs, δ; η̂^(t)} = 0.

39 5. EM algorithm Definition
Let η^(t) be the current value of the parameter estimate of η. The EM algorithm iteratively carries out the following E-step and M-step:
E-step: Compute Q(η | η^(t)) = E{ln f(y, δ; η) | y_obs, δ, η^(t)}
M-step: Find η^(t+1) that maximizes Q(η | η^(t)) w.r.t. η.
Theorem 4 (Dempster et al., 1977) [Theorem 3.2]
Let L_obs(η) = ∫_{R(y_obs, δ)} f(y, δ; η) dµ(y) be the observed likelihood of η. If Q(η^(t+1) | η^(t)) ≥ Q(η^(t) | η^(t)), then L_obs(η^(t+1)) ≥ L_obs(η^(t)).

40 5. EM algorithm Remark
1. Convergence of the EM algorithm is linear. It can be shown that
η^(t+1) − η^(t) ≅ J_mis (η^(t) − η^(t−1)),
where J_mis = I_com⁻¹ I_mis is called the fraction of missing information.
2. Under MAR and for an exponential family of the form f(y; θ) = b(y) exp{θ′T(y) − A(θ)}, the M-step computes θ^(t+1) as the solution to
E{T(y) | y_obs, θ^(t)} = E{T(y); θ}.

41 5. EM algorithm
Figure: Illustration of the EM algorithm for the exponential family (h_1(θ) = E{T(y) | y_obs, θ}, h_2(θ) = E{T(y); θ}).

42 5. EM algorithm Return to Example 4
E-step:
S̄_1(β | β^(t), φ^(t)) = ∑_{δ_i=1} {y_i − p_i(β)} x_i + ∑_{δ_i=0} ∑_{j=0}^1 w_{ij(t)} {j − p_i(β)} x_i,
where
w_{ij(t)} = Pr(Y_i = j | x_i, δ_i = 0; β^(t), φ^(t)) = Pr(Y_i = j | x_i; β^(t)) Pr(δ_i = 0 | x_i, j; φ^(t)) / ∑_{y=0}^1 Pr(Y_i = y | x_i; β^(t)) Pr(δ_i = 0 | x_i, y; φ^(t)),
and
S̄_2(φ | β^(t), φ^(t)) = ∑_{δ_i=1} {δ_i − π(x_i, y_i; φ)} (x_i′, y_i)′ + ∑_{δ_i=0} ∑_{j=0}^1 w_{ij(t)} {δ_i − π(x_i, j; φ)} (x_i′, j)′.

43 5. EM algorithm Return to Example 4 (Cont d) M-step: The parameter estimates are updated by solving [ S 1 (β β (t), φ (t)), S 2 ( φ β (t), φ (t))] = (0, 0) for β and φ. For categorical missing data, the conditional expectation in the E-step can be computed using the weighted mean with weights w ij(t). Ibrahim (1990) called this method EM by weighting. Observed information matrix can also be obtained by the Louis formula (in Theorem 3) using the weighted mean in the E-step. Jae-Kwang Kim (ISU) Part 1 43 / 181
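The E-step weights w_{ij(t)} are the heart of "EM by weighting." The sketch below is illustrative (the parameterization follows Example 4, but the function and argument names are hypothetical); it computes the fractional weights for one nonrespondent with binary y.

import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def estep_weights(x, beta_t, phi_t):
    """E-step weights for one unit with missing binary y.

    Returns the conditional probabilities of y = 0, 1 given x and delta = 0 under the
    outcome model Pr(y=1|x) = expit(x'beta) and the response model
    Pr(delta=1|x,y) = expit(x'phi[:-1] + y*phi[-1]).
    """
    p1 = expit(x @ beta_t)                       # Pr(y = 1 | x)
    py = np.array([1 - p1, p1])                  # Pr(y = j | x), j = 0, 1
    pmiss = np.array([1 - expit(x @ phi_t[:-1] + j * phi_t[-1]) for j in (0, 1)])
    w = py * pmiss                               # numerator: Pr(y=j|x) Pr(delta=0|x,j)
    return w / w.sum()                           # normalized fractional weights

# illustrative call for one nonrespondent with covariate vector (1, x)
print(estep_weights(np.array([1.0, 0.3]), beta_t=np.array([0.2, 0.5]),
                    phi_t=np.array([0.1, -0.4, 0.8])))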

44 5. EM algorithm Example 6. Mixture model [Example 3.8 of KS]
Observation: Y_i = (1 − W_i) Z_1i + W_i Z_2i, i = 1, 2, ..., n,
where Z_1i ~ N(µ_1, σ_1²), Z_2i ~ N(µ_2, σ_2²), W_i ~ Bernoulli(π).
Parameter of interest: θ = (µ_1, µ_2, σ_1², σ_2², π).

45 5. EM algorithm Example 6 (Cont'd)
Observed likelihood
L_obs(θ) = ∏_{i=1}^n pdf(y_i | θ),
where
pdf(y | θ) = (1 − π) φ(y | µ_1, σ_1²) + π φ(y | µ_2, σ_2²)
and
φ(y | µ, σ²) = (2πσ²)^{−1/2} exp{ −(y − µ)² / (2σ²) }.

46 5. EM algorithm Example 6 (Cont'd)
Full-sample likelihood
L_full(θ) = ∏_{i=1}^n pdf(y_i, w_i | θ),
where
pdf(y, w | θ) = {φ(y | µ_1, σ_1²)}^{1−w} {φ(y | µ_2, σ_2²)}^{w} π^{w} (1 − π)^{1−w}.
Thus,
ln L_full(θ) = ∑_{i=1}^n [(1 − w_i) ln φ(y_i | µ_1, σ_1²) + w_i ln φ(y_i | µ_2, σ_2²)] + ∑_{i=1}^n {w_i ln(π) + (1 − w_i) ln(1 − π)}.

47 5. EM algorithm Example 6 (Cont'd)
EM algorithm
[E-step]
Q(θ | θ^(t)) = ∑_{i=1}^n [(1 − r_i^(t)) ln φ(y_i | µ_1, σ_1²) + r_i^(t) ln φ(y_i | µ_2, σ_2²)] + ∑_{i=1}^n {r_i^(t) ln(π) + (1 − r_i^(t)) ln(1 − π)},
where r_i^(t) = E(w_i | y_i, θ^(t)) with
E(w_i | y_i, θ) = π φ(y_i | µ_2, σ_2²) / {(1 − π) φ(y_i | µ_1, σ_1²) + π φ(y_i | µ_2, σ_2²)}.
[M-step] Solve ∂Q(θ | θ^(t))/∂θ = 0.
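A compact Python implementation of this E-step/M-step pair for Example 6 (a sketch with illustrative data; the simulated mixture is not from the slides):

import numpy as np
from scipy.stats import norm

def em_mixture(y, theta, n_iter=200):
    """EM for the two-component normal mixture of Example 6.
    theta = (mu1, mu2, sig2_1, sig2_2, pi)."""
    mu1, mu2, s1, s2, pi = theta
    for _ in range(n_iter):
        # E-step: r_i = E(w_i | y_i, theta) = posterior probability of component 2
        f1 = norm.pdf(y, mu1, np.sqrt(s1))
        f2 = norm.pdf(y, mu2, np.sqrt(s2))
        r = pi * f2 / ((1 - pi) * f1 + pi * f2)
        # M-step: weighted means/variances and the mixing proportion
        mu1 = np.sum((1 - r) * y) / np.sum(1 - r)
        mu2 = np.sum(r * y) / np.sum(r)
        s1 = np.sum((1 - r) * (y - mu1) ** 2) / np.sum(1 - r)
        s2 = np.sum(r * (y - mu2) ** 2) / np.sum(r)
        pi = np.mean(r)
    return mu1, mu2, s1, s2, pi

rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(0, 1, 700), rng.normal(3, 1, 300)])
print(em_mixture(y, theta=(-1.0, 1.0, 1.0, 1.0, 0.5)))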

48 5. EM algorithm Example 7. (Robust regression) [Example 3.12 of KS]
Model: y_i = β_0 + β_1 x_i + σ e_i with e_i ~ t(ν), ν known.
Missing data setup: e_i = u_i / √w_i, where u_i ~ N(0, 1) and w_i ~ χ²_ν / ν.
(x_i, y_i, w_i): complete data (x_i, y_i always observed, w_i always missing)
y_i | (x_i, w_i) ~ N(β_0 + β_1 x_i, σ²/w_i), and x_i is fixed (thus independent of w_i).
The EM algorithm can be used to estimate θ = (β_0, β_1, σ²).

49 5. EM algorithm Example 7 (Cont'd)
E-step: Find the conditional distribution of w_i given (x_i, y_i). By Bayes' theorem,
f(w_i | x_i, y_i) ∝ f(w_i) f(y_i | w_i, x_i) ~ Gamma{ (ν + 1)/2, 2/(ν + d_i²) },
where d_i = (y_i − β_0 − β_1 x_i)/σ. Thus,
E(w_i | x_i, y_i, θ^(t)) = (ν + 1) / (ν + d_i^(t)²),
where d_i^(t) = (y_i − β_0^(t) − β_1^(t) x_i)/σ^(t).

50 5. EM algorithm Example 7 (Cont'd)
M-step:
(µ_x^(t), µ_y^(t)) = ∑_{i=1}^n w_i^(t) (x_i, y_i) / ∑_{i=1}^n w_i^(t)
β_1^(t+1) = ∑_{i=1}^n w_i^(t) (x_i − µ_x^(t)) (y_i − µ_y^(t)) / ∑_{i=1}^n w_i^(t) (x_i − µ_x^(t))²
β_0^(t+1) = µ_y^(t) − β_1^(t+1) µ_x^(t)
σ^{2(t+1)} = (1/n) ∑_{i=1}^n w_i^(t) (y_i − β_0^(t) − β_1^(t) x_i)²,
where w_i^(t) = E(w_i | x_i, y_i, θ^(t)).
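The whole iteration of Example 7 fits in a few lines. A minimal sketch, assuming illustrative simulated data and a fixed number of iterations rather than a formal convergence check:

import numpy as np

def em_t_regression(x, y, nu, n_iter=100):
    """EM for Example 7: y_i = b0 + b1*x_i + sigma*e_i with e_i ~ t(nu), nu known."""
    b0, b1, s2 = 0.0, 0.0, np.var(y)
    for _ in range(n_iter):
        # E-step: w_i = E(w_i | x_i, y_i) = (nu + 1) / (nu + d_i^2)
        d2 = (y - b0 - b1 * x) ** 2 / s2
        w = (nu + 1) / (nu + d2)
        # M-step: weighted least squares with weights w_i
        mx = np.sum(w * x) / np.sum(w)
        my = np.sum(w * y) / np.sum(w)
        b1 = np.sum(w * (x - mx) * (y - my)) / np.sum(w * (x - mx) ** 2)
        b0 = my - b1 * mx
        s2 = np.mean(w * (y - b0 - b1 * x) ** 2)
    return b0, b1, s2

rng = np.random.default_rng(4)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + 0.5 * rng.standard_t(df=4, size=500)
print(em_t_regression(x, y, nu=4))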

51 6. Summary Interested in finding the MLE that maximizes the observed likelihood function. Under MAR, the model specification of the response mechanism is not necessary. Mean score equation can be used to compute the MLE. EM algorithm is a useful computational tool for solving the mean score equation. The E-step of the EM algorithm may require some computational tool (see Part 2). The asymptotic variance of the MLE can be computed by the inverse of the observed information matrix, which can be computed using Louis formula. Jae-Kwang Kim (ISU) July 5th, / 181

52 REFERENCES Cheng, P. E. (1994), Nonparametric estimation of mean functionals with data missing at random, Journal of the American Statistical Association 89, Dempster, A. P., N. M. Laird and D. B. Rubin (1977), Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society: Series B 39, Fisher, R. A. (1922), On the mathematical foundations of theoretical statistics, Philosophical Transactions of the Royal Society of London A 222, Fuller, W. A., M. M. Loughin and H. D. Baker (1994), Regression weighting in the presence of nonresponse with application to the Nationwide Food Consumption Survey, Survey Methodology 20, Hirano, K., G. Imbens and G. Ridder (2003), Efficient estimation of average treatment effects using the estimated propensity score, Econometrica 71, Ibrahim, J. G. (1990), Incomplete data in generalized linear models, Journal of the American Statistical Association 85, Jae-Kwang Kim (ISU) July 5th, / 181

53 Kim, J. K. (2011), Parametric fractional imputation for missing data analysis, Biometrika 98, Kim, J. K. and C. L. Yu (2011), A semi-parametric estimation of mean functionals with non-ignorable missing data, Journal of the American Statistical Association 106, Kim, J. K., M. J. Brick, W. A. Fuller and G. Kalton (2006), On the bias of the multiple imputation variance estimator in survey sampling, Journal of the Royal Statistical Society: Series B 68, Kim, J. K. and M. K. Riddles (2012), Some theory for propensity-score-adjustment estimators in survey sampling, Survey Methodology 38, Kott, P. S. and T. Chang (2010), Using calibration weighting to adjust for nonignorable unit nonresponse, Journal of the American Statistical Association 105, Louis, T. A. (1982), Finding the observed information matrix when using the EM algorithm, Journal of the Royal Statistical Society: Series B 44, Meng, X. L. (1994), Multiple-imputation inferences with uncongenial sources of input (with discussion), Statistical Science 9, Jae-Kwang Kim (ISU) July 5th, / 181

54 Oakes, D. (1999), Direct calculation of the information matrix via the em algorithm, Journal of the Royal Statistical Society: Series B 61, Orchard, T. and M.A. Woodbury (1972), A missing information principle: theory and applications, in Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, Berkeley, California, pp Redner, R. A. and H. F. Walker (1984), Mixture densities, maximum likelihood and the EM algorithm, SIAM Review 26, Robins, J. M., A. Rotnitzky and L. P. Zhao (1994), Estimation of regression coefficients when some regressors are not always observed, Journal of the American Statistical Association 89, Robins, J. M. and N. Wang (2000), Inference for imputation estimators, Biometrika 87, Rubin, D. B. (1976), Inference and missing data, Biometrika 63, Tanner, M. A. and W. H. Wong (1987), The calculation of posterior distribution by data augmentation, Journal of the American Statistical Association 82, Jae-Kwang Kim (ISU) July 5th, / 181

55 Wang, N. and J. M. Robins (1998), Large-sample theory for parametric multiple imputation procedures, Biometrika 85, Wang, S., J. Shao and J. K. Kim (2014), Identifiability and estimation in problems with nonignorable nonresponse, Statistica Sinica 24, Wei, G. C. and M. A. Tanner (1990), A Monte Carlo implementation of the EM algorithm and the poor man s data augmentation algorithms, Journal of the American Statistical Association 85, Zhou, M. and J. K. Kim (2012), An efficient method of estimation for longitudinal surveys with monotone missing data, Biometrika 99,

56 Statistical Methods for Handling Missing Data Part 2: Imputation Jae-Kwang Kim Department of Statistics, Iowa State University Jae-Kwang Kim (ISU) Part 2 52 / 181

57 Introduction Basic setup
Y: a vector of random variables with distribution F(y; θ).
y_1, ..., y_n are n independent realizations of Y.
We are interested in estimating ψ, which is implicitly defined by E{U(ψ; Y)} = 0.
Under complete observation, a consistent estimator ψ̂_n of ψ can be obtained by solving the estimating equation for ψ:
∑_{i=1}^n U(ψ; y_i) = 0.
A special case of an estimating function is the score function; in that case, ψ = θ.
The sandwich variance estimator is often used to estimate the variance of ψ̂_n:
V̂(ψ̂_n) = τ̂_u⁻¹ V̂(U) τ̂_u⁻¹′, where τ_u = E{∂U(ψ; y)/∂ψ′}.
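For a scalar ψ the sandwich formula reduces to one line. A minimal sketch (illustrative and not from the slides), using ψ = Pr(Y < 1) with U(ψ; y) = I(y < 1) − ψ, in which case the solution of the estimating equation is the sample proportion and |τ_u| = 1:

import numpy as np

def sandwich_variance(u, tau_hat):
    """Sandwich variance estimate tau^{-1} Vhat(U) tau^{-1} for a scalar psi.

    u:       values U(psi_hat; y_i) of the estimating function at the solution
    tau_hat: the tau_u factor; it enters squared, so its sign is immaterial here
    """
    n = len(u)
    v_u = np.mean(u ** 2)               # Vhat(U): variance of the estimating function
    return v_u / (n * tau_hat ** 2)     # variance estimate for psi_hat

rng = np.random.default_rng(5)
y = rng.normal(1.0, 1.0, size=1000)
psi_hat = np.mean(y < 1)
print(psi_hat, sandwich_variance((y < 1) - psi_hat, tau_hat=1.0))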

58 1. Introduction Missing data setup
Suppose that y_i is not fully observed.
y_i = (y_obs,i, y_mis,i): (observed, missing) part of y_i
δ_i: response indicator for y_i.
In the presence of missing data, we can use the following estimator:
ψ̂: solution to ∑_{i=1}^n E{U(ψ; y_i) | y_obs,i, δ_i} = 0.   (1)
The equation in (1) is often called the expected estimating equation.

59 1. Introduction Motivation (for imputation) Computing the conditional expectation in (1) can be a challenging problem. 1 The conditional expectation depends on unknown parameter values. That is, E {U (ψ; y i ) y obs,i, δ i } = E {U (ψ; y i ) y obs,i, δ i ; θ, φ}, where θ is the parameter in f (y; θ) and φ is the parameter in p(δ y; φ). 2 Even if we know η = (θ, φ), computing the conditional expectation is numerically difficult. Jae-Kwang Kim (ISU) Part 2 55 / 181

60 1. Introduction Imputation
Imputation: a Monte Carlo approximation of the conditional expectation (given the observed data):
E{U(ψ; y_i) | y_obs,i, δ_i} ≅ (1/m) ∑_{j=1}^m U(ψ; y_obs,i, y_mis,i^(j)).
1. Bayesian approach: generate y_mis,i* from
f(y_mis,i | y_obs, δ) = ∫ f(y_mis,i | y_obs, δ; η) p(η | y_obs, δ) dη.
2. Frequentist approach: generate y_mis,i* from f(y_mis,i | y_obs,i, δ; η̂), where η̂ is a consistent estimator.

61 1. Introduction Imputation Questions
1. How to generate the Monte Carlo samples (or the imputed values)?
2. What is the asymptotic distribution of ψ̂_I, which solves
∑_{i=1}^n (1/m) ∑_{j=1}^m U(ψ; y_obs,i, y_mis,i^(j)) = 0,
where y_mis,i^(j) ~ f(y_mis,i | y_obs,i, δ; η̂_p) for some η̂_p?
3. How to estimate the variance of ψ̂_I?

62 2. Basic Theory for Imputation Basic Setup (for Case 1: ψ = η)
y = (y_1, ..., y_n) ~ f(y; θ)
δ = (δ_1, ..., δ_n) ~ P(δ | y; φ)
y = (y_obs, y_mis): (observed, missing) part of y.
y_mis^{*(1)}, ..., y_mis^{*(m)}: m imputed values of y_mis generated from
f(y_mis | y_obs, δ; η̂_p) = f(y; θ̂_p) P(δ | y; φ̂_p) / ∫ f(y; θ̂_p) P(δ | y; φ̂_p) dµ(y_mis),
where η̂_p = (θ̂_p, φ̂_p) is a preliminary estimator of η = (θ, φ).
Using the m imputed values, the imputed score function is computed as
S̄_imp,m(η | η̂_p) ≡ (1/m) ∑_{j=1}^m S_com(η; y_obs, y_mis^{*(j)}, δ),
where S_com(η; y) is the score function of η = (θ, φ) under complete response.

63 2. Basic Theory for Imputation Lemma 1 (Asymptotic results for m = ∞)
Assume that η̂_p converges in probability to η. Let η̂_I,m be the solution to
(1/m) ∑_{j=1}^m S_com(η; y_obs, y_mis^{*(j)}, δ) = 0,
where y_mis^{*(1)}, ..., y_mis^{*(m)} are the imputed values generated from f(y_mis | y_obs, δ; η̂_p). Then, under some regularity conditions, for m → ∞,
η̂_imp,∞ = η̂_MLE + J_mis (η̂_p − η̂_MLE)   (2)
and
V(η̂_imp,∞) ≅ I_obs⁻¹ + J_mis {V(η̂_p) − V(η̂_MLE)} J_mis′,
where J_mis = I_com⁻¹ I_mis is the fraction of missing information.

64 2. Basic Theory for Imputation Remark
For m = ∞, the imputed score equation becomes the mean score equation.
Equation (2) means that
η̂_imp,∞ = (I − J_mis) η̂_MLE + J_mis η̂_p.   (3)
That is, η̂_imp,∞ is a convex combination of η̂_MLE and η̂_p.
Note that η̂_imp,∞ is a one-step EM update with initial estimate η̂_p. Let η̂^(t) be the t-th EM update of η, computed by solving S̄(η | η̂^(t−1)) = 0 with η̂^(0) = η̂_p. Equation (3) implies that
η̂^(t) = (I − J_mis) η̂_MLE + J_mis η̂^(t−1).
Thus, we obtain η̂^(t) = η̂_MLE + (J_mis)^t (η̂^(0) − η̂_MLE), which justifies lim_{t→∞} η̂^(t) = η̂_MLE.

65 2. Basic Theory for Imputation Theorem 1 (Asymptotic results for m < ∞) [Theorem 4.1 of KS]
Let η̂_p be a preliminary √n-consistent estimator of η with variance V_p. Under some regularity conditions, the solution η̂_imp,m to
S̄_m(η | η̂_p) ≡ (1/m) ∑_{j=1}^m S_com(η; y_obs, y_mis^{*(j)}, δ) = 0
has mean η_0 and asymptotic variance equal to
V(η̂_imp,m) ≅ I_obs⁻¹ + J_mis {V_p − I_obs⁻¹} J_mis′ + m⁻¹ I_com⁻¹ I_mis I_com⁻¹,   (4)
where J_mis = I_com⁻¹ I_mis. This theorem was originally presented by Wang and Robins (1998).

66 2. Basic Theory for Imputation Remark
If we use η̂_p = η̂_MLE, then the asymptotic variance in (4) is
V(η̂_imp,m) ≅ I_obs⁻¹ + m⁻¹ I_com⁻¹ I_mis I_com⁻¹.
In Bayesian imputation (or multiple imputation), the posterior values of η are independently generated from η* ~ N(η̂_MLE, I_obs⁻¹), which implies that we can use V_p = I_obs⁻¹ + m⁻¹ I_obs⁻¹. Thus, the asymptotic variance in (4) for multiple imputation is
V(η̂_imp,m) ≅ I_obs⁻¹ + m⁻¹ J_mis I_obs⁻¹ J_mis′ + m⁻¹ I_com⁻¹ I_mis I_com⁻¹.
The second term is the additional price we pay for generating the posterior values rather than using η̂_MLE directly.

67 2. Basic Theory for Imputation Basic Setup (for Case 2: ψ ≠ η)
Parameter ψ defined by E{U(ψ; y)} = 0.
Under complete response, a consistent estimator of ψ can be obtained by solving U(ψ; y) = 0.
Assume that some part of y, denoted by y_mis, is not observed, and m imputed values, say y_mis^{*(1)}, ..., y_mis^{*(m)}, are generated from f(y_mis | y_obs, δ; η̂_MLE), where η̂_MLE is the MLE of η_0.
The imputed estimating function using the m imputed values is computed as
Ū_imp,m(ψ | η̂_MLE) = (1/m) ∑_{j=1}^m U(ψ; y^{*(j)}),   (5)
where y^{*(j)} = (y_obs, y_mis^{*(j)}).
Let ψ̂_imp,m be the solution to Ū_imp,m(ψ | η̂_MLE) = 0. We are interested in the asymptotic properties of ψ̂_imp,m.

68 2. Basic Theory for Imputation Theorem 2 [Theorem 4.2 of KS]
Suppose that the parameter of interest ψ_0 is estimated by solving U(ψ) = 0 under complete response. Then, under some regularity conditions, the solution to
E{U(ψ) | y_obs, δ; η̂_MLE} = 0   (6)
has mean ψ_0 and asymptotic variance τ⁻¹ Ω τ⁻¹′, where
τ = E{∂U(ψ_0)/∂ψ′},
Ω = V{Ū(ψ_0 | η_0) + κ S_obs(η_0)},
and κ = E{U(ψ_0) S_mis(η_0)′} I_obs⁻¹.
This theorem was originally presented by Robins and Wang (2000).

69 2. Basic Theory for Imputation Remark
Writing Ū(ψ | η) ≡ E{U(ψ) | y_obs, δ; η}, the solution to (6) can be treated as the solution to the joint estimating equation
U*(ψ, η) = [ U_1(ψ, η), U_2(η) ]′ = 0,
where U_1(ψ, η) = Ū(ψ | η) and U_2(η) = S_obs(η). We can apply a Taylor expansion to get
( ψ̂ − ψ_0, η̂ − η_0 )′ ≅ − [ B_11 B_12; B_21 B_22 ]⁻¹ ( U_1(ψ_0, η_0), U_2(η_0) )′,
where
[ B_11 B_12; B_21 B_22 ] = [ E(∂U_1/∂ψ′) E(∂U_1/∂η′); E(∂U_2/∂ψ′) E(∂U_2/∂η′) ].
Thus, as B_21 = 0,
ψ̂ ≅ ψ_0 − B_11⁻¹ { U_1(ψ_0, η_0) − B_12 B_22⁻¹ U_2(η_0) }.
In Theorem 2, B_11 = τ, B_12 = E(U S_mis′), and B_22 = −I_obs.

70 3. Monte Carlo EM
Motivation: Monte Carlo samples in the EM algorithm can be used as imputed values.
Monte Carlo EM
1. In the EM algorithm defined by
[E-step] Compute Q(η | η^(t)) = E{ln f(y, δ; η) | y_obs, δ; η^(t)}
[M-step] Find η^(t+1) that maximizes Q(η | η^(t)),
the E-step is computationally cumbersome because it involves an integral.
2. Wei and Tanner (1990): In the E-step, first draw
y_mis^{*(1)}, ..., y_mis^{*(m)} ~ f(y_mis | y_obs, δ; η^(t))
and approximate
Q(η | η^(t)) ≅ (1/m) ∑_{j=1}^m ln f(y_obs, y_mis^{*(j)}, δ; η).

71 3. Monte Carlo EM Example 1 [Example 3.15 of KS]
Suppose that y_i ~ f(y_i | x_i; θ).
Assume that x_i is always observed but we observe y_i only when δ_i = 1, where δ_i ~ Bernoulli[π_i(φ)] and
π_i(φ) = exp(φ_0 + φ_1 x_i + φ_2 y_i) / {1 + exp(φ_0 + φ_1 x_i + φ_2 y_i)}.
To implement the MCEM method, in the E-step we need to generate samples from
f(y_i | x_i, δ_i = 0; θ̂, φ̂) = f(y_i | x_i; θ̂) {1 − π_i(φ̂)} / ∫ f(y_i | x_i; θ̂) {1 − π_i(φ̂)} dy_i.

72 3. Monte Carlo EM Example 1 (Cont'd)
We can use the following rejection method to generate samples from f(y_i | x_i, δ_i = 0; θ̂, φ̂):
1. Generate y_i* from f(y_i | x_i; θ̂).
2. Using y_i*, compute
π_i*(φ̂) = exp(φ̂_0 + φ̂_1 x_i + φ̂_2 y_i*) / {1 + exp(φ̂_0 + φ̂_1 x_i + φ̂_2 y_i*)}.
Accept y_i* with probability 1 − π_i*(φ̂).
3. If y_i* is not accepted, then go to Step 1.
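A direct transcription of this rejection method into Python (a sketch; the outcome model y | x ~ N(θ_0 + θ_1 x, 1) is an assumed example, since the slides leave f(y | x; θ) generic):

import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def sample_nonrespondent_y(x_i, theta_hat, phi_hat, rng, max_tries=10000):
    """Rejection sampler for f(y | x, delta = 0) under an assumed normal outcome model
    and the logistic response model pi(x, y; phi) = expit(phi0 + phi1*x + phi2*y)."""
    for _ in range(max_tries):
        y_star = rng.normal(theta_hat[0] + theta_hat[1] * x_i, 1.0)   # step 1: draw from f(y|x; theta)
        pi_star = expit(phi_hat[0] + phi_hat[1] * x_i + phi_hat[2] * y_star)
        if rng.uniform() < 1.0 - pi_star:                             # step 2: accept w.p. 1 - pi
            return y_star
    raise RuntimeError("acceptance probability too small")

rng = np.random.default_rng(6)
print(sample_nonrespondent_y(0.5, theta_hat=(0.0, 1.0), phi_hat=(-0.5, 0.3, 0.4), rng=rng))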

73 3. Monte Carlo EM Example 1 (Cont'd)
Using the m imputed values of y_i, denoted by y_i^{*(1)}, ..., y_i^{*(m)}, the M-step can be implemented by solving
∑_{i=1}^n ∑_{j=1}^m S(θ; x_i, y_i^{*(j)}) = 0
and
∑_{i=1}^n ∑_{j=1}^m {δ_i − π(φ; x_i, y_i^{*(j)})} (1, x_i, y_i^{*(j)}) = 0,
where S(θ; x_i, y_i) = ∂ log f(y_i | x_i; θ)/∂θ.

74 3. Monte Carlo EM Example 2 (GLMM) [Example 3.18]
Basic setup: Let y_ij be a binary random variable (taking values 0 or 1) with probability p_ij = Pr(y_ij = 1 | x_ij, a_i), and assume that
logit(p_ij) = x_ij′β + a_i,
where x_ij is a p-dimensional covariate associated with the j-th repetition of unit i, β is the parameter of interest that can represent the treatment effect due to x, and a_i represents the random effect associated with unit i. We assume that the a_i are iid N(0, σ²).
Missing data: a_i
Observed likelihood:
L_obs(β, σ²) = ∏_i ∫ [ ∏_j p(x_ij, a_i; β)^{y_ij} {1 − p(x_ij, a_i; β)}^{1−y_ij} ] (1/σ) φ(a_i/σ) da_i,
where φ(·) is the pdf of the standard normal distribution.

75 3. Monte Carlo EM Example 2 (Cont'd)
MCEM approach: generate a_i from the Metropolis-Hastings algorithm. Note that
f(a_i | x_i, y_i; β̂, σ̂) ∝ f_1(y_i | x_i, a_i; β̂) f_2(a_i; σ̂).
1. Generate a_i* from f_2(a_i; σ̂).
2. Set a_i^(t) = a_i* with probability ρ(a_i^(t−1), a_i*) and a_i^(t) = a_i^(t−1) with probability 1 − ρ(a_i^(t−1), a_i*), where
ρ(a_i^(t−1), a_i*) = min{ f_1(y_i | x_i, a_i*; β̂) / f_1(y_i | x_i, a_i^(t−1); β̂), 1 }.

76 3. Monte Carlo EM Remark Monte Carlo EM can be used as a frequentist approach to imputation. Convergence is not guaranteed (for fixed m). E-step can be computationally heavy. (May use MCMC method). Jae-Kwang Kim (ISU) Part 2 72 / 181

77 4. Parametric Fractional Imputation Parametric fractional imputation
1. More than one (say m) imputed values of y_mis,i: y_mis,i^{*(1)}, ..., y_mis,i^{*(m)} are generated from some (initial) density h(y_mis,i).
2. Create a weighted data set
{(w_ij*, y_ij*); j = 1, 2, ..., m; i = 1, 2, ..., n},
where ∑_{j=1}^m w_ij* = 1, y_ij* = (y_obs,i, y_mis,i^{*(j)}), w_ij* ∝ f(y_ij*, δ_i; η̂)/h(y_mis,i^{*(j)}), η̂ is the maximum likelihood estimator of η, and f(y, δ; η) is the joint density of (y, δ).
3. The weights w_ij* are the normalized importance weights and can be called fractional weights.
If y_mis,i is categorical, then simply use all possible values of y_mis,i as the imputed values and assign their conditional probabilities as the fractional weights.

78 4. Parametric Fractional Imputation Remark
Importance sampling idea: For sufficiently large m,
∑_{j=1}^m w_ij* g(y_ij*) ≅ ∫ g(y_i) {f(y_i, δ_i; η̂)/h(y_mis,i)} h(y_mis,i) dy_mis,i / ∫ {f(y_i, δ_i; η̂)/h(y_mis,i)} h(y_mis,i) dy_mis,i = E{g(y_i) | y_obs,i, δ_i; η̂}
for any g such that the expectation exists. In the importance sampling literature, h(·) is called the proposal distribution and f(·) is called the target distribution.
We do not need to compute the conditional distribution f(y_mis,i | y_obs,i, δ_i; η). Only the joint distribution f(y_obs,i, y_mis,i, δ_i; η) is needed, because
f(y_obs,i, y_mis,i^{*(j)}, δ_i; η̂)/h(y_i,mis^{*(j)}) / ∑_{k=1}^m f(y_obs,i, y_mis,i^{*(k)}, δ_i; η̂)/h(y_i,mis^{*(k)}) = f(y_mis,i^{*(j)} | y_obs,i, δ_i; η̂)/h(y_i,mis^{*(j)}) / ∑_{k=1}^m f(y_mis,i^{*(k)} | y_obs,i, δ_i; η̂)/h(y_i,mis^{*(k)}).

79 4. Parametric Fractional Imputation EM algorithm by fractional imputation
1. Imputation step: generate y_mis,i^{*(j)} ~ h(y_i,mis).
2. Weighting step: compute
w_ij(t)* ∝ f(y_ij*, δ_i; η̂^(t))/h(y_i,mis^{*(j)}),
where ∑_{j=1}^m w_ij(t)* = 1.
3. M-step: update
η̂^(t+1): solution to ∑_{i=1}^n ∑_{j=1}^m w_ij(t)* S(η; y_ij*, δ_i) = 0.
4. Repeat Steps 2 and 3 until convergence.
Imputation step + Weighting step = E-step.
We may add an optional step that checks whether w_ij(t)* is too large for some j. In this case, h(y_i,mis) needs to be changed.
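The weighting step is just normalized importance weighting. A minimal sketch for a single unit (the function and argument names are illustrative, not from the slides):

import numpy as np

def fractional_weights(joint_density_at_eta_t, proposal_density):
    """Weighting step of parametric fractional imputation for one unit.

    joint_density_at_eta_t[j]: f(y*_ij, delta_i; eta^(t)) evaluated at the j-th imputed value
    proposal_density[j]:       h(y*_mis,i^(j)), the density the imputed values were drawn from
    """
    w = np.asarray(joint_density_at_eta_t) / np.asarray(proposal_density)
    return w / w.sum()    # normalized so the fractional weights add to one

# illustrative call with m = 4 imputed values
print(fractional_weights([0.12, 0.40, 0.05, 0.22], [0.25, 0.25, 0.25, 0.25]))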

80 4. Parametric Fractional imputation
The imputed values are not changed across EM iterations; only the fractional weights are changed.
1. Computationally efficient (because we use importance sampling only once).
2. Convergence is achieved (because the imputed values are not changed).
For sufficiently large t, η̂^(t) → η̂. Also, for sufficiently large m, η̂ → η̂_MLE.
For estimation of ψ in E{U(ψ; Y)} = 0, simply use
n⁻¹ ∑_{i=1}^n ∑_{j=1}^m w_ij* U(ψ; y_ij*) = 0.
Linearization variance estimation (using the result of Theorem 2) is discussed in Kim (2011).

81 4. Parametric Fractional imputation Return to Example 1
Fractional imputation
1. Imputation step: Generate y_i^{*(1)}, ..., y_i^{*(m)} from f(y_i | x_i; θ̂^(0)).
2. Weighting step: Using the m imputed values generated from Step 1, compute the fractional weights by
w_ij(t)* ∝ f(y_i^{*(j)} | x_i; θ̂^(t)) {1 − π(x_i, y_i^{*(j)}; φ̂^(t))} / f(y_i^{*(j)} | x_i; θ̂^(0)),
where
π(x_i, y_i; φ̂) = exp(φ̂_0 + φ̂_1 x_i + φ̂_2 y_i) / {1 + exp(φ̂_0 + φ̂_1 x_i + φ̂_2 y_i)}.

82 4. Parametric Fractional imputation Return to Example 1 (Cont'd)
Using the imputed data and the fractional weights, the M-step can be implemented by solving
∑_{i=1}^n ∑_{j=1}^m w_ij(t)* S(θ; x_i, y_i^{*(j)}) = 0
and
∑_{i=1}^n ∑_{j=1}^m w_ij(t)* {δ_i − π(φ; x_i, y_i^{*(j)})} (1, x_i, y_i^{*(j)}) = 0,   (7)
where S(θ; x_i, y_i) = ∂ log f(y_i | x_i; θ)/∂θ.

83 4. Parametric Fractional imputation Example 3: Categorical missing data
[Table: original data with columns i, Y_1i, Y_2i, X_i; "?" denotes a missing value. Three units have missing entries: one with Y_1 = 1 observed, Y_2 missing, X = 4; one with Y_1 missing, Y_2 = 0 observed, X = 5; and one with both Y_1 and Y_2 missing, X = 8.]

84 4. Parametric Fractional imputation Example 3 (Cont'd)
Y_1 and Y_2 are dichotomous and X is continuous.
Model:
Pr(Y_1 = 1 | X) = Λ(α_0 + α_1 X)
Pr(Y_2 = 1 | X, Y_1) = Λ(β_0 + β_1 X + β_2 Y_1),
where Λ(x) = {1 + exp(−x)}⁻¹.
Assume MAR.

85 4. Parametric Fractional imputation Example 3 (Cont'd)
Imputed data (columns: i, fractional weight, Y_1i, Y_2i, X_i):
Unit with Y_1 = 1 observed and Y_2 missing (X = 4): two imputed rows (Y_2 = 0, 1) with fractional weights Pr(Y_2 = 0 | Y_1 = 1, X = 4) and Pr(Y_2 = 1 | Y_1 = 1, X = 4).
Unit with Y_2 = 0 observed and Y_1 missing (X = 5): two imputed rows (Y_1 = 0, 1) with fractional weights Pr(Y_1 = 0 | Y_2 = 0, X = 5) and Pr(Y_1 = 1 | Y_2 = 0, X = 5).
Unit with both Y_1 and Y_2 missing (X = 8): four imputed rows with fractional weights Pr(Y_1 = 0, Y_2 = 0 | X = 8), Pr(Y_1 = 0, Y_2 = 1 | X = 8), Pr(Y_1 = 1, Y_2 = 0 | X = 8), and Pr(Y_1 = 1, Y_2 = 1 | X = 8).

86 4. Parametric Fractional imputation Example 3 (Cont'd)
Implementation of the EM algorithm using fractional imputation:
E-step: compute the mean score functions using the fractional weights.
M-step: solve the mean score equations.
Because we have a completed data set with weights, we can also estimate other parameters such as θ = Pr(Y_2 = 1 | X > 5).

87 4. Parametric Fractional imputation Example 4: Measurement error model [Example 4.13 of KS]
Interested in estimating θ in f(y | x; θ).
Instead of observing x, we observe z, which can be highly correlated with x. Thus, z is an instrumental variable for x:
f(y | x, z) = f(y | x)  and  f(y | z = a) ≠ f(y | z = b) for a ≠ b.
In addition to the original sample, we have a separate calibration sample that observes (x_i, z_i).

88 4. Parametric Fractional imputation
Table: Data Structure
                    Z   X   Y
Calibration Sample  o   o
Original Sample     o       o

89 4. Parametric Fractional imputation Example 4 (Cont'd)
The goal is to generate x in the original sample from
f(x_i | z_i, y_i) ∝ f(x_i | z_i) f(y_i | x_i, z_i) = f(x_i | z_i) f(y_i | x_i).
Obtain a consistent estimator f̂(x | z) from the calibration sample.
E-step:
1. Generate x_i^{*(1)}, ..., x_i^{*(m)} from f̂(x_i | z_i).
2. Compute the fractional weights associated with x_i^{*(j)} by
w_ij* ∝ f(y_i | x_i^{*(j)}; θ̂).
M-step: Solve the weighted score equation for θ.

90 5. Multiple imputation Features
1. Imputed values are generated from
y_i,mis^{*(j)} ~ f(y_i,mis | y_i,obs, δ_i; η*),
where η* is generated from the posterior distribution π(η | y_obs).
2. The variance estimation formula is simple (Rubin's formula):
V̂_MI(ψ̄_m) = W_m + (1 + m⁻¹) B_m,
where W_m = m⁻¹ ∑_{k=1}^m V̂_I^(k), B_m = (m − 1)⁻¹ ∑_{k=1}^m (ψ̂^(k) − ψ̄_m)², ψ̄_m = m⁻¹ ∑_{k=1}^m ψ̂^(k) is the average of the m imputed estimators, and V̂_I^(k) is the imputed version of the variance estimator of ψ̂ under complete response.
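Rubin's combining rule in code (a minimal sketch; the inputs are illustrative numbers, not results from the slides):

import numpy as np

def rubin_combine(psi_hats, var_hats):
    """Rubin's combining rule for m multiply-imputed estimates.

    psi_hats: psi^(k), the point estimate from each completed data set
    var_hats: V_I^(k), the complete-data variance estimate from each completed data set
    """
    psi_hats, var_hats = np.asarray(psi_hats), np.asarray(var_hats)
    m = len(psi_hats)
    psi_bar = psi_hats.mean()
    w_m = var_hats.mean()                              # within-imputation variance
    b_m = psi_hats.var(ddof=1)                         # between-imputation variance
    return psi_bar, w_m + (1 + 1 / m) * b_m            # point estimate and Rubin's variance

print(rubin_combine([1.02, 0.97, 1.05, 1.00, 0.99], [0.010, 0.011, 0.009, 0.010, 0.012]))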

91 5. Multiple imputation
The computation for Bayesian imputation can be implemented by the data augmentation technique (Tanner and Wong, 1987), which is a special application of the Gibbs sampling method:
1. I-step: Generate y_mis* ~ f(y_mis | y_obs, δ; η*)
2. P-step: Generate η* ~ π(η | y_obs, y_mis*, δ)
Some tools are needed for checking convergence to a stable distribution.
Consistency of the variance estimator is questionable (Kim et al., 2006).

92 6. Simulation Study Simulation 1
Bivariate data (x_i, y_i) of size n = 200 with
x_i ~ N(3, 1), y_i ~ N(−2 + x_i, 1).
x_i always observed, y_i subject to missingness. MCAR (δ ~ Bernoulli(0.6)).
Parameters of interest:
1. θ_1 = E(Y)
2. θ_2 = Pr(Y < 1)
Multiple imputation (MI) and fractional imputation (FI) are applied with m = 50.
For estimation of θ_2, the following method-of-moments estimator is used:
θ̂_2,MME = n⁻¹ ∑_{i=1}^n I(y_i < 1).

93 6. Simulation Study
Table 1: Monte Carlo bias and variance of the point estimators of θ_1 and θ_2 for the complete sample, MI, and FI.
Table 2: Monte Carlo relative bias (%) of the variance estimators of V(θ̂_1) and V(θ̂_2) under MI and FI.

94 6. Simulation study
Rubin's formula is based on the following decomposition:
V(θ̂_MI) = V(θ̂_n) + V(θ̂_MI − θ̂_n),
where θ̂_n is the complete-sample estimator of θ. Basically, the W_m term estimates V(θ̂_n) and the (1 + m⁻¹)B_m term estimates V(θ̂_MI − θ̂_n).
In the general case, we have
V(θ̂_MI) = V(θ̂_n) + V(θ̂_MI − θ̂_n) + 2 Cov(θ̂_MI − θ̂_n, θ̂_n),
and Rubin's variance estimator ignores the covariance term. Thus, a sufficient condition for the validity (unbiasedness) of the variance estimator is Cov(θ̂_MI − θ̂_n, θ̂_n) = 0.
Meng (1994) called this condition the congeniality of θ̂_n. Congeniality holds when θ̂_n is the MLE of θ.

95 6. Simulation study
For example, there are two estimators of θ = P(Y < 1) when Y follows N(µ, σ²):
1. Maximum likelihood method: θ̂_MLE = ∫_{−∞}^1 φ(z; µ̂, σ̂²) dz
2. Method of moments: θ̂_MME = n⁻¹ ∑_{i=1}^n I(y_i < 1)
In the simulation setup, the imputed estimator of θ_2 can be expressed as
θ̂_2,I = n⁻¹ ∑_{i=1}^n [δ_i I(y_i < 1) + (1 − δ_i) E{I(y_i < 1) | x_i; µ̂, σ̂}].
Thus, the imputed estimator of θ_2 borrows strength by making use of the extra information in f(y | x). Thus, when the congeniality condition does not hold, the imputed estimator improves efficiency (due to the imputation model that uses extra information), but the variance estimator does not recognize this improvement.

96 6. Simulation Study Simulation 2
Bivariate data (x_i, y_i) of size n = 100 with
Y_i = β_0 + β_1 x_i + β_2 x_i² + e_i,   (8)
where (β_0, β_1, β_2) = (0, 0.9, 0.06), x_i ~ N(0, 1), e_i ~ N(0, 0.16), and x_i and e_i are independent.
The variable x_i is always observed, but the probability that y_i responds is 0.5.
In MI, the imputer's model is Y_i = β_0 + β_1 x_i + e_i. That is, the imputer's model uses the extra information that β_2 = 0.
From the imputed data, we fit model (8) and compute the power of the test of H_0: β_2 = 0 at the 0.05 significance level.
In addition, we also consider the Complete-Case (CC) method, which simply uses the complete cases only for the regression analysis.

97 6. Simulation Study
Table 3: Simulation results for the Monte Carlo experiment based on 10,000 Monte Carlo samples, reporting E(θ̂), V(θ̂), the relative bias of V̂, and the power for MI, FI, and CC.
Table 3 shows that MI provides a more efficient point estimator than the CC method, but its variance estimation is very conservative (more than 100% overestimation).
Because of the serious positive bias of the MI variance estimator, the statistical power of the test based on MI is actually lower than that of the CC method.

98 7. Summary Imputation can be viewed as a Monte Carlo tool for computing the conditional expectation. Monte Carlo EM is very popular but the E-step can be computationally heavy. Parametric fractional imputation is a useful tool for frequentist imputation. Multiple imputation is motivated from a Bayesian framework. The frequentist validity of multiple imputation requires the condition of congeniality. Uncongeniality may lead to overestimation of variance which can seriously increase type-2 errors. Jae-Kwang Kim (ISU) Part 2 94 / 181

99 REFERENCES Cheng, P. E. (1994), Nonparametric estimation of mean functionals with data missing at random, Journal of the American Statistical Association 89, Dempster, A. P., N. M. Laird and D. B. Rubin (1977), Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society: Series B 39, Fisher, R. A. (1922), On the mathematical foundations of theoretical statistics, Philosophical Transactions of the Royal Society of London A 222, Fuller, W. A., M. M. Loughin and H. D. Baker (1994), Regression weighting in the presence of nonresponse with application to the Nationwide Food Consumption Survey, Survey Methodology 20, Hirano, K., G. Imbens and G. Ridder (2003), Efficient estimation of average treatment effects using the estimated propensity score, Econometrica 71, Ibrahim, J. G. (1990), Incomplete data in generalized linear models, Journal of the American Statistical Association 85, Kim, J. K. (2011), Parametric fractional imputation for missing data analysis, Biometrika 98, Kim, J. K. and C. L. Yu (2011), A semi-parametric estimation of mean functionals with non-ignorable missing data, Journal of the American Statistical Association 106, Jae-Kwang Kim (ISU) Part 2 94 / 181


MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30 MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)

More information

Statistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23

Statistical Methods. Missing Data  snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23 1 / 23 Statistical Methods Missing Data http://www.stats.ox.ac.uk/ snijders/sm.htm Tom A.B. Snijders University of Oxford November, 2011 2 / 23 Literature: Joseph L. Schafer and John W. Graham, Missing

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

AN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE

AN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE Statistica Sinica 24 (2014), 1097-1116 doi:http://dx.doi.org/10.5705/ss.2012.074 AN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE Sheng Wang 1, Jun Shao

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Review and continuation from last week Properties of MLEs

Review and continuation from last week Properties of MLEs Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Institute of Statistics

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Propensity score adjusted method for missing data

Propensity score adjusted method for missing data Graduate Theses and Dissertations Graduate College 2013 Propensity score adjusted method for missing data Minsun Kim Riddles Iowa State University Follow this and additional works at: http://lib.dr.iastate.edu/etd

More information

Basics of Modern Missing Data Analysis

Basics of Modern Missing Data Analysis Basics of Modern Missing Data Analysis Kyle M. Lang Center for Research Methods and Data Analysis University of Kansas March 8, 2013 Topics to be Covered An introduction to the missing data problem Missing

More information

MH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution

MH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution MH I Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution a lot of Bayesian mehods rely on the use of MH algorithm and it s famous

More information

arxiv:math/ v1 [math.st] 23 Jun 2004

arxiv:math/ v1 [math.st] 23 Jun 2004 The Annals of Statistics 2004, Vol. 32, No. 2, 766 783 DOI: 10.1214/009053604000000175 c Institute of Mathematical Statistics, 2004 arxiv:math/0406453v1 [math.st] 23 Jun 2004 FINITE SAMPLE PROPERTIES OF

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Econometrics Workshop UNC

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

Fractional imputation method of handling missing data and spatial statistics

Fractional imputation method of handling missing data and spatial statistics Graduate Theses and Dissertations Graduate College 2014 Fractional imputation method of handling missing data and spatial statistics Shu Yang Iowa State University Follow this and additional works at:

More information

Modification and Improvement of Empirical Likelihood for Missing Response Problem

Modification and Improvement of Empirical Likelihood for Missing Response Problem UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu

More information

Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level

Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level A Monte Carlo Simulation to Test the Tenability of the SuperMatrix Approach Kyle M Lang Quantitative Psychology

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

Introduction to Estimation Methods for Time Series models Lecture 2

Introduction to Estimation Methods for Time Series models Lecture 2 Introduction to Estimation Methods for Time Series models Lecture 2 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 2 SNS Pisa 1 / 21 Estimators:

More information

Data Integration for Big Data Analysis for finite population inference

Data Integration for Big Data Analysis for finite population inference for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation

More information

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i, A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

Last lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton

Last lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton EM Algorithm Last lecture 1/35 General optimization problems Newton Raphson Fisher scoring Quasi Newton Nonlinear regression models Gauss-Newton Generalized linear models Iteratively reweighted least squares

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

A measurement error model approach to small area estimation

A measurement error model approach to small area estimation A measurement error model approach to small area estimation Jae-kwang Kim 1 Spring, 2015 1 Joint work with Seunghwan Park and Seoyoung Kim Ouline Introduction Basic Theory Application to Korean LFS Discussion

More information

Introduction to Survey Data Integration

Introduction to Survey Data Integration Introduction to Survey Data Integration Jae-Kwang Kim Iowa State University May 20, 2014 Outline 1 Introduction 2 Survey Integration Examples 3 Basic Theory for Survey Integration 4 NASS application 5

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter

More information

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap inference for the finite population total under complex sampling designs Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 05 Full points may be obtained for correct answers to eight questions Each numbered question (which may have several parts) is worth

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

STATISTICAL INFERENCE WITH DATA AUGMENTATION AND PARAMETER EXPANSION

STATISTICAL INFERENCE WITH DATA AUGMENTATION AND PARAMETER EXPANSION STATISTICAL INFERENCE WITH arxiv:1512.00847v1 [math.st] 2 Dec 2015 DATA AUGMENTATION AND PARAMETER EXPANSION Yannis G. Yatracos Faculty of Communication and Media Studies Cyprus University of Technology

More information

Combining data from two independent surveys: model-assisted approach

Combining data from two independent surveys: model-assisted approach Combining data from two independent surveys: model-assisted approach Jae Kwang Kim 1 Iowa State University January 20, 2012 1 Joint work with J.N.K. Rao, Carleton University Reference Kim, J.K. and Rao,

More information

Estimation, Inference, and Hypothesis Testing

Estimation, Inference, and Hypothesis Testing Chapter 2 Estimation, Inference, and Hypothesis Testing Note: The primary reference for these notes is Ch. 7 and 8 of Casella & Berger 2. This text may be challenging if new to this topic and Ch. 7 of

More information

Generalized Linear Models. Kurt Hornik

Generalized Linear Models. Kurt Hornik Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general

More information

Bayesian Modeling of Conditional Distributions

Bayesian Modeling of Conditional Distributions Bayesian Modeling of Conditional Distributions John Geweke University of Iowa Indiana University Department of Economics February 27, 2007 Outline Motivation Model description Methods of inference Earnings

More information

Covariate Balancing Propensity Score for General Treatment Regimes

Covariate Balancing Propensity Score for General Treatment Regimes Covariate Balancing Propensity Score for General Treatment Regimes Kosuke Imai Princeton University October 14, 2014 Talk at the Department of Psychiatry, Columbia University Joint work with Christian

More information

Statistical Estimation

Statistical Estimation Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from

More information

Lecture 5: LDA and Logistic Regression

Lecture 5: LDA and Logistic Regression Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant

More information

The propensity score with continuous treatments

The propensity score with continuous treatments 7 The propensity score with continuous treatments Keisuke Hirano and Guido W. Imbens 1 7.1 Introduction Much of the work on propensity score analysis has focused on the case in which the treatment is binary.

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random

More information

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association

More information

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai Weighting Methods Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Weighting Methods Stat186/Gov2002 Fall 2018 1 / 13 Motivation Matching methods for improving

More information

Graybill Conference Poster Session Introductions

Graybill Conference Poster Session Introductions Graybill Conference Poster Session Introductions 2013 Graybill Conference in Modern Survey Statistics Colorado State University Fort Collins, CO June 10, 2013 Small Area Estimation with Incomplete Auxiliary

More information

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007

More information

Inferences on missing information under multiple imputation and two-stage multiple imputation

Inferences on missing information under multiple imputation and two-stage multiple imputation p. 1/4 Inferences on missing information under multiple imputation and two-stage multiple imputation Ofer Harel Department of Statistics University of Connecticut Prepared for the Missing Data Approaches

More information

Estimation of the Conditional Variance in Paired Experiments

Estimation of the Conditional Variance in Paired Experiments Estimation of the Conditional Variance in Paired Experiments Alberto Abadie & Guido W. Imbens Harvard University and BER June 008 Abstract In paired randomized experiments units are grouped in pairs, often

More information

Estimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1

Estimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1 Estimation and Model Selection in Mixed Effects Models Part I Adeline Samson 1 1 University Paris Descartes Summer school 2009 - Lipari, Italy These slides are based on Marc Lavielle s slides Outline 1

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Likelihood inference in the presence of nuisance parameters

Likelihood inference in the presence of nuisance parameters Likelihood inference in the presence of nuisance parameters Nancy Reid, University of Toronto www.utstat.utoronto.ca/reid/research 1. Notation, Fisher information, orthogonal parameters 2. Likelihood inference

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of

More information

Plausible Values for Latent Variables Using Mplus

Plausible Values for Latent Variables Using Mplus Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

Likelihood based Statistical Inference. Dottorato in Economia e Finanza Dipartimento di Scienze Economiche Univ. di Verona

Likelihood based Statistical Inference. Dottorato in Economia e Finanza Dipartimento di Scienze Economiche Univ. di Verona Likelihood based Statistical Inference Dottorato in Economia e Finanza Dipartimento di Scienze Economiche Univ. di Verona L. Pace, A. Salvan, N. Sartori Udine, April 2008 Likelihood: observed quantities,

More information

Stat 451 Lecture Notes Monte Carlo Integration

Stat 451 Lecture Notes Monte Carlo Integration Stat 451 Lecture Notes 06 12 Monte Carlo Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 23 in Lange, and Chapters 3 4 in Robert & Casella 2 Updated:

More information

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

More information

Some methods for handling missing values in outcome variables. Roderick J. Little

Some methods for handling missing values in outcome variables. Roderick J. Little Some methods for handling missing values in outcome variables Roderick J. Little Missing data principles Likelihood methods Outline ML, Bayes, Multiple Imputation (MI) Robust MAR methods Predictive mean

More information

Reconstruction of individual patient data for meta analysis via Bayesian approach

Reconstruction of individual patient data for meta analysis via Bayesian approach Reconstruction of individual patient data for meta analysis via Bayesian approach Yusuke Yamaguchi, Wataru Sakamoto and Shingo Shirahata Graduate School of Engineering Science, Osaka University Masashi

More information

1. Fisher Information

1. Fisher Information 1. Fisher Information Let f(x θ) be a density function with the property that log f(x θ) is differentiable in θ throughout the open p-dimensional parameter set Θ R p ; then the score statistic (or score

More information

STAT 730 Chapter 4: Estimation

STAT 730 Chapter 4: Estimation STAT 730 Chapter 4: Estimation Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 23 The likelihood We have iid data, at least initially. Each datum

More information

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind

More information

Applied Asymptotics Case studies in higher order inference

Applied Asymptotics Case studies in higher order inference Applied Asymptotics Case studies in higher order inference Nancy Reid May 18, 2006 A.C. Davison, A. R. Brazzale, A. M. Staicu Introduction likelihood-based inference in parametric models higher order approximations

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

For iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions.

For iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions. Large Sample Theory Study approximate behaviour of ˆθ by studying the function U. Notice U is sum of independent random variables. Theorem: If Y 1, Y 2,... are iid with mean µ then Yi n µ Called law of

More information

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models John M. Neuhaus Charles E. McCulloch Division of Biostatistics University of California, San

More information

Bayesian inference for multivariate extreme value distributions

Bayesian inference for multivariate extreme value distributions Bayesian inference for multivariate extreme value distributions Sebastian Engelke Clément Dombry, Marco Oesting Toronto, Fields Institute, May 4th, 2016 Main motivation For a parametric model Z F θ of

More information

ECE531 Lecture 10b: Maximum Likelihood Estimation

ECE531 Lecture 10b: Maximum Likelihood Estimation ECE531 Lecture 10b: Maximum Likelihood Estimation D. Richard Brown III Worcester Polytechnic Institute 05-Apr-2011 Worcester Polytechnic Institute D. Richard Brown III 05-Apr-2011 1 / 23 Introduction So

More information

Information in Data. Sufficiency, Ancillarity, Minimality, and Completeness

Information in Data. Sufficiency, Ancillarity, Minimality, and Completeness Information in Data Sufficiency, Ancillarity, Minimality, and Completeness Important properties of statistics that determine the usefulness of those statistics in statistical inference. These general properties

More information

Label Switching and Its Simple Solutions for Frequentist Mixture Models

Label Switching and Its Simple Solutions for Frequentist Mixture Models Label Switching and Its Simple Solutions for Frequentist Mixture Models Weixin Yao Department of Statistics, Kansas State University, Manhattan, Kansas 66506, U.S.A. wxyao@ksu.edu Abstract The label switching

More information