Statistical Methods for Handling Missing Data
Statistical Methods for Handling Missing Data
Jae-Kwang Kim
Department of Statistics, Iowa State University
July 5th, 2014
Outline
Textbook: Statistical Methods for Handling Incomplete Data by Kim and Shao (2013).
Part 1: Basic Theory (Chapters 2-3)
Part 2: Imputation (Chapter 4)
Part 3: Propensity score approach (Chapter 5)
Part 4: Nonignorable missing data (Chapter 6)
Statistical Methods for Handling Missing Data
Part 1: Basic Theory
Jae-Kwang Kim
Department of Statistics, Iowa State University
1 Introduction
Definitions for likelihood theory
The likelihood function of $\theta$ is defined as $L(\theta) = f(y; \theta)$, where $f(y; \theta)$ is the joint pdf of $y$.
The maximum likelihood estimator (MLE) $\hat{\theta}$ of $\theta_0$ satisfies $L(\hat{\theta}) = \max_{\theta \in \Theta} L(\theta)$.
A parametric family of densities, $\mathcal{P} = \{f(y; \theta); \theta \in \Theta\}$, is identifiable if $f(y; \theta_1) = f(y; \theta_2)$ for all $y$ implies $\theta_1 = \theta_2$.
1 Introduction - Fisher information
Definition
1. Score function: $S(\theta) = \partial \log L(\theta) / \partial \theta$
2. Fisher information = curvature of the log-likelihood: $I(\theta) = -\partial^2 \log L(\theta)/\partial\theta\partial\theta^T = -\partial S(\theta)/\partial\theta^T$
3. Observed (Fisher) information: $I(\hat{\theta}_n)$, where $\hat{\theta}_n$ is the MLE.
4. Expected (Fisher) information: $\mathcal{I}(\theta) = E_\theta\{I(\theta)\}$
The observed information is always positive. The observed information applies to a single dataset.
The expected information is meaningful as a function of $\theta$ across the admissible values of $\theta$. The expected information is an average quantity over all possible datasets.
$I(\hat{\theta}) = \mathcal{I}(\hat{\theta})$ for the exponential family.
1 Introduction - Fisher information
Lemma 1. Properties of score functions [Theorem 2.3 of KS]
Under regularity conditions allowing the exchange of the order of integration and differentiation, $E_\theta\{S(\theta)\} = 0$ and $V_\theta\{S(\theta)\} = \mathcal{I}(\theta)$.
1 Introduction - Fisher information
Remark
Under some regularity conditions, the MLE $\hat{\theta}$ converges in probability to the true parameter $\theta_0$. Thus, we can apply a Taylor linearization on $S(\hat{\theta}) = 0$ to get
$$\hat{\theta} - \theta_0 \cong \{\mathcal{I}(\theta_0)\}^{-1} S(\theta_0).$$
Here, we use the fact that $I(\theta) = -\partial S(\theta)/\partial\theta^T$ converges in probability to $\mathcal{I}(\theta)$. Thus, the (asymptotic) variance of the MLE is
$$V(\hat{\theta}) \doteq \{\mathcal{I}(\theta_0)\}^{-1} V\{S(\theta_0)\} \{\mathcal{I}(\theta_0)\}^{-1} = \{\mathcal{I}(\theta_0)\}^{-1},$$
where the last equality follows from Lemma 1.
2 Observed Likelihood
Basic Setup
Let $y = (y_1, \ldots, y_p)$ be a $p$-dimensional random vector with probability density function $f(y; \theta)$ whose dominating measure is $\mu$.
Let $\delta_{ij}$ be the response indicator function of $y_{ij}$, with $\delta_{ij} = 1$ if $y_{ij}$ is observed and $\delta_{ij} = 0$ otherwise.
$\delta_i = (\delta_{i1}, \ldots, \delta_{ip})$: $p$-dimensional random vector with density $P(\delta \mid y)$, assuming $P(\delta \mid y) = P(\delta \mid y; \phi)$ for some $\phi$.
Let $(y_{i,obs}, y_{i,mis})$ be the observed part and missing part of $y_i$, respectively.
Let $R(y_{obs}, \delta) = \{y;\, y_{obs}(y_i, \delta_i) = y_{i,obs},\ i = 1, \ldots, n\}$ be the set of all possible values of $y$ with the same realized value of $y_{obs}$, for given $\delta$, where $y_{obs}(y_i, \delta_i)$ is a function that gives the value of $y_{ij}$ for $\delta_{ij} = 1$.
2 Observed Likelihood
Definition: Observed likelihood
Under the above setup, the observed likelihood of $(\theta, \phi)$ is
$$L_{obs}(\theta, \phi) = \int_{R(y_{obs}, \delta)} f(y; \theta) P(\delta \mid y; \phi)\, d\mu(y).$$
Under the IID setup, the observed likelihood is
$$L_{obs}(\theta, \phi) = \prod_{i=1}^n \left[ \int f(y_i; \theta) P(\delta_i \mid y_i; \phi)\, d\mu(y_{i,mis}) \right],$$
where it is understood that, if $y_i = y_{i,obs}$ and $y_{i,mis}$ is empty, then there is nothing to integrate out.
In the special case of scalar $y$, the observed likelihood is
$$L_{obs}(\theta, \phi) = \prod_{\delta_i = 1} \left[ f(y_i; \theta) \pi(y_i; \phi) \right] \prod_{\delta_i = 0} \left[ \int f(y; \theta) \{1 - \pi(y; \phi)\}\, dy \right],$$
where $\pi(y; \phi) = P(\delta = 1 \mid y; \phi)$.
2 Observed Likelihood
Example 1 [Example 2.3 of KS]
Let $t_1, t_2, \ldots, t_n$ be an IID sample from a distribution with density $f_\theta(t) = \theta e^{-\theta t} I(t > 0)$. Instead of observing $t_i$, we observe $(y_i, \delta_i)$, where
$$y_i = \begin{cases} t_i & \text{if } \delta_i = 1 \\ c & \text{if } \delta_i = 0 \end{cases} \qquad \delta_i = \begin{cases} 1 & \text{if } t_i \le c \\ 0 & \text{if } t_i > c, \end{cases}$$
where $c$ is a known censoring time. The observed likelihood for $\theta$ can be derived as
$$L_{obs}(\theta) = \prod_{i=1}^n \{f_\theta(t_i)\}^{\delta_i} \{P(t_i > c)\}^{1 - \delta_i} = \theta^{\sum_{i=1}^n \delta_i} \exp\left(-\theta \sum_{i=1}^n y_i\right).$$
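As a quick illustration, the following Python sketch (sample size, rate, and censoring time are assumed values) simulates censored exponential data and computes the closed-form MLE $\hat{\theta} = \sum_i \delta_i / \sum_i y_i$ implied by the observed likelihood above.

```python
import numpy as np

rng = np.random.default_rng(0)   # assumed seed for reproducibility
n, theta0, c = 1000, 0.5, 3.0    # hypothetical sample size, rate, censoring time

t = rng.exponential(scale=1/theta0, size=n)   # latent lifetimes
delta = (t <= c).astype(float)                # response (non-censoring) indicator
y = np.minimum(t, c)                          # observed value: t_i if uncensored, c otherwise

theta_hat = delta.sum() / y.sum()             # MLE: sum(delta_i) / sum(y_i)
print(theta_hat)
```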
2 Observed Likelihood
Definition: Missing At Random (MAR)
$P(\delta \mid y)$ is the density of the conditional distribution of $\delta$ given $y$. Let $y_{obs} = y_{obs}(y, \delta)$, where $y_{i,obs} = y_i$ if $\delta_i = 1$ and $y_{i,obs}$ is unobserved if $\delta_i = 0$.
The response mechanism is MAR if
$$P(\delta \mid y_1) = P(\delta \mid y_2) \quad \{\text{or } P(\delta \mid y) = P(\delta \mid y_{obs})\}$$
for all $y_1$ and $y_2$ satisfying $y_{obs}(y_1, \delta) = y_{obs}(y_2, \delta)$.
MAR: the response mechanism $P(\delta \mid y)$ depends on $y$ only through $y_{obs}$.
Let $y = (y_{obs}, y_{mis})$. By Bayes' theorem,
$$P(y_{mis} \mid y_{obs}, \delta) = \frac{P(\delta \mid y_{mis}, y_{obs})}{P(\delta \mid y_{obs})} P(y_{mis} \mid y_{obs}).$$
MAR: $P(y_{mis} \mid y_{obs}, \delta) = P(y_{mis} \mid y_{obs})$. That is, $y_{mis} \perp \delta \mid y_{obs}$.
MAR: the conditional independence of $\delta$ and $y_{mis}$ given $y_{obs}$.
2 Observed Likelihood
Remark
MCAR (Missing Completely At Random): $P(\delta \mid y)$ does not depend on $y$.
MAR (Missing At Random): $P(\delta \mid y) = P(\delta \mid y_{obs})$
NMAR (Not Missing At Random): $P(\delta \mid y) \ne P(\delta \mid y_{obs})$
Thus, MCAR is a special case of MAR.
2 Observed Likelihood
Theorem 1: Likelihood factorization (Rubin, 1976) [Theorem 2.4 of KS]
$P_\phi(\delta \mid y)$ is the joint density of $\delta$ given $y$ and $f_\theta(y)$ is the joint density of $y$. Under the conditions that
1. the parameters $\theta$ and $\phi$ are distinct, and
2. the MAR condition holds,
the observed likelihood can be written as $L_{obs}(\theta, \phi) = L_1(\theta) L_2(\phi)$, and the MLE of $\theta$ can be obtained by maximizing $L_1(\theta)$. Thus, we do not have to specify the model for the response mechanism.
The response mechanism is called ignorable if the above likelihood factorization holds.
2 Observed Likelihood
Example 2 [Example 2.4 of KS]
Bivariate data $(x_i, y_i)$ with pdf $f(x, y) = f_1(y \mid x) f_2(x)$. $x_i$ is always observed and $y_i$ is subject to missingness.
Assume that the response status variable $\delta_i$ of $y_i$ satisfies
$$P(\delta_i = 1 \mid x_i, y_i) = \Lambda_1(\phi_0 + \phi_1 x_i + \phi_2 y_i)$$
for some function $\Lambda_1(\cdot)$ of known form.
Let $\theta$ be the parameter of interest in the regression model $f_1(y \mid x; \theta)$. Let $\alpha$ be the parameter in the marginal distribution of $x$, denoted by $f_2(x_i; \alpha)$. Define $\Lambda_0(x) = 1 - \Lambda_1(x)$.
Three parameters: $\theta$, the parameter of interest; $\alpha$ and $\phi$, nuisance parameters.
2 Observed Likelihood
Example 2 (Cont'd)
Observed likelihood
$$L_{obs}(\theta, \alpha, \phi) = \prod_{\delta_i = 1} f_1(y_i \mid x_i; \theta) f_2(x_i; \alpha) \Lambda_1(\phi_0 + \phi_1 x_i + \phi_2 y_i) \times \prod_{\delta_i = 0} \int f_1(y \mid x_i; \theta) f_2(x_i; \alpha) \Lambda_0(\phi_0 + \phi_1 x_i + \phi_2 y)\, dy = L_1(\theta, \phi)\, L_2(\alpha),$$
where $L_2(\alpha) = \prod_{i=1}^n f_2(x_i; \alpha)$. Thus, we can safely ignore the marginal distribution of $x$ if $x$ is completely observed.
2 Observed Likelihood
Example 2 (Cont'd)
If $\phi_2 = 0$, then MAR holds and $L_1(\theta, \phi) = L_{1a}(\theta) L_{1b}(\phi)$, where
$$L_{1a}(\theta) = \prod_{\delta_i = 1} f_1(y_i \mid x_i; \theta)$$
and
$$L_{1b}(\phi) = \prod_{\delta_i = 1} \Lambda_1(\phi_0 + \phi_1 x_i) \prod_{\delta_i = 0} \Lambda_0(\phi_0 + \phi_1 x_i).$$
Thus, under MAR, the MLE of $\theta$ can be obtained by maximizing $L_{1a}(\theta)$, which is obtained by ignoring the missing part of the data.
2 Observed Likelihood
Example 2 (Cont'd)
Instead of $y_i$ subject to missingness, if $x_i$ is subject to missingness, then the observed likelihood becomes
$$L_{obs}(\theta, \phi, \alpha) = \prod_{\delta_i = 1} f_1(y_i \mid x_i; \theta) f_2(x_i; \alpha) \Lambda_1(\phi_0 + \phi_1 x_i + \phi_2 y_i) \times \prod_{\delta_i = 0} \int f_1(y_i \mid x; \theta) f_2(x; \alpha) \Lambda_0(\phi_0 + \phi_1 x + \phi_2 y_i)\, dx,$$
which no longer factorizes as $L_1(\theta, \phi) L_2(\alpha)$. If $\phi_1 = 0$, then $L_{obs}(\theta, \alpha, \phi) = L_1(\theta, \alpha) L_2(\phi)$ and MAR holds.
Although we are not interested in the marginal distribution of $x$, we still need to specify the model for the marginal distribution of $x$.
3 Mean Score Approach
The observed likelihood is the marginal density of $(y_{obs}, \delta)$:
$$L_{obs}(\eta) = \int_{R(y_{obs}, \delta)} f(y; \theta) P(\delta \mid y; \phi)\, d\mu(y) = \int f(y; \theta) P(\delta \mid y; \phi)\, d\mu(y_{mis}),$$
where $y_{mis}$ is the missing part of $y$ and $\eta = (\theta, \phi)$.
Observed score equation: $S_{obs}(\eta) \equiv \partial \log L_{obs}(\eta) / \partial \eta = 0$.
Computing the observed score function can be computationally challenging because the observed likelihood involves an integral.
3 Mean Score Approach
Theorem 2: Mean Score Theorem (Fisher, 1922) [Theorem 2.5 of KS]
Under some regularity conditions, the observed score function equals the mean score function. That is, $S_{obs}(\eta) = \bar{S}(\eta)$, where
$$\bar{S}(\eta) = E\{S_{com}(\eta) \mid y_{obs}, \delta\}, \quad S_{com}(\eta) = \frac{\partial}{\partial \eta} \log f(y, \delta; \eta), \quad f(y, \delta; \eta) = f(y; \theta) P(\delta \mid y; \phi).$$
The mean score function is computed by taking the conditional expectation of the complete-sample score function given the observation. The mean score function is easier to compute than the observed score function.
3 Mean Score Approach
Proof of Theorem 2
Since $L_{obs}(\eta) = f(y, \delta; \eta) / f(y, \delta \mid y_{obs}, \delta; \eta)$, we have
$$\frac{\partial}{\partial \eta} \ln L_{obs}(\eta) = \frac{\partial}{\partial \eta} \ln f(y, \delta; \eta) - \frac{\partial}{\partial \eta} \ln f(y, \delta \mid y_{obs}, \delta; \eta).$$
Taking the conditional expectation of the above equation over the conditional distribution of $(y, \delta)$ given $(y_{obs}, \delta)$, we have
$$\frac{\partial}{\partial \eta} \ln L_{obs}(\eta) = E\left\{ \frac{\partial}{\partial \eta} \ln L_{obs}(\eta) \,\Big|\, y_{obs}, \delta \right\} = E\{S_{com}(\eta) \mid y_{obs}, \delta\} - E\left\{ \frac{\partial}{\partial \eta} \ln f(y, \delta \mid y_{obs}, \delta; \eta) \,\Big|\, y_{obs}, \delta \right\}.$$
Here, the first equality holds because $L_{obs}(\eta)$ is a function of $(y_{obs}, \delta)$ only. The last term is equal to zero by Lemma 1, which states that the expected value of the score function is zero; the reference distribution in this case is the conditional distribution of $(y, \delta)$ given $(y_{obs}, \delta)$.
3 Mean Score Approach
Example 3 [Example 2.6 of KS]
1. Suppose that the study variable $y$ follows a normal distribution with mean $x'\beta$ and variance $\sigma^2$. The score equations for $\beta$ and $\sigma^2$ under complete response are
$$S_1(\beta, \sigma^2) = \sum_{i=1}^n (y_i - x_i'\beta) x_i / \sigma^2 = 0$$
and
$$S_2(\beta, \sigma^2) = -n/(2\sigma^2) + \sum_{i=1}^n (y_i - x_i'\beta)^2 / (2\sigma^4) = 0.$$
2. Assume that $y_i$ are observed only for the first $r$ elements and the MAR assumption holds. In this case, the mean score functions reduce to
$$\bar{S}_1(\beta, \sigma^2) = \sum_{i=1}^r (y_i - x_i'\beta) x_i / \sigma^2$$
and
$$\bar{S}_2(\beta, \sigma^2) = -n/(2\sigma^2) + \sum_{i=1}^r (y_i - x_i'\beta)^2 / (2\sigma^4) + (n - r)/(2\sigma^2).$$
3 Mean Score Approach
Example 3 (Cont'd)
3. The maximum likelihood estimator obtained by solving the mean score equations is
$$\hat{\beta} = \left( \sum_{i=1}^r x_i x_i' \right)^{-1} \sum_{i=1}^r x_i y_i \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{r} \sum_{i=1}^r (y_i - x_i'\hat{\beta})^2.$$
Thus, the resulting estimators can also be obtained by simply ignoring the missing part of the sample, which is consistent with the result in Example 2 (for $\phi_2 = 0$).
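The following minimal sketch (with hypothetical sizes and parameter values) checks this numerically: solving the mean score equations with only the first $r$ responses is exactly complete-case least squares, and the $\sigma^2$ equation gives the mean of the $r$ squared residuals.

```python
import numpy as np

rng = np.random.default_rng(1)           # assumed seed
n, r = 500, 300                           # hypothetical sizes: r respondents out of n
x = np.column_stack([np.ones(n), rng.normal(size=n)])
beta0, sigma0 = np.array([1.0, 2.0]), 1.5
y = x @ beta0 + sigma0 * rng.normal(size=n)

# Under MAR, only the first r responses enter the mean score equations.
xr, yr = x[:r], y[:r]
beta_hat = np.linalg.solve(xr.T @ xr, xr.T @ yr)    # solves sum_{i<=r} (y_i - x_i'b) x_i = 0
sigma2_hat = np.mean((yr - xr @ beta_hat) ** 2)     # solves the sigma^2 mean score equation
print(beta_hat, sigma2_hat)
```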
3 Mean Score Approach
Discussion of Example 3
We are interested in estimating $\theta$ for the conditional density $f(y \mid x; \theta)$. Under MAR, the observed likelihood for $\theta$ is
$$L_{obs}(\theta) = \prod_{i=1}^r f(y_i \mid x_i; \theta) \prod_{i=r+1}^n \int f(y \mid x_i; \theta)\, d\mu(y) = \prod_{i=1}^r f(y_i \mid x_i; \theta).$$
The same conclusion can follow from the mean score theorem. Under MAR, the mean score function is
$$\bar{S}(\theta) = \sum_{i=1}^r S(\theta; x_i, y_i) + \sum_{i=r+1}^n E\{S(\theta; x_i, Y) \mid x_i\} = \sum_{i=1}^r S(\theta; x_i, y_i),$$
where $S(\theta; x, y)$ is the score function for $\theta$ and the second equality follows from Lemma 1.
3 Mean Score Approach
Example 4 [Example 2.5 of KS]
1. Suppose that the study variable $y$ follows a Bernoulli distribution with probability of success $p_i$, where
$$p_i = p_i(\beta) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}$$
for some unknown parameter $\beta$, and $x_i$ is a vector of the covariates in the logistic regression model for $y_i$. We assume that 1 is in the column space of $x_i$.
2. Under complete response, the score function for $\beta$ is
$$S_1(\beta) = \sum_{i=1}^n \{y_i - p_i(\beta)\} x_i.$$
3 Mean Score Approach
Example 4 (Cont'd)
3. Let $\delta_i$ be the response indicator function for $y_i$ with distribution Bernoulli$(\pi_i)$, where
$$\pi_i = \frac{\exp(x_i'\phi_0 + y_i \phi_1)}{1 + \exp(x_i'\phi_0 + y_i \phi_1)}.$$
We assume that $x_i$ is always observed, but $y_i$ is missing if $\delta_i = 0$.
4. Under missing data, the mean score function for $\beta$ is
$$\bar{S}_1(\beta, \phi) = \sum_{\delta_i = 1} \{y_i - p_i(\beta)\} x_i + \sum_{\delta_i = 0} \sum_{y=0}^1 w_i(y; \beta, \phi) \{y - p_i(\beta)\} x_i,$$
where $w_i(y; \beta, \phi)$ is the conditional probability of $y_i = y$ given $x_i$ and $\delta_i = 0$:
$$w_i(y; \beta, \phi) = \frac{P_\beta(y_i = y \mid x_i)\, P_\phi(\delta_i = 0 \mid y_i = y, x_i)}{\sum_{z=0}^1 P_\beta(y_i = z \mid x_i)\, P_\phi(\delta_i = 0 \mid y_i = z, x_i)}.$$
Thus, $\bar{S}_1(\beta, \phi)$ is also a function of $\phi$.
3 Mean Score Approach
Example 4 (Cont'd)
5. If the response mechanism is MAR so that $\phi_1 = 0$, then
$$w_i(y; \beta, \phi) = \frac{P_\beta(y_i = y \mid x_i)}{\sum_{z=0}^1 P_\beta(y_i = z \mid x_i)} = P_\beta(y_i = y \mid x_i)$$
and so $\bar{S}_1(\beta, \phi) = \sum_{\delta_i = 1} \{y_i - p_i(\beta)\} x_i = \bar{S}_1(\beta)$.
6. If MAR does not hold, then $(\hat{\beta}, \hat{\phi})$ can be obtained by solving $\bar{S}_1(\beta, \phi) = 0$ and $\bar{S}_2(\beta, \phi) = 0$ jointly, where
$$\bar{S}_2(\beta, \phi) = \sum_{\delta_i = 1} \{\delta_i - \pi(\phi; x_i, y_i)\} (x_i', y_i)' + \sum_{\delta_i = 0} \sum_{y=0}^1 w_i(y; \beta, \phi) \{\delta_i - \pi_i(\phi; x_i, y)\} (x_i', y)'.$$
3 Mean Score Approach
Discussion of Example 4
We may not have a unique solution to $\bar{S}(\eta) = 0$, where $\bar{S}(\eta) = [\bar{S}_1(\beta, \phi)', \bar{S}_2(\beta, \phi)']'$, when MAR does not hold, because of the non-identifiability problem associated with non-ignorable missingness.
To avoid this problem, a reduced model is often used for the response model:
$$\Pr(\delta = 1 \mid x, y) = \Pr(\delta = 1 \mid u, y), \quad \text{where } x = (u, z).$$
The reduced response model introduces a smaller set of parameters, so the identifiability problem can be resolved. (More discussion will be made in the Part 4 lecture.)
Solving $\bar{S}(\eta) = 0$ is also difficult. The EM algorithm, which will be presented soon, is a useful computational tool.
4 Observed information
Definition
1. Observed score function: $S_{obs}(\eta) = \partial \log L_{obs}(\eta) / \partial \eta$
2. Fisher information from the observed likelihood: $I_{obs}(\eta) = -\partial^2 \log L_{obs}(\eta) / \partial \eta \partial \eta^T$
3. Expected (Fisher) information from the observed likelihood: $\mathcal{I}_{obs}(\eta) = E_\eta\{I_{obs}(\eta)\}$
Lemma 2 [Theorem 2.6 of KS]
Under regularity conditions, $E\{S_{obs}(\eta)\} = 0$ and $V\{S_{obs}(\eta)\} = \mathcal{I}_{obs}(\eta)$, where $\mathcal{I}_{obs}(\eta) = E_\eta\{I_{obs}(\eta)\}$ is the expected information from the observed likelihood.
4 Observed information
Under missing data, the MLE $\hat{\eta}$ is the solution to $\bar{S}(\eta) = 0$. Under some regularity conditions, $\hat{\eta}$ converges in probability to $\eta_0$ and has the asymptotic variance $\{\mathcal{I}_{obs}(\eta_0)\}^{-1}$, with
$$\mathcal{I}_{obs}(\eta) = E\left\{ -\frac{\partial}{\partial \eta^T} S_{obs}(\eta) \right\} = E\{S_{obs}^{\otimes 2}(\eta)\} = E\{\bar{S}^{\otimes 2}(\eta)\}, \quad \text{where } B^{\otimes 2} = BB^T.$$
For variance estimation of $\hat{\eta}$, one may use $\{I_{obs}(\hat{\eta})\}^{-1}$.
IID setup: the empirical (Fisher) information gives the variance estimator of $\hat{\eta}$
$$\{\hat{H}(\hat{\eta})\}^{-1} = \left\{ \sum_{i=1}^n \bar{S}_i^{\otimes 2}(\hat{\eta}) \right\}^{-1},$$
where $\bar{S}_i(\eta) = E\{S_i(\eta) \mid y_{i,obs}, \delta_i\}$ (Redner and Walker, 1984).
In general, $I_{obs}(\hat{\eta})$ is preferred to $\mathcal{I}_{obs}(\hat{\eta})$ for variance estimation of $\hat{\eta}$.
4 Observed information
Return to Example 1
Observed score function: $S_{obs}(\theta) = \sum_{i=1}^n \delta_i / \theta - \sum_{i=1}^n y_i$
MLE for $\theta$: $\hat{\theta} = \sum_{i=1}^n \delta_i \big/ \sum_{i=1}^n y_i$
Fisher information: $I_{obs}(\theta) = \sum_{i=1}^n \delta_i / \theta^2$
Expected information: $\mathcal{I}_{obs}(\theta) = \sum_{i=1}^n (1 - e^{-\theta c}) / \theta^2 = n(1 - e^{-\theta c}) / \theta^2$
Which one do you prefer?
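Continuing the earlier sketch (same assumed simulation settings), the two information quantities evaluated at $\hat{\theta}$ give two variance estimates that can be compared directly:

```python
import numpy as np

rng = np.random.default_rng(2)   # assumed seed
n, theta0, c = 1000, 0.5, 3.0
t = rng.exponential(1/theta0, n)
delta, y = (t <= c).astype(float), np.minimum(t, c)

theta_hat = delta.sum() / y.sum()
I_obs = delta.sum() / theta_hat**2                         # observed information
EI_obs = n * (1 - np.exp(-theta_hat * c)) / theta_hat**2   # expected information at theta_hat
print(1/I_obs, 1/EI_obs)   # two variance estimates for theta_hat
```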
4 Observed information
Motivation
$L_{com}(\eta) = f(y, \delta; \eta)$: complete-sample likelihood with no missing data.
Fisher information associated with $L_{com}(\eta)$:
$$I_{com}(\eta) = -\frac{\partial}{\partial \eta^T} S_{com}(\eta) = -\frac{\partial^2}{\partial \eta \partial \eta^T} \log L_{com}(\eta)$$
$L_{obs}(\eta)$: the observed likelihood. Fisher information associated with $L_{obs}(\eta)$:
$$I_{obs}(\eta) = -\frac{\partial}{\partial \eta^T} S_{obs}(\eta) = -\frac{\partial^2}{\partial \eta \partial \eta^T} \log L_{obs}(\eta)$$
How to express $I_{obs}(\eta)$ in terms of $I_{com}(\eta)$ and $S_{com}(\eta)$?
4 Observed information
Theorem 3 (Louis, 1982; Oakes, 1999) [Theorem 2.7 of KS]
Under regularity conditions allowing the exchange of the order of integration and differentiation,
$$I_{obs}(\eta) = E\{I_{com}(\eta) \mid y_{obs}, \delta\} - \left[ E\{S_{com}^{\otimes 2}(\eta) \mid y_{obs}, \delta\} - \bar{S}^{\otimes 2}(\eta) \right] = E\{I_{com}(\eta) \mid y_{obs}, \delta\} - V\{S_{com}(\eta) \mid y_{obs}, \delta\},$$
where $\bar{S}(\eta) = E\{S_{com}(\eta) \mid y_{obs}, \delta\}$.
Proof of Theorem 3
By Theorem 2, the observed information associated with $L_{obs}(\eta)$ can be expressed as $I_{obs}(\eta) = -\partial \bar{S}(\eta) / \partial \eta^T$, where $\bar{S}(\eta) = E\{S_{com}(\eta) \mid y_{obs}, \delta; \eta\}$. Thus, we have
$$\frac{\partial}{\partial \eta^T} \bar{S}(\eta) = \frac{\partial}{\partial \eta^T} \int S_{com}(\eta; y) f(y, \delta \mid y_{obs}, \delta; \eta)\, d\mu(y) = \int \left\{ \frac{\partial}{\partial \eta^T} S_{com}(\eta; y) \right\} f(y, \delta \mid y_{obs}, \delta; \eta)\, d\mu(y) + \int S_{com}(\eta; y) \left\{ \frac{\partial}{\partial \eta^T} f(y, \delta \mid y_{obs}, \delta; \eta) \right\} d\mu(y)$$
$$= E\left\{ \partial S_{com}(\eta) / \partial \eta^T \mid y_{obs}, \delta \right\} + \int S_{com}(\eta; y) \left\{ \frac{\partial}{\partial \eta^T} \log f(y, \delta \mid y_{obs}, \delta; \eta) \right\} f(y, \delta \mid y_{obs}, \delta; \eta)\, d\mu(y).$$
The first term is equal to $-E\{I_{com}(\eta) \mid y_{obs}, \delta\}$ and the second term is equal to
$$E\{S_{com}(\eta) S_{mis}'(\eta) \mid y_{obs}, \delta\} = E[\{\bar{S}(\eta) + S_{mis}(\eta)\} S_{mis}'(\eta) \mid y_{obs}, \delta] = E\{S_{mis}(\eta) S_{mis}'(\eta) \mid y_{obs}, \delta\},$$
because $E\{\bar{S}(\eta) S_{mis}'(\eta) \mid y_{obs}, \delta\} = 0$.
4. Observed information
$S_{mis}(\eta)$: the score function associated with the conditional density $f(y, \delta \mid y_{obs}, \delta)$.
Expected missing information: $\mathcal{I}_{mis}(\eta) = E\{-\partial S_{mis}(\eta) / \partial \eta^T\} = E\{S_{mis}^{\otimes 2}(\eta)\}$.
Missing information principle (Orchard and Woodbury, 1972): $\mathcal{I}_{mis}(\eta) = \mathcal{I}_{com}(\eta) - \mathcal{I}_{obs}(\eta)$, where $\mathcal{I}_{com}(\eta) = E\{-\partial S_{com}(\eta) / \partial \eta^T\}$ is the expected information from the complete-sample likelihood.
An alternative expression of the missing information principle is
$$V\{S_{mis}(\eta)\} = V\{S_{com}(\eta)\} - V\{\bar{S}(\eta)\}.$$
Note that $V\{S_{com}(\eta)\} = \mathcal{I}_{com}(\eta)$ and $V\{S_{obs}(\eta)\} = \mathcal{I}_{obs}(\eta)$.
4. Observed information
Example 5
1. Consider the following bivariate normal distribution:
$$\begin{pmatrix} y_{1i} \\ y_{2i} \end{pmatrix} \sim N\left[ \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \right], \quad i = 1, 2, \ldots, n.$$
Assume for simplicity that $\sigma_{11}$, $\sigma_{12}$ and $\sigma_{22}$ are known constants and let $\mu = (\mu_1, \mu_2)'$ be the parameter of interest.
2. The complete-sample score function for $\mu$ is
$$S_{com}(\mu) = \sum_{i=1}^n S_{com}^{(i)}(\mu) = \sum_{i=1}^n \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} y_{1i} - \mu_1 \\ y_{2i} - \mu_2 \end{pmatrix}.$$
The information matrix of $\mu$ based on the complete sample is
$$I_{com}(\mu) = n \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}^{-1}.$$
4. Observed information
Example 5 (Cont'd)
3. Suppose that there are some missing values in $y_{1i}$ and $y_{2i}$ and the original sample is partitioned into four sets:
H = both $y_1$ and $y_2$ respond
K = only $y_1$ is observed
L = only $y_2$ is observed
M = both $y_1$ and $y_2$ are missing.
Let $n_H$, $n_K$, $n_L$, $n_M$ represent the sizes of H, K, L, M, respectively.
4. Assume that the response mechanism does not depend on the value of $(y_1, y_2)$, so it is MAR. In this case, the observed score function of $\mu$ based on a single observation in set K is
$$E\{S_{com}^{(i)}(\mu) \mid y_{1i}, i \in K\} = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} y_{1i} - \mu_1 \\ E(y_{2i} \mid y_{1i}) - \mu_2 \end{pmatrix} = \begin{pmatrix} \sigma_{11}^{-1}(y_{1i} - \mu_1) \\ 0 \end{pmatrix}.$$
4. Observed information
Example 5 (Cont'd)
5. Similarly, we have
$$E\{S_{com}^{(i)}(\mu) \mid y_{2i}, i \in L\} = \begin{pmatrix} 0 \\ \sigma_{22}^{-1}(y_{2i} - \mu_2) \end{pmatrix}.$$
6. Therefore, the observed information matrix of $\mu$ is
$$I_{obs}(\mu) = n_H \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}^{-1} + n_K \begin{pmatrix} \sigma_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix} + n_L \begin{pmatrix} 0 & 0 \\ 0 & \sigma_{22}^{-1} \end{pmatrix},$$
and the asymptotic variance of the MLE of $\mu$ can be obtained by the inverse of $I_{obs}(\mu)$.
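A small sketch of this computation, with hypothetical covariance entries and cell counts (set M contributes no information about $\mu$):

```python
import numpy as np

# Hypothetical known covariance and cell counts for sets H, K, L
Sigma = np.array([[2.0, 0.8], [0.8, 1.5]])
n_H, n_K, n_L = 120, 40, 30

I_obs = (n_H * np.linalg.inv(Sigma)
         + n_K * np.array([[1/Sigma[0, 0], 0], [0, 0]])
         + n_L * np.array([[0, 0], [0, 1/Sigma[1, 1]]]))
V_mle = np.linalg.inv(I_obs)   # asymptotic variance of the MLE of mu
print(V_mle)
```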
5. EM algorithm
We are interested in finding $\hat{\eta}$ that maximizes $L_{obs}(\eta)$. The MLE can be obtained by solving $S_{obs}(\eta) = 0$, which is equivalent to solving $\bar{S}(\eta) = 0$ by Theorem 2.
Computing the solution of $\bar{S}(\eta) = 0$ can be challenging because it often involves computing $I_{obs}(\eta) = -\partial \bar{S}(\eta) / \partial \eta^T$ in order to apply the Newton method:
$$\hat{\eta}^{(t+1)} = \hat{\eta}^{(t)} + \left\{ I_{obs}(\hat{\eta}^{(t)}) \right\}^{-1} \bar{S}(\hat{\eta}^{(t)}).$$
We may rely on the Louis formula (Theorem 3) to compute $I_{obs}(\eta)$.
The EM algorithm provides an alternative method of solving $\bar{S}(\eta) = 0$ by writing $\bar{S}(\eta) = E\{S_{com}(\eta) \mid y_{obs}, \delta; \eta\}$ and using the following iterative method:
$$\hat{\eta}^{(t+1)} \leftarrow \text{solve } E\{S_{com}(\eta) \mid y_{obs}, \delta; \hat{\eta}^{(t)}\} = 0.$$
5. EM algorithm
Definition
Let $\eta^{(t)}$ be the current value of the parameter estimate of $\eta$. The EM algorithm iteratively carries out the following E-step and M-step:
E-step: Compute $Q(\eta \mid \eta^{(t)}) = E\{\ln f(y, \delta; \eta) \mid y_{obs}, \delta, \eta^{(t)}\}$
M-step: Find $\eta^{(t+1)}$ that maximizes $Q(\eta \mid \eta^{(t)})$ w.r.t. $\eta$.
Theorem 4 (Dempster et al., 1977) [Theorem 3.2]
Let $L_{obs}(\eta) = \int_{R(y_{obs}, \delta)} f(y, \delta; \eta)\, d\mu(y)$ be the observed likelihood of $\eta$. If $Q(\eta^{(t+1)} \mid \eta^{(t)}) \ge Q(\eta^{(t)} \mid \eta^{(t)})$, then $L_{obs}(\eta^{(t+1)}) \ge L_{obs}(\eta^{(t)})$.
5. EM algorithm
Remark
1. Convergence of the EM algorithm is linear. It can be shown that
$$\eta^{(t+1)} - \eta^{(t)} \cong J_{mis}\left( \eta^{(t)} - \eta^{(t-1)} \right),$$
where $J_{mis} = \mathcal{I}_{com}^{-1} \mathcal{I}_{mis}$ is called the fraction of missing information.
2. Under MAR, for an exponential family of distributions of the form $f(y; \theta) = b(y) \exp\{\theta' T(y) - A(\theta)\}$, the M-step computes $\theta^{(t+1)}$ as the solution to
$$E\{T(y) \mid y_{obs}, \theta^{(t)}\} = E\{T(y); \theta\}.$$
5. EM algorithm
Figure: Illustration of the EM algorithm for the exponential family, plotting $h_1(\theta) = E\{T(y) \mid y_{obs}, \theta\}$ and $h_2(\theta) = E\{T(y); \theta\}$ against $\theta$.
5. EM algorithm
Return to Example 4
E-step:
$$\bar{S}_1(\beta \mid \beta^{(t)}, \phi^{(t)}) = \sum_{\delta_i = 1} \{y_i - p_i(\beta)\} x_i + \sum_{\delta_i = 0} \sum_{j=0}^1 w_{ij(t)} \{j - p_i(\beta)\} x_i,$$
where
$$w_{ij(t)} = \Pr(Y_i = j \mid x_i, \delta_i = 0; \beta^{(t)}, \phi^{(t)}) = \frac{\Pr(Y_i = j \mid x_i; \beta^{(t)}) \Pr(\delta_i = 0 \mid x_i, j; \phi^{(t)})}{\sum_{y=0}^1 \Pr(Y_i = y \mid x_i; \beta^{(t)}) \Pr(\delta_i = 0 \mid x_i, y; \phi^{(t)})}$$
and
$$\bar{S}_2(\phi \mid \beta^{(t)}, \phi^{(t)}) = \sum_{\delta_i = 1} \{\delta_i - \pi(x_i, y_i; \phi)\} (x_i', y_i)' + \sum_{\delta_i = 0} \sum_{j=0}^1 w_{ij(t)} \{\delta_i - \pi_i(x_i, j; \phi)\} (x_i', j)'.$$
5. EM algorithm
Return to Example 4 (Cont'd)
M-step: The parameter estimates are updated by solving
$$\left[ \bar{S}_1(\beta \mid \beta^{(t)}, \phi^{(t)}),\ \bar{S}_2(\phi \mid \beta^{(t)}, \phi^{(t)}) \right] = (0, 0)$$
for $\beta$ and $\phi$.
For categorical missing data, the conditional expectation in the E-step can be computed using the weighted mean with weights $w_{ij(t)}$. Ibrahim (1990) called this method EM by weighting.
The observed information matrix can also be obtained by the Louis formula (Theorem 3) using the weighted mean in the E-step.
5. EM algorithm
Example 6. Mixture model [Example 3.8 of KS]
Observation
$$Y_i = (1 - W_i) Z_{1i} + W_i Z_{2i}, \quad i = 1, 2, \ldots, n,$$
where
$$Z_{1i} \sim N(\mu_1, \sigma_1^2), \quad Z_{2i} \sim N(\mu_2, \sigma_2^2), \quad W_i \sim \text{Bernoulli}(\pi).$$
Parameter of interest: $\theta = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \pi)$.
5. EM algorithm
Example 6 (Cont'd)
Observed likelihood
$$L_{obs}(\theta) = \prod_{i=1}^n \text{pdf}(y_i \mid \theta),$$
where
$$\text{pdf}(y \mid \theta) = (1 - \pi)\, \varphi(y \mid \mu_1, \sigma_1^2) + \pi\, \varphi(y \mid \mu_2, \sigma_2^2)$$
and
$$\varphi(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(y - \mu)^2}{2\sigma^2} \right].$$
5. EM algorithm
Example 6 (Cont'd)
Full-sample likelihood
$$L_{full}(\theta) = \prod_{i=1}^n \text{pdf}(y_i, w_i \mid \theta),$$
where
$$\text{pdf}(y, w \mid \theta) = \left[ \varphi(y \mid \mu_1, \sigma_1^2) \right]^{1-w} \left[ \varphi(y \mid \mu_2, \sigma_2^2) \right]^w \pi^w (1 - \pi)^{1-w}.$$
Thus,
$$\ln L_{full}(\theta) = \sum_{i=1}^n \left[ (1 - w_i) \ln \varphi(y_i \mid \mu_1, \sigma_1^2) + w_i \ln \varphi(y_i \mid \mu_2, \sigma_2^2) \right] + \sum_{i=1}^n \left\{ w_i \ln \pi + (1 - w_i) \ln(1 - \pi) \right\}.$$
5. EM algorithm
Example 6 (Cont'd)
EM algorithm
[E-step]
$$Q(\theta \mid \theta^{(t)}) = \sum_{i=1}^n \left[ (1 - r_i^{(t)}) \ln \varphi(y_i \mid \mu_1, \sigma_1^2) + r_i^{(t)} \ln \varphi(y_i \mid \mu_2, \sigma_2^2) \right] + \sum_{i=1}^n \left\{ r_i^{(t)} \ln \pi + (1 - r_i^{(t)}) \ln(1 - \pi) \right\},$$
where $r_i^{(t)} = E(w_i \mid y_i, \theta^{(t)})$ with
$$E(w_i \mid y_i, \theta) = \frac{\pi\, \varphi(y_i \mid \mu_2, \sigma_2^2)}{(1 - \pi)\, \varphi(y_i \mid \mu_1, \sigma_1^2) + \pi\, \varphi(y_i \mid \mu_2, \sigma_2^2)}.$$
[M-step] Solve $\partial Q(\theta \mid \theta^{(t)}) / \partial \theta = 0$.
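The M-step here has closed-form solutions (weighted means and variances). A minimal, self-contained sketch with assumed simulation settings and starting values:

```python
import numpy as np
from scipy.stats import norm

def mixture_em(y, theta, n_iter=200):
    """EM for a two-component normal mixture; theta = (mu1, mu2, s1sq, s2sq, pi)."""
    mu1, mu2, s1, s2, p = theta
    for _ in range(n_iter):
        # E-step: responsibilities r_i = E(w_i | y_i, theta)
        f1 = norm.pdf(y, mu1, np.sqrt(s1))
        f2 = norm.pdf(y, mu2, np.sqrt(s2))
        r = p * f2 / ((1 - p) * f1 + p * f2)
        # M-step: closed-form maximizers of Q
        p = r.mean()
        mu1 = np.sum((1 - r) * y) / np.sum(1 - r)
        mu2 = np.sum(r * y) / np.sum(r)
        s1 = np.sum((1 - r) * (y - mu1) ** 2) / np.sum(1 - r)
        s2 = np.sum(r * (y - mu2) ** 2) / np.sum(r)
    return mu1, mu2, s1, s2, p

rng = np.random.default_rng(3)                       # assumed seed
w = rng.binomial(1, 0.3, 500)
y = np.where(w == 0, rng.normal(0, 1, 500), rng.normal(3, 1.5, 500))
print(mixture_em(y, (-1.0, 1.0, 1.0, 1.0, 0.5)))     # hypothetical starting values
```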
5. EM algorithm
Example 7. (Robust regression) [Example 3.12 of KS]
Model: $y_i = \beta_0 + \beta_1 x_i + \sigma e_i$ with $e_i \sim t(\nu)$, $\nu$ known.
Missing data setup: $e_i = u_i / \sqrt{w_i}$, where $u_i \sim N(0, 1)$ and $w_i \sim \chi^2_\nu / \nu$.
$(x_i, y_i, w_i)$: complete data ($x_i$, $y_i$ always observed, $w_i$ always missing).
$y_i \mid (x_i, w_i) \sim N(\beta_0 + \beta_1 x_i,\ \sigma^2 / w_i)$ and $x_i$ is fixed (thus, independent of $w_i$).
The EM algorithm can be used to estimate $\theta = (\beta_0, \beta_1, \sigma^2)$.
5. EM algorithm
Example 7 (Cont'd)
E-step: Find the conditional distribution of $w_i$ given $(x_i, y_i)$. By Bayes' theorem,
$$f(w_i \mid x_i, y_i) \propto f(w_i) f(y_i \mid w_i, x_i) \sim \text{Gamma}\left\{ \frac{\nu + 1}{2},\ \frac{2}{\nu + d_i^2} \right\}, \quad d_i = \frac{y_i - \beta_0 - \beta_1 x_i}{\sigma}.$$
Thus,
$$E(w_i \mid x_i, y_i, \theta^{(t)}) = \frac{\nu + 1}{\nu + (d_i^{(t)})^2},$$
where $d_i^{(t)} = (y_i - \beta_0^{(t)} - \beta_1^{(t)} x_i) / \sigma^{(t)}$.
5. EM algorithm
Example 7 (Cont'd)
M-step:
$$(\mu_x^{(t)}, \mu_y^{(t)}) = \sum_{i=1}^n w_i^{(t)} (x_i, y_i) \Big/ \sum_{i=1}^n w_i^{(t)}$$
$$\beta_1^{(t+1)} = \frac{\sum_{i=1}^n w_i^{(t)} (x_i - \mu_x^{(t)})(y_i - \mu_y^{(t)})}{\sum_{i=1}^n w_i^{(t)} (x_i - \mu_x^{(t)})^2}, \quad \beta_0^{(t+1)} = \mu_y^{(t)} - \beta_1^{(t+1)} \mu_x^{(t)}$$
$$\sigma^{2(t+1)} = \frac{1}{n} \sum_{i=1}^n w_i^{(t)} \left( y_i - \beta_0^{(t+1)} - \beta_1^{(t+1)} x_i \right)^2,$$
where $w_i^{(t)} = E(w_i \mid x_i, y_i, \theta^{(t)})$.
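Putting the E-step and M-step together gives the following iteratively reweighted least squares sketch (sample size, true parameters, and starting values are assumed for illustration):

```python
import numpy as np

def t_regression_em(x, y, nu=4.0, n_iter=100):
    """A minimal EM sketch for y = b0 + b1*x + sigma*e, e ~ t(nu), nu known."""
    b0, b1, s2 = 0.0, 0.0, np.var(y)   # assumed starting values
    n = len(y)
    for _ in range(n_iter):
        # E-step: w_i^(t) = E(w_i | x_i, y_i) = (nu + 1) / (nu + d_i^2)
        d2 = (y - b0 - b1 * x) ** 2 / s2
        w = (nu + 1) / (nu + d2)
        # M-step: weighted least squares, then weighted residual variance
        mx, my = np.sum(w * x) / w.sum(), np.sum(w * y) / w.sum()
        b1 = np.sum(w * (x - mx) * (y - my)) / np.sum(w * (x - mx) ** 2)
        b0 = my - b1 * mx
        s2 = np.sum(w * (y - b0 - b1 * x) ** 2) / n
    return b0, b1, s2

rng = np.random.default_rng(4)
x = rng.normal(size=300)
y = 1.0 + 2.0 * x + 0.5 * rng.standard_t(df=4, size=300)
print(t_regression_em(x, y))
```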
6. Summary
We are interested in finding the MLE that maximizes the observed likelihood function.
Under MAR, the model specification of the response mechanism is not necessary.
The mean score equation can be used to compute the MLE.
The EM algorithm is a useful computational tool for solving the mean score equation. The E-step of the EM algorithm may require some computational tool (see Part 2).
The asymptotic variance of the MLE can be computed by the inverse of the observed information matrix, which can be computed using the Louis formula.
REFERENCES
Cheng, P. E. (1994), Nonparametric estimation of mean functionals with data missing at random, Journal of the American Statistical Association 89.
Dempster, A. P., N. M. Laird and D. B. Rubin (1977), Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society: Series B 39.
Fisher, R. A. (1922), On the mathematical foundations of theoretical statistics, Philosophical Transactions of the Royal Society of London A 222.
Fuller, W. A., M. M. Loughin and H. D. Baker (1994), Regression weighting in the presence of nonresponse with application to the Nationwide Food Consumption Survey, Survey Methodology 20.
Hirano, K., G. Imbens and G. Ridder (2003), Efficient estimation of average treatment effects using the estimated propensity score, Econometrica 71.
Ibrahim, J. G. (1990), Incomplete data in generalized linear models, Journal of the American Statistical Association 85.
Kim, J. K. (2011), Parametric fractional imputation for missing data analysis, Biometrika 98.
Kim, J. K. and C. L. Yu (2011), A semi-parametric estimation of mean functionals with non-ignorable missing data, Journal of the American Statistical Association 106.
Kim, J. K., M. J. Brick, W. A. Fuller and G. Kalton (2006), On the bias of the multiple imputation variance estimator in survey sampling, Journal of the Royal Statistical Society: Series B 68.
Kim, J. K. and M. K. Riddles (2012), Some theory for propensity-score-adjustment estimators in survey sampling, Survey Methodology 38.
Kott, P. S. and T. Chang (2010), Using calibration weighting to adjust for nonignorable unit nonresponse, Journal of the American Statistical Association 105.
Louis, T. A. (1982), Finding the observed information matrix when using the EM algorithm, Journal of the Royal Statistical Society: Series B 44.
Meng, X. L. (1994), Multiple-imputation inferences with uncongenial sources of input (with discussion), Statistical Science 9.
Oakes, D. (1999), Direct calculation of the information matrix via the EM algorithm, Journal of the Royal Statistical Society: Series B 61.
Orchard, T. and M. A. Woodbury (1972), A missing information principle: theory and applications, in Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, Berkeley, California.
Redner, R. A. and H. F. Walker (1984), Mixture densities, maximum likelihood and the EM algorithm, SIAM Review 26.
Robins, J. M., A. Rotnitzky and L. P. Zhao (1994), Estimation of regression coefficients when some regressors are not always observed, Journal of the American Statistical Association 89.
Robins, J. M. and N. Wang (2000), Inference for imputation estimators, Biometrika 87.
Rubin, D. B. (1976), Inference and missing data, Biometrika 63.
Tanner, M. A. and W. H. Wong (1987), The calculation of posterior distribution by data augmentation, Journal of the American Statistical Association 82.
Wang, N. and J. M. Robins (1998), Large-sample theory for parametric multiple imputation procedures, Biometrika 85.
Wang, S., J. Shao and J. K. Kim (2014), Identifiability and estimation in problems with nonignorable nonresponse, Statistica Sinica 24.
Wei, G. C. and M. A. Tanner (1990), A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms, Journal of the American Statistical Association 85.
Zhou, M. and J. K. Kim (2012), An efficient method of estimation for longitudinal surveys with monotone missing data, Biometrika 99.
Statistical Methods for Handling Missing Data
Part 2: Imputation
Jae-Kwang Kim
Department of Statistics, Iowa State University
1. Introduction
Basic setup
$Y$: a vector of random variables with distribution $F(y; \theta)$. $y_1, \ldots, y_n$ are $n$ independent realizations of $Y$.
We are interested in estimating $\psi$, which is implicitly defined by $E\{U(\psi; Y)\} = 0$. Under complete observation, a consistent estimator $\hat{\psi}_n$ of $\psi$ can be obtained by solving the estimating equation for $\psi$:
$$\sum_{i=1}^n U(\psi; y_i) = 0.$$
A special case of an estimating function is the score function. In this case, $\psi = \theta$.
The sandwich variance estimator is often used to estimate the variance of $\hat{\psi}_n$:
$$\hat{V}(\hat{\psi}_n) = \hat{\tau}_u^{-1} \hat{V}(U) \hat{\tau}_u^{-1\prime},$$
where $\tau_u = E\{-\partial U(\psi; y) / \partial \psi'\}$.
1. Introduction
Missing data setup
Suppose that $y_i$ is not fully observed.
$y_i = (y_{obs,i}, y_{mis,i})$: (observed, missing) part of $y_i$.
$\delta_i$: response indicator functions for $y_i$.
Under the existence of missing data, we can use the following estimator:
$\hat{\psi}$: the solution to
$$\sum_{i=1}^n E\{U(\psi; y_i) \mid y_{obs,i}, \delta_i\} = 0. \quad (1)$$
The equation in (1) is often called the expected estimating equation.
1. Introduction
Motivation (for imputation)
Computing the conditional expectation in (1) can be a challenging problem.
1. The conditional expectation depends on unknown parameter values. That is,
$$E\{U(\psi; y_i) \mid y_{obs,i}, \delta_i\} = E\{U(\psi; y_i) \mid y_{obs,i}, \delta_i; \theta, \phi\},$$
where $\theta$ is the parameter in $f(y; \theta)$ and $\phi$ is the parameter in $p(\delta \mid y; \phi)$.
2. Even if we know $\eta = (\theta, \phi)$, computing the conditional expectation is numerically difficult.
1. Introduction
Imputation
Imputation: a Monte Carlo approximation of the conditional expectation (given the observed data):
$$E\{U(\psi; y_i) \mid y_{obs,i}, \delta_i\} \cong \frac{1}{m} \sum_{j=1}^m U(\psi; y_{obs,i}, y_{mis,i}^{*(j)}).$$
1. Bayesian approach: generate $y_{mis,i}^*$ from
$$f(y_{mis,i} \mid y_{obs}, \delta) = \int f(y_{mis,i} \mid y_{obs}, \delta; \eta)\, p(\eta \mid y_{obs}, \delta)\, d\eta.$$
2. Frequentist approach: generate $y_{mis,i}^*$ from $f(y_{mis,i} \mid y_{obs,i}, \delta; \hat{\eta})$, where $\hat{\eta}$ is a consistent estimator.
1. Introduction
Imputation: Questions
1. How to generate the Monte Carlo samples (or the imputed values)?
2. What is the asymptotic distribution of $\hat{\psi}_I$, which solves
$$\frac{1}{m} \sum_{j=1}^m \sum_{i=1}^n U(\psi; y_{obs,i}, y_{mis,i}^{*(j)}) = 0,$$
where $y_{mis,i}^{*(j)} \sim f(y_{mis,i} \mid y_{obs,i}, \delta; \hat{\eta}_p)$ for some $\hat{\eta}_p$?
3. How to estimate the variance of $\hat{\psi}_I$?
2. Basic Theory for Imputation
Basic Setup (for Case 1: $\psi = \eta$)
$y = (y_1, \ldots, y_n) \sim f(y; \theta)$
$\delta = (\delta_1, \ldots, \delta_n) \sim P(\delta \mid y; \phi)$
$y = (y_{obs}, y_{mis})$: (observed, missing) part of $y$.
$y_{mis}^{*(1)}, \ldots, y_{mis}^{*(m)}$: $m$ imputed values of $y_{mis}$ generated from
$$f(y_{mis} \mid y_{obs}, \delta; \hat{\eta}_p) = \frac{f(y; \hat{\theta}_p) P(\delta \mid y; \hat{\phi}_p)}{\int f(y; \hat{\theta}_p) P(\delta \mid y; \hat{\phi}_p)\, d\mu(y_{mis})},$$
where $\hat{\eta}_p = (\hat{\theta}_p, \hat{\phi}_p)$ is a preliminary estimator of $\eta = (\theta, \phi)$.
Using the $m$ imputed values, the imputed score function is computed as
$$\bar{S}_{imp,m}(\eta \mid \hat{\eta}_p) \equiv \frac{1}{m} \sum_{j=1}^m S_{com}(\eta; y_{obs}, y_{mis}^{*(j)}, \delta),$$
where $S_{com}(\eta; y)$ is the score function of $\eta = (\theta, \phi)$ under complete response.
2. Basic Theory for Imputation
Lemma 1 (Asymptotic results for $m = \infty$)
Assume that $\hat{\eta}_p$ converges in probability to $\eta$. Let $\hat{\eta}_{I,m}$ be the solution to
$$\frac{1}{m} \sum_{j=1}^m S_{com}(\eta; y_{obs}, y_{mis}^{*(j)}, \delta) = 0,$$
where $y_{mis}^{*(1)}, \ldots, y_{mis}^{*(m)}$ are the imputed values generated from $f(y_{mis} \mid y_{obs}, \delta; \hat{\eta}_p)$. Then, under some regularity conditions, for $m = \infty$,
$$\hat{\eta}_{imp,\infty} = \hat{\eta}_{MLE} + J_{mis}(\hat{\eta}_p - \hat{\eta}_{MLE}) \quad (2)$$
and
$$V(\hat{\eta}_{imp,\infty}) \doteq \mathcal{I}_{obs}^{-1} + J_{mis}\{V(\hat{\eta}_p) - V(\hat{\eta}_{MLE})\} J_{mis}',$$
where $J_{mis} = \mathcal{I}_{com}^{-1} \mathcal{I}_{mis}$ is the fraction of missing information.
2. Basic Theory for Imputation
Remark
For $m = \infty$, the imputed score equation becomes the mean score equation.
Equation (2) means that
$$\hat{\eta}_{imp,\infty} = (I - J_{mis}) \hat{\eta}_{MLE} + J_{mis}\, \hat{\eta}_p. \quad (3)$$
That is, $\hat{\eta}_{imp,\infty}$ is a convex combination of $\hat{\eta}_{MLE}$ and $\hat{\eta}_p$. Note that $\hat{\eta}_{imp,\infty}$ is a one-step EM update with initial estimate $\hat{\eta}_p$.
Let $\hat{\eta}^{(t)}$ be the $t$-th EM update of $\eta$, computed by solving $\bar{S}(\eta \mid \hat{\eta}^{(t-1)}) = 0$ with $\hat{\eta}^{(0)} = \hat{\eta}_p$. Equation (3) implies that
$$\hat{\eta}^{(t)} = (I - J_{mis}) \hat{\eta}_{MLE} + J_{mis}\, \hat{\eta}^{(t-1)}.$$
Thus, we can obtain $\hat{\eta}^{(t)} = \hat{\eta}_{MLE} + (J_{mis})^t (\hat{\eta}^{(0)} - \hat{\eta}_{MLE})$, which justifies $\lim_{t \to \infty} \hat{\eta}^{(t)} = \hat{\eta}_{MLE}$.
2. Basic Theory for Imputation
Theorem 1 (Asymptotic results for $m < \infty$) [Theorem 4.1 of KS]
Let $\hat{\eta}_p$ be a preliminary $\sqrt{n}$-consistent estimator of $\eta$ with variance $V_p$. Under some regularity conditions, the solution $\hat{\eta}_{imp,m}$ to
$$\bar{S}_m(\eta \mid \hat{\eta}_p) \equiv \frac{1}{m} \sum_{j=1}^m S_{com}(\eta; y_{obs}, y_{mis}^{*(j)}, \delta) = 0$$
has mean $\eta_0$ and asymptotic variance equal to
$$V(\hat{\eta}_{imp,m}) \doteq \mathcal{I}_{obs}^{-1} + J_{mis}\left\{ V_p - \mathcal{I}_{obs}^{-1} \right\} J_{mis}' + m^{-1} \mathcal{I}_{com}^{-1} \mathcal{I}_{mis} \mathcal{I}_{com}^{-1}, \quad (4)$$
where $J_{mis} = \mathcal{I}_{com}^{-1} \mathcal{I}_{mis}$. This theorem was originally presented by Wang and Robins (1998).
2. Basic Theory for Imputation
Remark
If we use $\hat{\eta}_p = \hat{\eta}_{MLE}$, then the asymptotic variance in (4) is
$$V(\hat{\eta}_{imp,m}) \doteq \mathcal{I}_{obs}^{-1} + m^{-1} \mathcal{I}_{com}^{-1} \mathcal{I}_{mis} \mathcal{I}_{com}^{-1}.$$
In Bayesian imputation (or multiple imputation), the posterior values of $\eta$ are independently generated from $\eta^* \sim N(\hat{\eta}_{MLE}, \mathcal{I}_{obs}^{-1})$, which implies that we can use $V_p = \mathcal{I}_{obs}^{-1} + m^{-1} \mathcal{I}_{obs}^{-1}$. Thus, the asymptotic variance in (4) for multiple imputation is
$$V(\hat{\eta}_{imp,m}) \doteq \mathcal{I}_{obs}^{-1} + m^{-1} J_{mis} \mathcal{I}_{obs}^{-1} J_{mis}' + m^{-1} \mathcal{I}_{com}^{-1} \mathcal{I}_{mis} \mathcal{I}_{com}^{-1}.$$
The second term is the additional price we pay when generating the posterior values, rather than using $\hat{\eta}_{MLE}$ directly.
2. Basic Theory for Imputation
Basic Setup (for Case 2: $\psi \ne \eta$)
Parameter $\psi$ defined by $E\{U(\psi; y)\} = 0$. Under complete response, a consistent estimator of $\psi$ can be obtained by solving $U(\psi; y) = 0$.
Assume that some part of $y$, denoted by $y_{mis}$, is not observed and $m$ imputed values, say $y_{mis}^{*(1)}, \ldots, y_{mis}^{*(m)}$, are generated from $f(y_{mis} \mid y_{obs}, \delta; \hat{\eta}_{MLE})$, where $\hat{\eta}_{MLE}$ is the MLE of $\eta_0$.
The imputed estimating function using the $m$ imputed values is computed as
$$\bar{U}_{imp,m}(\psi \mid \hat{\eta}_{MLE}) = \frac{1}{m} \sum_{j=1}^m U(\psi; y^{*(j)}), \quad (5)$$
where $y^{*(j)} = (y_{obs}, y_{mis}^{*(j)})$.
Let $\hat{\psi}_{imp,m}$ be the solution to $\bar{U}_{imp,m}(\psi \mid \hat{\eta}_{MLE}) = 0$. We are interested in the asymptotic properties of $\hat{\psi}_{imp,m}$.
2. Basic Theory for Imputation
Theorem 2 [Theorem 4.2 of KS]
Suppose that the parameter of interest $\psi_0$ is estimated by solving $U(\psi) = 0$ under complete response. Then, under some regularity conditions, the solution to
$$E\{U(\psi) \mid y_{obs}, \delta; \hat{\eta}_{MLE}\} = 0 \quad (6)$$
has mean $\psi_0$ and asymptotic variance $\tau^{-1} \Omega\, \tau^{-1\prime}$, where
$$\tau = E\{-\partial U(\psi_0) / \partial \psi'\}, \quad \Omega = V\{\bar{U}(\psi_0 \mid \eta_0) + \kappa\, S_{obs}(\eta_0)\},$$
and $\kappa = E\{U(\psi_0) S_{mis}'(\eta_0)\}\, \mathcal{I}_{obs}^{-1}$. This theorem was originally presented by Robins and Wang (2000).
2. Basic Theory for Imputation
Remark
Writing $\bar{U}(\psi \mid \eta) \equiv E\{U(\psi) \mid y_{obs}, \delta; \eta\}$, the solution to (6) can be treated as the solution to the joint estimating equation
$$U^*(\psi, \eta) = \begin{bmatrix} U_1(\psi, \eta) \\ U_2(\eta) \end{bmatrix} = 0,$$
where $U_1(\psi, \eta) = \bar{U}(\psi \mid \eta)$ and $U_2(\eta) = S_{obs}(\eta)$. We can apply a Taylor expansion to get
$$\begin{pmatrix} \hat{\psi} - \psi_0 \\ \hat{\eta} - \eta_0 \end{pmatrix} \cong \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}^{-1} \begin{pmatrix} U_1(\psi_0, \eta_0) \\ U_2(\eta_0) \end{pmatrix},$$
where
$$\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} E(-\partial U_1 / \partial \psi') & E(-\partial U_1 / \partial \eta') \\ E(-\partial U_2 / \partial \psi') & E(-\partial U_2 / \partial \eta') \end{pmatrix}.$$
Thus, as $B_{21} = 0$,
$$\hat{\psi} \cong \psi_0 + B_{11}^{-1}\left\{ U_1(\psi_0, \eta_0) - B_{12} B_{22}^{-1} U_2(\eta_0) \right\}.$$
In Theorem 2, $B_{11} = \tau$, $B_{12} = -E(U S_{mis}')$, and $B_{22} = \mathcal{I}_{obs}$.
3. Monte Carlo EM
Motivation: Monte Carlo samples in the EM algorithm can be used as imputed values.
Monte Carlo EM
1. In the EM algorithm defined by
[E-step] Compute $Q(\eta \mid \eta^{(t)}) = E\{\ln f(y, \delta; \eta) \mid y_{obs}, \delta; \eta^{(t)}\}$
[M-step] Find $\eta^{(t+1)}$ that maximizes $Q(\eta \mid \eta^{(t)})$,
the E-step is computationally cumbersome because it involves an integral.
2. Wei and Tanner (1990): In the E-step, first draw $y_{mis}^{*(1)}, \ldots, y_{mis}^{*(m)} \sim f(y_{mis} \mid y_{obs}, \delta; \eta^{(t)})$ and approximate
$$Q(\eta \mid \eta^{(t)}) \cong \frac{1}{m} \sum_{j=1}^m \ln f(y_{obs}, y_{mis}^{*(j)}, \delta; \eta).$$
3. Monte Carlo EM
Example 1 [Example 3.15 of KS]
Suppose that $y_i \sim f(y_i \mid x_i; \theta)$. Assume that $x_i$ is always observed but we observe $y_i$ only when $\delta_i = 1$, where $\delta_i \sim \text{Bernoulli}[\pi_i(\phi)]$ and
$$\pi_i(\phi) = \frac{\exp(\phi_0 + \phi_1 x_i + \phi_2 y_i)}{1 + \exp(\phi_0 + \phi_1 x_i + \phi_2 y_i)}.$$
To implement the MCEM method, in the E-step, we need to generate samples from
$$f(y_i \mid x_i, \delta_i = 0; \hat{\theta}, \hat{\phi}) = \frac{f(y_i \mid x_i; \hat{\theta}) \{1 - \pi_i(\hat{\phi})\}}{\int f(y_i \mid x_i; \hat{\theta}) \{1 - \pi_i(\hat{\phi})\}\, dy_i}.$$
3. Monte Carlo EM
Example 1 (Cont'd)
We can use the following rejection method to generate samples from $f(y_i \mid x_i, \delta_i = 0; \hat{\theta}, \hat{\phi})$:
1. Generate $y_i^*$ from $f(y_i \mid x_i; \hat{\theta})$.
2. Using $y_i^*$, compute
$$\pi_i^*(\hat{\phi}) = \frac{\exp(\hat{\phi}_0 + \hat{\phi}_1 x_i + \hat{\phi}_2 y_i^*)}{1 + \exp(\hat{\phi}_0 + \hat{\phi}_1 x_i + \hat{\phi}_2 y_i^*)}.$$
Accept $y_i^*$ with probability $1 - \pi_i^*(\hat{\phi})$.
3. If $y_i^*$ is not accepted, then go to Step 1.
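A minimal sketch of this rejection step, assuming a normal outcome model $f(y \mid x; \theta) = N(\theta_0 + \theta_1 x, \theta_2)$ for concreteness (the outcome model is not specified in the example):

```python
import numpy as np

rng = np.random.default_rng(5)   # assumed seed

def sample_nonrespondent_y(x_i, theta, phi, rng):
    """A sketch of the rejection method, assuming a normal outcome model
    f(y|x; theta) = N(theta[0] + theta[1]*x, theta[2]) and a logistic response model."""
    while True:
        y_star = rng.normal(theta[0] + theta[1] * x_i, np.sqrt(theta[2]))  # Step 1: draw from f(y|x)
        pi_star = 1 / (1 + np.exp(-(phi[0] + phi[1] * x_i + phi[2] * y_star)))
        if rng.uniform() < 1 - pi_star:   # Step 2: accept with probability 1 - pi
            return y_star                 # Step 3: otherwise loop back to Step 1

# hypothetical parameter values
draws = [sample_nonrespondent_y(0.5, (0.0, 1.0, 1.0), (0.2, 0.1, -0.5), rng) for _ in range(5)]
print(draws)
```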
3. Monte Carlo EM
Example 1 (Cont'd)
Using the $m$ imputed values of $y_i$, denoted by $y_i^{*(1)}, \ldots, y_i^{*(m)}$, the M-step can be implemented by solving
$$\sum_{i=1}^n \frac{1}{m} \sum_{j=1}^m S(\theta; x_i, y_i^{*(j)}) = 0$$
and
$$\sum_{i=1}^n \frac{1}{m} \sum_{j=1}^m \left\{ \delta_i - \pi(\phi; x_i, y_i^{*(j)}) \right\} (1, x_i, y_i^{*(j)}) = 0,$$
where $S(\theta; x_i, y_i) = \partial \log f(y_i \mid x_i; \theta) / \partial \theta$.
3. Monte Carlo EM
Example 2 (GLMM) [Example 3.18]
Basic Setup: Let $y_{ij}$ be a binary random variable (that takes 0 or 1) with probability $p_{ij} = \Pr(y_{ij} = 1 \mid x_{ij}, a_i)$, and we assume that
$$\text{logit}(p_{ij}) = x_{ij}'\beta + a_i,$$
where $x_{ij}$ is a $p$-dimensional covariate associated with the $j$-th repetition of unit $i$, $\beta$ is the parameter of interest that can represent the treatment effect due to $x$, and $a_i$ represents the random effect associated with unit $i$. We assume that the $a_i$ are iid with $N(0, \sigma^2)$.
Missing data: $a_i$
Observed likelihood:
$$L_{obs}(\beta, \sigma^2) = \prod_i \int \left\{ \prod_j p(x_{ij}, a_i; \beta)^{y_{ij}} \left[ 1 - p(x_{ij}, a_i; \beta) \right]^{1 - y_{ij}} \right\} \frac{1}{\sigma} \varphi\left( \frac{a_i}{\sigma} \right) da_i,$$
where $\varphi(\cdot)$ is the pdf of the standard normal distribution.
3. Monte Carlo EM
Example 2 (Cont'd)
MCEM approach: generate $a_i^*$ from the Metropolis-Hastings algorithm, using $f(a_i \mid x_i, y_i; \hat{\beta}, \hat{\sigma}) \propto f_1(y_i \mid x_i, a_i; \hat{\beta}) f_2(a_i; \hat{\sigma})$:
1. Generate $a_i^*$ from $f_2(a_i; \hat{\sigma})$.
2. Set
$$a_i^{(t)} = \begin{cases} a_i^* & \text{w.p. } \rho(a_i^{(t-1)}, a_i^*) \\ a_i^{(t-1)} & \text{w.p. } 1 - \rho(a_i^{(t-1)}, a_i^*), \end{cases}$$
where
$$\rho(a_i^{(t-1)}, a_i^*) = \min\left\{ \frac{f_1(y_i \mid x_i, a_i^*; \hat{\beta})}{f_1(y_i \mid x_i, a_i^{(t-1)}; \hat{\beta})},\ 1 \right\}.$$
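Because the proposal is the random-effect density $f_2$ itself, the acceptance ratio reduces to a likelihood ratio. A self-contained sketch of one such update, with a hypothetical cluster of five binary outcomes and assumed parameter values:

```python
import numpy as np

rng = np.random.default_rng(6)   # assumed seed

def log_f1(y_i, x_i, a, beta):
    """Log of the Bernoulli likelihood prod_j p^y (1-p)^(1-y) with logit(p) = x'beta + a."""
    eta = x_i @ beta + a
    return np.sum(y_i * eta - np.log1p(np.exp(eta)))

def mh_draw_a(y_i, x_i, a_prev, beta, sigma, rng):
    """One Metropolis-Hastings update of a_i using N(0, sigma^2) as the proposal."""
    a_star = rng.normal(0, sigma)   # Step 1: propose from f2(a; sigma)
    log_rho = log_f1(y_i, x_i, a_star, beta) - log_f1(y_i, x_i, a_prev, beta)
    return a_star if np.log(rng.uniform()) < log_rho else a_prev   # Step 2: accept w.p. rho

# hypothetical cluster with 5 repeated binary outcomes and a scalar covariate
x_i = np.linspace(-1, 1, 5).reshape(-1, 1)
y_i = np.array([0, 0, 1, 1, 1])
beta, sigma, a = np.array([0.8]), 1.0, 0.0
for t in range(10):
    a = mh_draw_a(y_i, x_i, a, beta, sigma, rng)
print(a)
```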
3. Monte Carlo EM
Remark
Monte Carlo EM can be used as a frequentist approach to imputation.
Convergence is not guaranteed (for fixed $m$).
The E-step can be computationally heavy (one may use an MCMC method).
4. Parametric Fractional Imputation
Parametric fractional imputation
1. More than one (say $m$) imputed values of $y_{mis,i}$: $y_{mis,i}^{*(1)}, \ldots, y_{mis,i}^{*(m)}$, generated from some (initial) density $h(y_{mis,i})$.
2. Create a weighted data set $\{(w_{ij}^*, y_{ij}^*);\ j = 1, 2, \ldots, m;\ i = 1, 2, \ldots, n\}$, where $\sum_{j=1}^m w_{ij}^* = 1$, $y_{ij}^* = (y_{obs,i}, y_{mis,i}^{*(j)})$,
$$w_{ij}^* \propto f(y_{ij}^*, \delta_i; \hat{\eta}) / h(y_{mis,i}^{*(j)}),$$
$\hat{\eta}$ is the maximum likelihood estimator of $\eta$, and $f(y, \delta; \eta)$ is the joint density of $(y, \delta)$.
3. The weights $w_{ij}^*$ are the normalized importance weights and can be called fractional weights.
If $y_{mis,i}$ is categorical, then simply use all possible values of $y_{mis,i}$ as the imputed values and then assign their conditional probabilities as the fractional weights.
4. Parametric Fractional Imputation
Remark
Importance sampling idea: for sufficiently large $m$,
$$\sum_{j=1}^m w_{ij}^* g(y_{ij}^*) \cong \frac{\int g(y_i) \{f(y_i, \delta_i; \hat{\eta}) / h(y_{mis,i})\}\, h(y_{mis,i})\, d\mu(y_{mis,i})}{\int \{f(y_i, \delta_i; \hat{\eta}) / h(y_{mis,i})\}\, h(y_{mis,i})\, d\mu(y_{mis,i})} = E\{g(y_i) \mid y_{obs,i}, \delta_i; \hat{\eta}\}$$
for any $g$ such that the expectation exists. In the importance sampling literature, $h(\cdot)$ is called the proposal distribution and $f(\cdot)$ is called the target distribution.
We do not need to compute the conditional distribution $f(y_{mis,i} \mid y_{obs,i}, \delta_i; \eta)$. Only the joint distribution $f(y_{obs,i}, y_{mis,i}, \delta_i; \eta)$ is needed, because
$$\frac{f(y_{obs,i}, y_{mis,i}^{*(j)}, \delta_i; \hat{\eta}) / h(y_{mis,i}^{*(j)})}{\sum_{k=1}^m f(y_{obs,i}, y_{mis,i}^{*(k)}, \delta_i; \hat{\eta}) / h(y_{mis,i}^{*(k)})} = \frac{f(y_{mis,i}^{*(j)} \mid y_{obs,i}, \delta_i; \hat{\eta}) / h(y_{mis,i}^{*(j)})}{\sum_{k=1}^m f(y_{mis,i}^{*(k)} \mid y_{obs,i}, \delta_i; \hat{\eta}) / h(y_{mis,i}^{*(k)})}.$$
4. Parametric Fractional Imputation
EM algorithm by fractional imputation
1. Imputation step: generate $y_{mis,i}^{*(j)} \sim h(y_{i,mis})$.
2. Weighting step: compute
$$w_{ij(t)}^* \propto f(y_{ij}^*, \delta_i; \hat{\eta}^{(t)}) / h(y_{mis,i}^{*(j)}),$$
where $\sum_{j=1}^m w_{ij(t)}^* = 1$.
3. M-step: update $\hat{\eta}^{(t+1)}$ as the solution to
$$\sum_{i=1}^n \sum_{j=1}^m w_{ij(t)}^* S(\eta; y_{ij}^*, \delta_i) = 0.$$
4. Repeat Step 2 and Step 3 until convergence.
Imputation step + weighting step = E-step.
We may add an optional step that checks whether $w_{ij(t)}^*$ is too large for some $j$; in this case, $h(y_{i,mis})$ needs to be changed.
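A toy sketch of this algorithm for a normal mean with MAR missingness (the model, proposal, and all settings are assumed for illustration); note that the imputed values are drawn once and only the fractional weights change across iterations:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)   # assumed seed

# Toy setting: y ~ N(mu, 1), y_i missing for i >= r, MAR, proposal h = N(0, 2^2)
n, r, mu0 = 200, 120, 1.0
y = rng.normal(mu0, 1.0, n)
m = 50
y_imp = rng.normal(0.0, 2.0, (n - r, m))   # imputation step: draws from h, fixed across iterations

mu = 0.0
for t in range(50):
    # weighting step: w_ij proportional to f(y_ij; mu) / h(y_ij), normalized over j
    w = norm.pdf(y_imp, mu, 1.0) / norm.pdf(y_imp, 0.0, 2.0)
    w /= w.sum(axis=1, keepdims=True)
    # M-step: solve the weighted mean score equation for mu
    mu = (y[:r].sum() + (w * y_imp).sum()) / n
print(mu)
```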
4. Parametric Fractional Imputation
The imputed values are not changed across the EM iterations; only the fractional weights are changed.
1. Computationally efficient (because we use importance sampling only once).
2. Convergence is achieved (because the imputed values are not changed).
For sufficiently large $t$, $\hat{\eta}^{(t)} \to \hat{\eta}^*$. Also, for sufficiently large $m$, $\hat{\eta}^* \cong \hat{\eta}_{MLE}$.
For estimation of $\psi$ in $E\{U(\psi; Y)\} = 0$, simply use
$$\frac{1}{n} \sum_{i=1}^n \sum_{j=1}^m w_{ij}^* U(\psi; y_{ij}^*) = 0.$$
Linearization variance estimation (using the result of Theorem 2) is discussed in Kim (2011).
4. Parametric Fractional Imputation
Return to Example 1
Fractional imputation
1. Imputation step: Generate $y_i^{*(1)}, \ldots, y_i^{*(m)}$ from $f(y_i \mid x_i; \hat{\theta}^{(0)})$.
2. Weighting step: Using the $m$ imputed values generated from Step 1, compute the fractional weights as
$$w_{ij(t)}^* \propto \frac{f(y_i^{*(j)} \mid x_i; \hat{\theta}^{(t)}) \left\{ 1 - \pi(x_i, y_i^{*(j)}; \hat{\phi}^{(t)}) \right\}}{f(y_i^{*(j)} \mid x_i; \hat{\theta}^{(0)})},$$
where
$$\pi(x_i, y_i; \hat{\phi}) = \frac{\exp(\hat{\phi}_0 + \hat{\phi}_1 x_i + \hat{\phi}_2 y_i)}{1 + \exp(\hat{\phi}_0 + \hat{\phi}_1 x_i + \hat{\phi}_2 y_i)}.$$
4. Parametric Fractional Imputation
Return to Example 1 (Cont'd)
Using the imputed data and the fractional weights, the M-step can be implemented by solving
$$\sum_{i=1}^n \sum_{j=1}^m w_{ij(t)}^* S(\theta; x_i, y_i^{*(j)}) = 0$$
and
$$\sum_{i=1}^n \sum_{j=1}^m w_{ij(t)}^* \left\{ \delta_i - \pi(\phi; x_i, y_i^{*(j)}) \right\} (1, x_i, y_i^{*(j)}) = 0, \quad (7)$$
where $S(\theta; x_i, y_i) = \partial \log f(y_i \mid x_i; \theta) / \partial \theta$.
4. Parametric Fractional Imputation
Example 3: Categorical missing data
Original data: bivariate binary $(Y_1, Y_2)$ with covariate $X$. Most units are fully observed, but three missingness patterns appear in the data: a unit with $Y_1 = 1$ observed, $Y_2$ missing, and $X = 4$; a unit with $Y_1$ missing, $Y_2 = 0$ observed, and $X = 5$; and a unit with both $Y_1$ and $Y_2$ missing and $X = 8$. (The numeric entries of the fully observed rows were lost in transcription.)
4. Parametric Fractional Imputation
Example 3 (Cont'd)
$Y_1$ and $Y_2$ are dichotomous and $X$ is continuous.
Model:
$$\Pr(Y_1 = 1 \mid X) = \Lambda(\alpha_0 + \alpha_1 X)$$
$$\Pr(Y_2 = 1 \mid X, Y_1) = \Lambda(\beta_0 + \beta_1 X + \beta_2 Y_1),$$
where $\Lambda(x) = \{1 + \exp(-x)\}^{-1}$.
Assume MAR.
4. Parametric Fractional Imputation
Example 3 (Cont'd)
Imputed data: each incomplete unit is expanded into all possible values of its missing items, with fractional weights equal to the conditional probabilities.

Fractional weight                 Y1  Y2  X
Pr(Y2 = 0 | Y1 = 1, X = 4)         1   0   4
Pr(Y2 = 1 | Y1 = 1, X = 4)         1   1   4
Pr(Y1 = 0 | Y2 = 0, X = 5)         0   0   5
Pr(Y1 = 1 | Y2 = 0, X = 5)         1   0   5
Pr(Y1 = 0, Y2 = 0 | X = 8)         0   0   8
Pr(Y1 = 0, Y2 = 1 | X = 8)         0   1   8
Pr(Y1 = 1, Y2 = 0 | X = 8)         1   0   8
Pr(Y1 = 1, Y2 = 1 | X = 8)         1   1   8
4. Parametric Fractional Imputation
Example 3 (Cont'd)
Implementation of the EM algorithm using fractional imputation:
E-step: compute the mean score functions using the fractional weights.
M-step: solve the mean score equations.
Because we have a completed data set with weights, we can also estimate other parameters, such as $\theta = \Pr(Y_2 = 1 \mid X > 5)$.
4. Parametric Fractional Imputation
Example 4: Measurement error model [Example 4.13 of KS]
Interested in estimating $\theta$ in $f(y \mid x; \theta)$. Instead of observing $x$, we observe $z$, which can be highly correlated with $x$. Thus, $z$ is an instrumental variable for $x$:
$$f(y \mid x, z) = f(y \mid x)$$
and
$$f(y \mid z = a) \ne f(y \mid z = b)$$
for $a \ne b$.
In addition to the original sample, we have a separate calibration sample that observes $(x_i, z_i)$.
4. Parametric Fractional Imputation
Table: Data Structure

                     Z    X    Y
Calibration sample   o    o
Original sample      o         o
4. Parametric Fractional Imputation
Example 4 (Cont'd)
The goal is to generate $x_i^*$ in the original sample from
$$f(x_i \mid z_i, y_i) \propto f(x_i \mid z_i) f(y_i \mid x_i, z_i) = f(x_i \mid z_i) f(y_i \mid x_i).$$
Obtain a consistent estimator $\hat{f}(x \mid z)$ from the calibration sample.
E-step:
1. Generate $x_i^{*(1)}, \ldots, x_i^{*(m)}$ from $\hat{f}(x_i \mid z_i)$.
2. Compute the fractional weights associated with $x_i^{*(j)}$ by
$$w_{ij}^* \propto f(y_i \mid x_i^{*(j)}; \hat{\theta}).$$
M-step: Solve the weighted score equation for $\theta$.
5. Multiple imputation
Features
1. Imputed values are generated from
$$y_{i,mis}^{*(j)} \sim f(y_{i,mis} \mid y_{i,obs}, \delta_i; \eta^*),$$
where $\eta^*$ is generated from the posterior distribution $\pi(\eta \mid y_{obs})$.
2. The variance estimation formula is simple (Rubin's formula):
$$\hat{V}_{MI}(\bar{\psi}_m) = W_m + \left( 1 + \frac{1}{m} \right) B_m,$$
where $W_m = m^{-1} \sum_{k=1}^m \hat{V}_I^{(k)}$, $B_m = (m - 1)^{-1} \sum_{k=1}^m (\hat{\psi}^{(k)} - \bar{\psi}_m)^2$, $\bar{\psi}_m = m^{-1} \sum_{k=1}^m \hat{\psi}^{(k)}$ is the average of the $m$ imputed estimators, and $\hat{V}_I^{(k)}$ is the imputed version of the variance estimator of $\hat{\psi}$ under complete response.
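Rubin's combining rules are only a few lines of code; a minimal sketch with hypothetical per-imputation estimates:

```python
import numpy as np

def rubin_combine(psi_hats, var_hats):
    """Rubin's combining rules: point estimate and total variance W_m + (1 + 1/m) B_m."""
    psi_hats, var_hats = np.asarray(psi_hats), np.asarray(var_hats)
    m = len(psi_hats)
    psi_bar = psi_hats.mean()
    W = var_hats.mean()          # within-imputation variance W_m
    B = psi_hats.var(ddof=1)     # between-imputation variance B_m
    return psi_bar, W + (1 + 1/m) * B

# hypothetical estimates from m = 5 imputed data sets
print(rubin_combine([1.02, 0.98, 1.05, 1.01, 0.97], [0.04, 0.05, 0.04, 0.05, 0.04]))
```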
5. Multiple imputation
The computation for Bayesian imputation can be implemented by the data augmentation technique (Tanner and Wong, 1987), which is a special application of the Gibbs sampling method:
1. I-step: Generate $y_{mis}^* \sim f(y_{mis} \mid y_{obs}, \delta; \eta^*)$
2. P-step: Generate $\eta^* \sim \pi(\eta \mid y_{obs}, y_{mis}^*, \delta)$
This needs some tools for checking the convergence to a stable distribution.
Consistency of the variance estimator is questionable (Kim et al., 2006).
6. Simulation Study
Simulation 1
Bivariate data $(x_i, y_i)$ of size $n = 200$ with
$$x_i \sim N(3, 1), \quad y_i \sim N(-2 + x_i, 1).$$
$x_i$ always observed, $y_i$ subject to missingness. MCAR ($\delta \sim \text{Bernoulli}(0.6)$).
Parameters of interest:
1. $\theta_1 = E(Y)$
2. $\theta_2 = \Pr(Y < 1)$
Multiple imputation (MI) and fractional imputation (FI) are applied with $m = 50$.
For estimation of $\theta_2$, the following method-of-moments estimator is used:
$$\hat{\theta}_{2,MME} = n^{-1} \sum_{i=1}^n I(y_i < 1).$$
6. Simulation Study
Table 1: Monte Carlo bias and variance of the point estimators, comparing the complete-sample estimator, MI, and FI for $\theta_1$ and $\theta_2$. (The numeric entries were lost in transcription.)
Table 2: Monte Carlo relative bias (%) of the variance estimators for $V(\hat{\theta}_1)$ and $V(\hat{\theta}_2)$ under MI and FI. The surviving entries, 1.21 for $V(\hat{\theta}_1)$ and 2.05 for $V(\hat{\theta}_2)$, appear against the FI rows; the MI entries were lost in transcription.
6. Simulation study
Rubin's formula is based on the following decomposition:
$$V(\hat{\theta}_{MI}) = V(\hat{\theta}_n) + V(\hat{\theta}_{MI} - \hat{\theta}_n),$$
where $\hat{\theta}_n$ is the complete-sample estimator of $\theta$. Basically, the $W_m$ term estimates $V(\hat{\theta}_n)$ and the $(1 + m^{-1}) B_m$ term estimates $V(\hat{\theta}_{MI} - \hat{\theta}_n)$.
For the general case, we have
$$V(\hat{\theta}_{MI}) = V(\hat{\theta}_n) + V(\hat{\theta}_{MI} - \hat{\theta}_n) + 2\,\text{Cov}(\hat{\theta}_{MI} - \hat{\theta}_n, \hat{\theta}_n),$$
and Rubin's variance estimator ignores the covariance term. Thus, a sufficient condition for the validity of the variance estimator is $\text{Cov}(\hat{\theta}_{MI} - \hat{\theta}_n, \hat{\theta}_n) = 0$. Meng (1994) called this condition congeniality. Congeniality holds when $\hat{\theta}_n$ is the MLE of $\theta$.
6. Simulation study
For example, there are two estimators of $\theta = P(Y < 1)$ when $Y$ follows $N(\mu, \sigma^2)$:
1. Maximum likelihood method: $\hat{\theta}_{MLE} = \int_{-\infty}^1 \varphi(z; \hat{\mu}, \hat{\sigma}^2)\, dz$
2. Method of moments: $\hat{\theta}_{MME} = n^{-1} \sum_{i=1}^n I(y_i < 1)$
In the simulation setup, the imputed estimator of $\theta_2$ can be expressed as
$$\hat{\theta}_{2,I} = n^{-1} \sum_{i=1}^n \left[ \delta_i I(y_i < 1) + (1 - \delta_i) E\{I(y_i < 1) \mid x_i; \hat{\mu}, \hat{\sigma}\} \right].$$
Thus, the imputed estimator of $\theta_2$ borrows strength by making use of the extra information associated with $f(y \mid x)$. When the congeniality condition does not hold, the imputed estimator improves the efficiency (due to the imputation model that uses extra information), but the variance estimator does not recognize this improvement.
6. Simulation Study
Simulation 2
Bivariate data $(x_i, y_i)$ of size $n = 100$ with
$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i, \quad (8)$$
where $(\beta_0, \beta_1, \beta_2) = (0, 0.9, 0.06)$, $x_i \sim N(0, 1)$, $e_i \sim N(0, 0.16)$, and $x_i$ and $e_i$ are independent. The variable $x_i$ is always observed, but the probability that $y_i$ responds is 0.5.
In MI, the imputer's model is $Y_i = \beta_0 + \beta_1 x_i + e_i$. That is, the imputer's model uses the extra information that $\beta_2 = 0$.
From the imputed data, we fit model (8) and computed the power of a test of $H_0: \beta_2 = 0$ at the 0.05 significance level. In addition, we also considered the complete-case (CC) method, which simply uses the complete cases only for the regression analysis.
6. Simulation Study
Table 3: Simulation results for the Monte Carlo experiment based on 10,000 Monte Carlo samples, reporting $E(\hat{\theta})$, $V(\hat{\theta})$, the relative bias of $\hat{V}$, and the power of the test for MI, FI, and CC. (The numeric entries were lost in transcription.)
Table 3 shows that MI provides a more efficient point estimator than the CC method, but its variance estimation is very conservative (more than 100% overestimation). Because of the serious positive bias of the MI variance estimator, the statistical power of the test based on MI is actually lower than that of the CC method.
7. Summary
Imputation can be viewed as a Monte Carlo tool for computing the conditional expectation.
Monte Carlo EM is very popular, but the E-step can be computationally heavy.
Parametric fractional imputation is a useful tool for frequentist imputation.
Multiple imputation is motivated from a Bayesian framework. The frequentist validity of multiple imputation requires the condition of congeniality. Uncongeniality may lead to overestimation of the variance, which can seriously increase type II errors.
More informationPh.D. Qualifying Exam Friday Saturday, January 6 7, 2017
Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a
More informationPropensity score adjusted method for missing data
Graduate Theses and Dissertations Graduate College 2013 Propensity score adjusted method for missing data Minsun Kim Riddles Iowa State University Follow this and additional works at: http://lib.dr.iastate.edu/etd
More informationBasics of Modern Missing Data Analysis
Basics of Modern Missing Data Analysis Kyle M. Lang Center for Research Methods and Data Analysis University of Kansas March 8, 2013 Topics to be Covered An introduction to the missing data problem Missing
More informationMH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution
MH I Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution a lot of Bayesian mehods rely on the use of MH algorithm and it s famous
More informationarxiv:math/ v1 [math.st] 23 Jun 2004
The Annals of Statistics 2004, Vol. 32, No. 2, 766 783 DOI: 10.1214/009053604000000175 c Institute of Mathematical Statistics, 2004 arxiv:math/0406453v1 [math.st] 23 Jun 2004 FINITE SAMPLE PROPERTIES OF
More informationCalibration Estimation of Semiparametric Copula Models with Data Missing at Random
Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Econometrics Workshop UNC
More informationA Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness
A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model
More informationFractional imputation method of handling missing data and spatial statistics
Graduate Theses and Dissertations Graduate College 2014 Fractional imputation method of handling missing data and spatial statistics Shu Yang Iowa State University Follow this and additional works at:
More informationModification and Improvement of Empirical Likelihood for Missing Response Problem
UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu
More informationStreamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level
Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level A Monte Carlo Simulation to Test the Tenability of the SuperMatrix Approach Kyle M Lang Quantitative Psychology
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)
More informationIntroduction to Estimation Methods for Time Series models Lecture 2
Introduction to Estimation Methods for Time Series models Lecture 2 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 2 SNS Pisa 1 / 21 Estimators:
More informationData Integration for Big Data Analysis for finite population inference
for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation
More informationA Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,
A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More informationLast lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton
EM Algorithm Last lecture 1/35 General optimization problems Newton Raphson Fisher scoring Quasi Newton Nonlinear regression models Gauss-Newton Generalized linear models Iteratively reweighted least squares
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationLatent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent
Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary
More informationA measurement error model approach to small area estimation
A measurement error model approach to small area estimation Jae-kwang Kim 1 Spring, 2015 1 Joint work with Seunghwan Park and Seoyoung Kim Ouline Introduction Basic Theory Application to Korean LFS Discussion
More informationIntroduction to Survey Data Integration
Introduction to Survey Data Integration Jae-Kwang Kim Iowa State University May 20, 2014 Outline 1 Introduction 2 Survey Integration Examples 3 Basic Theory for Survey Integration 4 NASS application 5
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationMax. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes
Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter
More informationBootstrap inference for the finite population total under complex sampling designs
Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.
More informationCombining multiple observational data sources to estimate causal eects
Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,
More informationMaster s Written Examination
Master s Written Examination Option: Statistics and Probability Spring 05 Full points may be obtained for correct answers to eight questions Each numbered question (which may have several parts) is worth
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationSTATISTICAL INFERENCE WITH DATA AUGMENTATION AND PARAMETER EXPANSION
STATISTICAL INFERENCE WITH arxiv:1512.00847v1 [math.st] 2 Dec 2015 DATA AUGMENTATION AND PARAMETER EXPANSION Yannis G. Yatracos Faculty of Communication and Media Studies Cyprus University of Technology
More informationCombining data from two independent surveys: model-assisted approach
Combining data from two independent surveys: model-assisted approach Jae Kwang Kim 1 Iowa State University January 20, 2012 1 Joint work with J.N.K. Rao, Carleton University Reference Kim, J.K. and Rao,
More informationEstimation, Inference, and Hypothesis Testing
Chapter 2 Estimation, Inference, and Hypothesis Testing Note: The primary reference for these notes is Ch. 7 and 8 of Casella & Berger 2. This text may be challenging if new to this topic and Ch. 7 of
More informationGeneralized Linear Models. Kurt Hornik
Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general
More informationBayesian Modeling of Conditional Distributions
Bayesian Modeling of Conditional Distributions John Geweke University of Iowa Indiana University Department of Economics February 27, 2007 Outline Motivation Model description Methods of inference Earnings
More informationCovariate Balancing Propensity Score for General Treatment Regimes
Covariate Balancing Propensity Score for General Treatment Regimes Kosuke Imai Princeton University October 14, 2014 Talk at the Department of Psychiatry, Columbia University Joint work with Christian
More informationStatistical Estimation
Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from
More informationLecture 5: LDA and Logistic Regression
Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant
More informationThe propensity score with continuous treatments
7 The propensity score with continuous treatments Keisuke Hirano and Guido W. Imbens 1 7.1 Introduction Much of the work on propensity score analysis has focused on the case in which the treatment is binary.
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random
More informationModeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses
Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association
More informationWeighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai
Weighting Methods Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Weighting Methods Stat186/Gov2002 Fall 2018 1 / 13 Motivation Matching methods for improving
More informationGraybill Conference Poster Session Introductions
Graybill Conference Poster Session Introductions 2013 Graybill Conference in Modern Survey Statistics Colorado State University Fort Collins, CO June 10, 2013 Small Area Estimation with Incomplete Auxiliary
More informationParametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory
Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007
More informationInferences on missing information under multiple imputation and two-stage multiple imputation
p. 1/4 Inferences on missing information under multiple imputation and two-stage multiple imputation Ofer Harel Department of Statistics University of Connecticut Prepared for the Missing Data Approaches
More informationEstimation of the Conditional Variance in Paired Experiments
Estimation of the Conditional Variance in Paired Experiments Alberto Abadie & Guido W. Imbens Harvard University and BER June 008 Abstract In paired randomized experiments units are grouped in pairs, often
More informationEstimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1
Estimation and Model Selection in Mixed Effects Models Part I Adeline Samson 1 1 University Paris Descartes Summer school 2009 - Lipari, Italy These slides are based on Marc Lavielle s slides Outline 1
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationLikelihood inference in the presence of nuisance parameters
Likelihood inference in the presence of nuisance parameters Nancy Reid, University of Toronto www.utstat.utoronto.ca/reid/research 1. Notation, Fisher information, orthogonal parameters 2. Likelihood inference
More informationBayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence
Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of
More informationPlausible Values for Latent Variables Using Mplus
Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More informationLikelihood based Statistical Inference. Dottorato in Economia e Finanza Dipartimento di Scienze Economiche Univ. di Verona
Likelihood based Statistical Inference Dottorato in Economia e Finanza Dipartimento di Scienze Economiche Univ. di Verona L. Pace, A. Salvan, N. Sartori Udine, April 2008 Likelihood: observed quantities,
More informationStat 451 Lecture Notes Monte Carlo Integration
Stat 451 Lecture Notes 06 12 Monte Carlo Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 23 in Lange, and Chapters 3 4 in Robert & Casella 2 Updated:
More informationStat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC
Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline
More informationSome methods for handling missing values in outcome variables. Roderick J. Little
Some methods for handling missing values in outcome variables Roderick J. Little Missing data principles Likelihood methods Outline ML, Bayes, Multiple Imputation (MI) Robust MAR methods Predictive mean
More informationReconstruction of individual patient data for meta analysis via Bayesian approach
Reconstruction of individual patient data for meta analysis via Bayesian approach Yusuke Yamaguchi, Wataru Sakamoto and Shingo Shirahata Graduate School of Engineering Science, Osaka University Masashi
More information1. Fisher Information
1. Fisher Information Let f(x θ) be a density function with the property that log f(x θ) is differentiable in θ throughout the open p-dimensional parameter set Θ R p ; then the score statistic (or score
More informationSTAT 730 Chapter 4: Estimation
STAT 730 Chapter 4: Estimation Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 23 The likelihood We have iid data, at least initially. Each datum
More informationMS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari
MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind
More informationApplied Asymptotics Case studies in higher order inference
Applied Asymptotics Case studies in higher order inference Nancy Reid May 18, 2006 A.C. Davison, A. R. Brazzale, A. M. Staicu Introduction likelihood-based inference in parametric models higher order approximations
More informationCentral Limit Theorem ( 5.3)
Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately
More informationFor iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions.
Large Sample Theory Study approximate behaviour of ˆθ by studying the function U. Notice U is sum of independent random variables. Theorem: If Y 1, Y 2,... are iid with mean µ then Yi n µ Called law of
More informationThe consequences of misspecifying the random effects distribution when fitting generalized linear mixed models
The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models John M. Neuhaus Charles E. McCulloch Division of Biostatistics University of California, San
More informationBayesian inference for multivariate extreme value distributions
Bayesian inference for multivariate extreme value distributions Sebastian Engelke Clément Dombry, Marco Oesting Toronto, Fields Institute, May 4th, 2016 Main motivation For a parametric model Z F θ of
More informationECE531 Lecture 10b: Maximum Likelihood Estimation
ECE531 Lecture 10b: Maximum Likelihood Estimation D. Richard Brown III Worcester Polytechnic Institute 05-Apr-2011 Worcester Polytechnic Institute D. Richard Brown III 05-Apr-2011 1 / 23 Introduction So
More informationInformation in Data. Sufficiency, Ancillarity, Minimality, and Completeness
Information in Data Sufficiency, Ancillarity, Minimality, and Completeness Important properties of statistics that determine the usefulness of those statistics in statistical inference. These general properties
More informationLabel Switching and Its Simple Solutions for Frequentist Mixture Models
Label Switching and Its Simple Solutions for Frequentist Mixture Models Weixin Yao Department of Statistics, Kansas State University, Manhattan, Kansas 66506, U.S.A. wxyao@ksu.edu Abstract The label switching
More information