Recent Advances in the analysis of missing data with non-ignorable missingness

Size: px

Start display at page:

Download "Recent Advances in the analysis of missing data with non-ignorable missingness"

Camron Morrison
5 years ago
Views:

1 Recent Advances in the analysis of missing data with non-ignorable missingness Jae-Kwang Kim Department of Statistics, Iowa State University July 4th, 2014

2 1 Introduction 2 Full likelihood-based ML estimation 3 GMM method 4 Partial Likelihood approach 5 Exponential tilting method 6 Concluding Remarks Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

3 Observed likelihood (X, Y ): random variable, y is subject to missingness f (y x; θ): model of y on x g(δ x, y; φ): model of δ on (x, y) Observed likelihood L obs (θ, φ) = f (y i x i ; θ) g (δ i x i, y i ; φ) δ i =1 δ i =0 f (y i x i ; θ) g (δ i x i, y i ; φ) dy i Under what conditions the parameters are identifiable (or estimable)? Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

4 Lemma Suppose that we can decompose the covariate vector x into two parts, u and z, such that g(δ y, x) = g(δ y, u) (1) and, for any given u, there exist z u,1 and z u,2 such that f (y u, z = z u,1) f (y u, z = z u,2). (2) Under some other minor conditions, all the parameters in f and g are identifiable. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

5 Remark Condition (1) means δ z y, u. That is, given (y, u), z does not help in explaining δ. Thus, z plays the role of instrumental variable in econometrics: f (y x, z ) = f (y x ), Cov(z, x ) 0. Here, y = δ, x = (y, u), and z = z. We may call z the nonresponse instrument variable. Rigorous theory developed by Wang et al. (2014). Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

6 Introduction Remark MCAR (Missing Completely at random): P(δ y) does not depend on y. MAR (Missing at random): P(δ y) = P(δ y obs ) NMAR (Not Missing at random): P(δ y) P(δ y obs ) Thus, MCAR is a special case of MAR. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

7 Parameter estimation under the existence of nonresponse instrument variable Full likelihood-based ML estimation Generalized method of moment (GMM) approach (Section 6.3 of KS) Conditional likelihood approach (Section 6.2 of KS) Pseudo likelihood approach (Section 6.4 of KS) Exponential tilting method (Section 6.5 of KS) Latent variable approach (Section 6.6 of KS) Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

8 1 Introduction 2 Full likelihood-based ML estimation 3 GMM method 4 Partial Likelihood approach 5 Exponential tilting method 6 Concluding Remarks Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

9 Full likelihood-based ML estimation Wish to find ˆη = (ˆθ, ˆφ), that maximizes the observed likelihood L obs (η) = f (y i x i ; θ) g (δ i x i, y i ; φ) δ i =1 f (y i x i ; θ) g (δ i x i, y i ; φ) dy i δ i =0 Mean score theorem: Under some regularity conditions, finding the MLE of maximizing the observed likelihood is equivalent to finding the solution to S(η) E{S(η) y obs, δ; η} = 0, where y obs is the observed data. The conditional expectation of the score function is called mean score function. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

10 Example 1 [Example 2.5 of KS] 1 Suppose that the study variable y is randomly distributed with Bernoulli distribution with probability of success p i, where p i = p i (β) = exp (x i β) 1 + exp (x i β) for some unknown parameter β and x i is a vector of the covariates in the logistic regression model for y i. We assume that 1 is in the column space of x i. 2 Under complete response, the score function for β is S 1(β) = n (y i p i (β)) x i. i=1 Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

11 Example 1 (Cont d) 3 Let δ i be the response indicator function for y i with distribution Bernoulli(π i ) where π i = exp (x i φ 0 + y i φ 1) 1 + exp (x i φ0 + y iφ 1). We assume that x i is always observed, but y i is missing if δ i = 0. 4 Under missing data, the mean score function for β, the conditional expectation of the original score function given the observed data, is S 1 (β, φ) = δi =1 {y i p i (β)} x i + δ i =0 1 w i (y; β, φ) {y p i (β)} x i, where w i (y; β, φ) is the conditional probability of y i = y given x i and δ i = 0: y=0 w i (y; β, φ) = P β (y i = y x i ) P φ (δ i = 0 y i = y, x i ) 1 z=0 P β (y i = z x i ) P φ (δ i = 0 y i = z, x i ) Thus, S 1 (β, φ) is also a function of φ. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

12 Example 1 (Cont d) 5 If the response mechanism is MAR so that φ 1 = 0, then w i (y; β, φ) = P β (y i = y x i ) 1 z=0 P β (y i = z x i ) = P β (y i = y x i ) and so S 1 (β, φ) = δi =1 {y i p i (β)} x i = S 1 (β). 6 If MAR does not hold, then ( ˆβ, ˆφ) can be obtained by solving S 1 (β, φ) = 0 and S 2(β, φ) = 0 jointly, where S 2 (β, φ) = δi =1 {δ i π (φ; x i, y i )} (x i, y i ) + δ i =0 1 w i (y; β, φ) {δ i π i (φ; x i, y)} (x i, y). y=0 Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

13 EM algorithm Interested in finding ˆη that maximizes L obs (η). The MLE can be obtained by solving S obs (η) = 0, which is equivalent to solving S(η) = 0 by the mean score theorem. EM algorithm provides an alternative method of solving S(η) = 0 by writing S(η) = E {S com(η) y obs, δ; η} and using the following iterative method: { ˆη (t+1) solve E S com(η) y obs, δ; ˆη (t)} = 0. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

14 5. EM algorithm Definition Let η (t) be the current value of the parameter estimate of η. The EM algorithm can be defined as iteratively carrying out the following E-step and M-steps: E-step: Compute ( Q η η (t)) = E {ln f (y, δ; η) y obs, δ, η (t)} M-step: Find η (t+1) that maximizes Q(η η (t) ) w.r.t. η. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

15 EM algorithm Example 1 (Cont d) E-step: where ( S 1 β β (t), φ (t)) = {y i p i (β)} x i + δ i =1 δ i =0 w ij(t) = Pr(Y i = j x i, δ i = 0; β (t), φ (t) ) = 1 w ij(t) {j p i (β)} x i, j=0 Pr(Y i = j x i ; β (t) )Pr(δ i = 0 x i, j; φ (t) ) 1 y=0 Pr(Y i = y x i ; β (t) )Pr(δ i = 0 x i, y; φ (t) ) and ( S 2 φ β (t), φ (t)) = {δ i π (x i, y i ; φ)} ( x ) i, y i δ i =1 + δ i =0 1 w ij(t) {δ i π i (x i, j; φ)} ( x i, j ). j=0 Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

16 EM algorithm Example 1 (Cont d) M-step: The parameter estimates are updated by solving [ S1 (β β (t), φ (t)), S 2 ( φ β (t), φ (t))] = (0, 0) for β and φ. For categorical missing data, the conditional expectation in the E-step can be computed using the weighted mean with weights w ij(t). Ibrahim (1990) called this method EM by weighting. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

17 Monte Carlo EM Motivation: Monte Carlo samples in the EM algorithm can be used as imputed values. Monte Carlo EM 1 In the EM algorithm defined by [E-step] Compute ( Q η η (t)) = E {ln f (y, δ; η) y obs, δ; η (t)} [M-step] Find η (t+1) that maximizes Q ( η η (t)), E-step is computationally cumbersome because it involves integral. 2 Wei and Tanner (1990): In the E-step, first draw y (1) mis,, y (m) mis f (y mis y obs, δ; η (t)) and approximate ( Q η η (t)) 1 = m m ln f j=1 ( y obs, y (j) mis, δ; η ). Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

18 Monte Carlo EM Example 2 [Example 3.15 of KS] Suppose that y i f (y i x i ; θ) Assume that x i is always observed but we observe y i only when δ i = 1 where δ i Bernoulli [π i (φ)] and π i (φ) = exp (φ0 + φ1x i + φ 2y i ) 1 + exp (φ 0 + φ 1x i + φ 2y i ). To implement the MCEM method, in the E-step, we need to generate samples from f (y i x i, δ i = 0; ˆθ, ˆφ) = f (y i x i ; ˆθ){1 π i ( ˆφ)} f (yi x i ; ˆθ){1 π i ( ˆφ)}dy i. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

19 Monte Carlo EM Example 2 (Cont d) We can use the following rejection method to generate samples from f (y i x i, δ i = 0; ˆθ, ˆφ): 1 Generate yi from f (y i x i ; ˆθ). 2 Using yi, compute π i ( ˆφ) = Accept yi with probability 1 πi ( ˆφ). 3 If yi is not accepted, then goto Step 1. exp( ˆφ 0 + ˆφ 1 x i + ˆφ 2 y i ) 1 + exp( ˆφ 0 + ˆφ 1 x i + ˆφ 2 y i ). Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

20 Monte Carlo EM Example 2 (Cont d) Using the m imputed values of y i, denoted by y (1) i,, y (m) i, and the M-step can be implemented by solving and n i=1 j=1 n i=1 m ( ) S θ; x i, y (j) i = 0 j=1 where S (θ; x i, y i ) = log f (y i x i ; θ)/ θ. m { } ( ) δ i π(φ; x i, y (j) i ) 1, x i, y (j) i = 0, Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

21 1 Introduction 2 Full likelihood-based ML estimation 3 GMM method 4 Partial Likelihood approach 5 Exponential tilting method 6 Concluding Remarks Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

22 Basic setup (X, Y ): random variable θ: Defined by solving E{U(θ; X, Y )} = 0. y i is subject to missingness δ i = { 1 if yi responds 0 if y i is missing. Want to find w i such that the solution ˆθ w to is consistent for θ. n δ i w i U(θ; x i, y i ) = 0 i=1 Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

23 Basic Setup Result 1: The choice of w i = makes the resulting estimator ˆθ w consistent. 1 E(δ i x i, y i ) Result 2: If δ i Bernoulli(π i ), then using w i = 1/π i also makes the resulting estimator consistent, but it is less efficient than ˆθ w using w i in (3). (3) Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

24 Parameter estimation : GMM method Because z is a nonresponse instrumental variable, we may assume for some (φ 0, φ 1, φ 2). P(δ = 1 x, y) = π(φ 0 + φ 1u + φ 2y) Kott and Chang (2010): Construct a set of estimating equations such as n { } δ i π(φ 0 + φ 1u i + φ 2y i ) 1 (1, u i, z i ) = 0 i=1 that are unbiased to zero. May have overidentified situation: Use the generalized method of moments (GMM). Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

25 GMM method Example 3 Suppose that we are interested in estimating the parameters in the regression model y i = β 0 + β 1x 1i + β 2x 2i + e i (4) where E(e i x i ) = 0. Assume that y i is subject to missingness and assume that P(δ i = 1 x 1i, x i2, y i ) = exp(φ0 + φ1x 1i + φ 2y i ) 1 + exp(φ 0 + φ 1x 1i + φ 2y i ). Thus, x 2i is the nonresponse instrument variable in this setup. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

26 Example 3 (Cont d) A consistent estimator of φ can be obtained by solving n { } δ Û 2(φ) π(φ; x 1i, y i ) 1 (1, x 1i, x 2i ) = (0, 0, 0). (5) i=1 Roughly speaking, the solution to (5) exists almost surely if E{ Û2(φ)/ φ} is of full rank in the neighborhood of the true value of φ. If x 2 is vector, then (5) is overidentified and the solution to (5) does not exist. In the case, the GMM algorithm can be used. Finding the solution to Û2(φ) = 0 can be obtained by finding the minimizer of Q(φ) = Û2(φ) Û 2(φ) or Q W (φ) = Û2(φ) W Û2(φ) where W = {V (Û2)} 1. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

27 Example 3 (Cont d) Once the solution ˆφ to (5) is obtained, then a consistent estimator of β = (β 0, β 1, β 2) can be obtained by solving for β. Û 1(β, ˆφ) n i=1 δ i ˆπ i {y i β 0 β 1x 1i β 2x 2i } (1, x 1i, x 2i ) = (0, 0, 0) (6) Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

28 Asymptotic Properties The asymptotic variance of the GMM estimator ˆφ that minimizes Û 2(φ) ˆΣ 1 Û 2(φ) is V ( ˆφ) ( ) 1 = Γ Σ 1 Γ where The variance is estimated by Γ = E{ Û2(φ)/ φ} Σ = V (Û2). (ˆΓ ˆΣ 1ˆΓ) 1, where ˆΓ = Û/ φ evaluated at ˆφ and ˆΣ is an estimated variance-covariance matrix of Û2(φ) evaluated at ˆφ. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

29 Asymptotic Properties The asymptotic variance of ˆβ obtained from (6) with ˆφ computed from the GMM can be obtained by V (ˆθ) ( ) 1 = Γ aσ 1 a Γ a where and θ = (β, φ). Γ a = E{ Û(θ)/ θ} Σ a = V (Û) Û = (Û 1, Û 2) Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

30 1 Introduction 2 Full likelihood-based ML estimation 3 GMM method 4 Partial Likelihood approach 5 Exponential tilting method 6 Concluding Remarks Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

31 Partial Likelihood approach A classical way of likelihood-based approach for parameter estimation under non ignorable nonresponse is to maximize L obs (θ, φ) with respect to (θ, φ), where L obs (θ, φ) = δ i =1 f (y i x i ; θ) g (δ i x i, y i ; φ) δ i =0 f (y i x i ; θ) g (δ i x i, y i ; φ) dy i Such approach can be called full likelihood-based approach because it uses full information available in the observed data. However, it is well known that such full likelihood-based approach is quite sensitive to the failure of the assumed model. On the other hand, partial likelihood-based approach (or conditional likelihood approach) uses a subset of the sample. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

32 Conditional Likelihood approach Idea Since f (y x)g(δ x, y) = f 1(y x, δ)g 1(δ x), for some f 1 and g 1, we can write L obs (θ) = f 1 (y i x i, δ i = 1) g 1 (δ i x i ) δ i =1 δ i =0 f 1 (y i x i, δ i = 0) g 1 (δ i x i ) dy i = δ i =1 f 1 (y i x i, δ i = 1) n g 1 (δ i x i ). The conditional likelihood is defined to be the first component: L c(θ) = δ i =1 f 1 (y i x i, δ i = 1) = δ i =1 i=1 f (y i x i ; θ)π(x i, y i ) f (y xi ; θ)π(x i, y)dy, where π(x, y i ) = Pr(δ i = 1 x i, y i ). Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

33 Conditional Likelihood approach Example Assume that the original sample is a random sample from an exponential distribution with mean µ = 1/θ. That is, the probability density function of y is f (y; θ) = θ exp( θy)i (y > 0). Suppose that we observe y i only when y i > K for a known K > 0. Thus, the response indicator function is defined by δ i = 1 if y i > K and δ i = 0 otherwise. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

34 Conditional Likelihood approach Example To compute the maximum likelihood estimator from the observed likelihood, note that S obs (θ) = ( ) 1 θ y i + { } 1 θ E(y i δ i = 0). δ i =1 δ i =0 Since E(Y y > K) = 1 θ K exp( θk) 1 exp( θk), the maximum likelihood estimator of θ can be obtained by the following iteration equation: {ˆθ(t+1) } { 1 = ȳr n r K exp( K ˆθ } (t) ) r 1 exp( K ˆθ, (7) (t) ) where r = n i=1 δ i and ȳ r = r 1 n i=1 δ iy i. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

35 Conditional Likelihood approach Example Since π i = Pr(δ i = 1 y i ) = I (y i > K) and E(π i ) = E{I (y i > K)} = exp( Kθ), the conditional likelihood reduces to θ exp{ θ(y i K)}. δ i =1 The maximum conditional likelihood estimator of θ is ˆθ c = 1 ȳ r K. Since E(y y > K) = µ + K, the maximum conditional likelihood estimator of µ, which is ˆµ c = 1/ˆθ c, is unbiased for µ. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

36 Conditional Likelihood approach Remark Under some regularity conditions, the solution ˆθ c that maximizes L c(θ) satisfies where I c(θ) = E { } θ Sc(θ) x i; θ = Ic 1/2 L (ˆθ c θ) N(0, I ) S c(θ) = ln L c(θ)/ θ, and S i (θ) = ln f (y i x i ; θ) / θ. n i=1 Works only when π(x, y) is a known function. [ E { S i S i π i x i ; θ } {E (S iπ i x i ; θ)} 2 ], E (π i x i ; θ) Does not require nonresponse instrumental variable assumption. Popular for biased sampling problem. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

37 Pseudo Likelihood approach Idea Consider bivariate (x i, y i ) with density f (y x; θ)h(x) where y i are subject to missingness. We are interested in estimating θ. Suppose that Pr(δ = 1 x, y) depends only on y. (i.e. x is nonresponse instrument) Note that f (x y, δ) = f (x y). Thus, we can consider the following conditional likelihood L c(θ) = δ i =1 f (x i y i, δ i = 1) = δ i =1 f (x i y i ). We can consider maximizing the pseudo likelihood L p(θ) = δ i =1 f (y i x i ; θ)ĥ(x i) f (yi x; θ)ĥ(x)dx, where ĥ(x) is a consistent estimator of the marginal density of x. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

38 Pseudo Likelihood approach Idea We may use the empirical density in ĥ(x). That is, ĥ(x) = 1/n if x = x i. In this case, L c(θ) = f (y i x i ; θ) n δ i =1 k=1 f (y i x k ; θ). We can extend the idea to the case of x = (u, z) where z is a nonresponse instrument. In this case, the conditional likelihood becomes p(z i y i, u i ) = i:δ i =1 i:δ i =1 f (y i u i, z i ; θ)p(z i u i ) f (yi u i, z; θ)p(z u i )dz. (8) Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

39 Pseudo Likelihood approach Let ˆp(z u) be an estimated conditional probability density of z given u. Substituting this estimate into the likelihood in (8), we obtain the following pseudo likelihood: i:δ i =1 f (y i u i, z i ; θ)ˆp(z i u i ) f (yi u i, z; θ)ˆp(z u i )dz. (9) The pseudo maximum likelihood estimator (PMLE) of θ, denoted by ˆθ p, can be obtained by solving S p(θ; ˆα) δ i =1 [S(θ; x i, y i ) E{S(θ; u i, z, y i ) y i, u i ; θ, ˆα}] = 0 for θ, where S(θ; x, y) = S(θ; u, z, y) = log f (y x; θ)/ θ and S(θ; ui, z, y i )f (y i u i, z; θ)p(z u i ; ˆα)dz E{S(θ; u i, z, y i ) y i, u i ; θ, ˆα} =. f (yi u i, z; θ)p(z u i ; ˆα)dz Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

40 Pseudo Likelihood approach The Fisher-scoring method for obtaining the PMLE is given by { ˆθ p (t+1) (t) = ˆθ (ˆθ(t) 1 p + I p, ˆα)} Sp(ˆθ (t), ˆα) where I p (θ, ˆα) = δ i =1 [ E{S(θ; u i, z, y i ) 2 y i, u i ; θ, ˆα} E{S(θ; u i, z, y i ) y i, u i ; θ, ˆα} 2]. Variance estimation is very complicated. Jackknife or bootstrap can be used. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

41 1 Introduction 2 Full likelihood-based ML estimation 3 GMM method 4 Partial Likelihood approach 5 Exponential tilting method 6 Concluding Remarks Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

42 Exponential tilting method Motivation Observed likelihood function can be written n [ 1 δi L obs (φ) = {π i (φ)} δ i {1 π i (φ)} f (y x i )dy], i=1 where f (y x) is the true conditional distribution of y given x. To find the MLE of φ, we solve the mean score equation S(φ) = 0, where S(φ) = n [δ i S i (φ) + (1 δ i )E{S i (φ) x i, δ i = 0}], (10) i=1 where S i (φ) = {δ i π i (φ)}( logitπ i (φ)/ φ) is the score function of φ for the density g(δ x, y; φ) = π δ (1 π) 1 δ with π = π(x, y; φ). Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

43 Motivation The conditional expectation in (10) can be evaluated by using f (y x, δ = 0) = P(δ = 0 x, y) f (y x) E{P(δ = 0 x, y) x} Two problems occur: (11) 1 Requires correct specification of f (y x; θ). Known to be sensitive to the choice of f (y x; θ). 2 Computationally heavy: Often uses Monte Carlo computation. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

44 Exponential tilting method Remedy (for Problem One) Idea Instead of specifying a parametric model for f (y x), consider specifying a parametric model for f (y x, δ = 1), denoted by f 1(y x). In this case, Si (φ)f 1(y x i )O(x i, y; φ)dy E{S i (φ) x i, δ i = 0} = f1(y x i )O(x i, y; φ)dy where O(x, y; φ) = 1 π(φ; x, y). π(φ; x, y) Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

45 Remark Based on the following identity f 0 (y i x i ) = f 1 (y i x i ) where f δ (y i x i ) = f (y i x i, δ i = δ) and O (x i, y i ) E {O (x i, Y i ) x i, δ i = 1}, (12) O (x i, y i ) = Pr (δ i = 0 x i, y i ) Pr (δ i = 1 x i, y i ) (13) is the conditional odds of nonresponse. Kim and Yu (2011) considered a Kernel-based nonparametric regression method of estimating f (y x, δ = 1) to obtain E(Y x, δ = 0). Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

46 If the response probability follows from a logistic regression model π(u i, y i ) Pr (δ i = 1 u i, y i ) = exp (φ0 + φ1u i + φ 2y i ) 1 + exp (φ 0 + φ 1u i + φ 2y i ), (14) the expression (12) can be simplified to f 0 (y i x i ) = f 1 (y i x i ) exp (γy i ) E {exp (γy ) x i, δ i = 1}, (15) where γ = φ 2 and f 1 (y x) is the conditional density of y given x and δ = 1. Model (15) states that the density for the nonrespondents is an exponential tilting of the density for the respondents. The parameter γ is the tilting parameter that determines the amount of departure from the ignorability of the response mechanism. If γ = 0, the the response mechanism is ignorable and f 0(y x) = f 1(y x). Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

47 Exponential tilting method Problem Two How to compute E{S i (φ) x i, δ i = 0} = without relying on Monte Carlo computation? Si (φ)o(x i, y; φ)f 1(y x i )dy O(xi, y; φ)f 1(y x i )dy Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

48 Computation for E 1{Q(x i, Y ) x i } = Q(x i, y)f 1(y x i )dy. If x i were null, then we would approximate the integration by the empirical distribution among δ = 1. Use Q(x i, y)f 1(y x i )dy = Q(x i, y) f1(y x i) f 1(y)dy f 1(y) Q(x i, y j ) f1(y j x i ) f δ j =1 1(y j ) where f 1(y) = f 1(y x)f (x δ = 1)dx δ i =1 f 1(y x i ). Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

49 Exponential tilting method In practice, f 1(y x) is unknown and is estimated by ˆf 1(y x) = f 1(y x; ˆγ). Thus, given ˆγ, a fully efficient estimator of φ can be obtained by solving S 2(φ, ˆγ) n { δi S(φ; x i, y i ) + (1 δ i ) S 0(φ x i ; ˆγ, φ) } = 0, (16) i=1 where and S 0(φ x i ; ˆγ, φ) = δ j =1 S(φ; x i, y j )f 1(y j x i ; ˆγ)O(φ; x i, y j )/ˆf 1(y j ) δ j =1 f1(y j x i ; ˆγ)O(φ; x i, y j )/ˆf 1(y i ) ˆf 1(y) = n 1 R n δ i f 1(y x i ; ˆγ). i=1 May use EM algorithm to solve (16) for φ. Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

50 Exponential tilting method Step 1: Use the responding part of (x i, y i ), obtain ˆγ in the model f 1(y x; γ). S 1(γ) δ i =1 S 1(γ; x i, y i ) = 0. (17) Step 2: Given ˆγ from Step 1, obtain ˆφ by solving (16): S 2(φ, ˆγ) = 0. Step 3: Using ˆφ computed from Step 2, the PSA estimator of θ can be obtained by solving n δ i U(θ; x i, y i ) = 0, (18) ˆπ i where ˆπ i = π i ( ˆφ). i=1 Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

51 Exponential tilting method Remark In many cases, x is categorical and f 1(y x) can be fully nonparametric. If x has a continuous part, nonparametric Kernel smoothing can be used. The proposed method seems to be robust against the failure of the assumed model on f 1(y x; γ). Asymptotic normality of PSA estimator can be obtained & Linearization method can be used for variance estimation (Details skipped) By augmenting the estimating function, we can also impose a calibration constraint such as n δ i n x i = x i. ˆπ i i=1 i=1 Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

52 Exponential tilting method Example 4 (Example 6.5 of KS) Assume that both x i = (z i, u i ) and y i are categorical with category {(i, j); i S z S u} and S y, respectively. We are interested in estimating θ k = Pr(Y = k), for k S y. Now, we have nonresponse in y and let δ i be the response indicator function for y i. We assume that the response probability satisfies Pr (δ = 1 x, y) = π (u, y; φ). To estimate φ, we first compute the observed conditional probability of y among the respondents: δ j =1 ˆp 1(y x i ) = I (x j = x i, y j = y) δ j =1 I (x. j = x i ) Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

53 Exponential tilting method Example 4 (Cont d) The EM algorithm can be implemented by (16) with δ j =1 S 0(φ x i ; φ) = S(φ; δ i, u i, y j )ˆp 1(y j x i )O(φ; u i, y j )/ˆp 1(y j ) δ j =1 ˆp1(y, j x i )O(φ; u i, y j )/ˆp 1(y j ) where O(φ; u, y) = {1 π(u, y; φ)}/π(u, y; φ) and ˆp 1(y) = n 1 R n δ i ˆp 1(y x i ). Alternatively, we can use y S y S(φ; δ i, u i, y)ˆp 1(y x i )O(φ; u i, y) S 0(φ x i ; φ) =. (19) y S y ˆp 1(y x i )O(φ; u i, y) i=1 Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

54 Exponential tilting method Example 4 (Cont d) Once ˆπ(u, y) = π(u, y; ˆφ) is computed, we can use ˆθ k,et = n 1 I (y i = k) + δ i =1 δ i =0 where w iy is the fractional weights computed by y S y w iy I (y = k), wiy {ˆπ(u i, y)} 1 1}ˆp 1(y x i ) = y S y {ˆπ(u i, y) 1 1}ˆp 1(y x i ). Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

55 Real Data Example Exit Poll: The Assembly election (2012 Gang-dong district in Seoul) Gender Age Party A Party B Other Refusal Total Male , Female Total 1,809 1, ,473 Truth 62,489 57,909 1, ,022 Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

56 Comparison of the methods (%) Table : Analysis result : Gang-dong district in Seoul Method Party A Party B Other No adjustment Adjustment (Age * Sex) New Method Truth Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

57 Analysis result in Seoul (48 Seats) Table : Analysis result : 48 seats in Seoul Method Party A Party B Other No adjustment Adjustment (Age* Sex) New Method Truth Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

58 1 Introduction 2 Full likelihood-based ML estimation 3 GMM method 4 Partial Likelihood approach 5 Exponential tilting method 6 Concluding Remarks Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

59 Concluding remarks Uses a model for the response probability. Parameter estimation for response model can be implemented using the idea of maximum likelihood method. Instrumental variable needed for identifiability of the response model. Likelihood-based approach vs GMM approach Less tools for model diagnostics or model validation Promising areas of research Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

60 REFERENCES Ibrahim, J. G. (1990), Incomplete data in generalized linear models, Journal of the American Statistical Association 85, Kim, J. K. and C. L. Yu (2011), A semi-parametric estimation of mean functionals with non-ignorable missing data, Journal of the American Statistical Association 106, Kott, P. S. and T. Chang (2010), Using calibration weighting to adjust for nonignorable unit nonresponse, Journal of the American Statistical Association 105, Wang, S., J. Shao and J. K. Kim (2014), Identifiability and estimation in problems with nonignorable nonresponse, Statistica Sinica 24, Wei, G. C. and M. A. Tanner (1990), A Monte Carlo implementation of the EM algorithm and the poor man s data augmentation algorithms, Journal of the American Statistical Association 85, Jae-Kwang Kim (ISU) Nonignorable missing July 4th, / 59

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score