Likelihood-based inference with missing data under missing-at-random


1 Likelihood-based inference with missing data under missing-at-random Jae-Kwang Kim Joint work with Shu Yang Department of Statistics, Iowa State University May 4, 2014

2 Outline 1. Introduction 2. Parametric fractional imputation (PFI) 3. PFI approximation of the observed likelihood 4. Main theory 5. Simulation study 6. Discussion

3 Introduction Let y = (y_1, ..., y_n) have a joint density f(y; θ). Instead of observing y, we observe y_obs, where y = (y_obs, y_mis). We are interested in making inference about θ in the presence of missing data. The maximum likelihood estimator θ̂ maximizes the observed log-likelihood l_obs(θ) = log ∫ f(y; θ) dy_mis.

4 Introduction Inference with missing data is usually based on Wald-type inference: θ̂ ∼ N(θ, Î_obs⁻¹), where Î_obs = −∂²l_obs(θ)/∂θ∂θᵀ evaluated at θ = θ̂. We are interested in inference based on Wilks' theorem: 2{l_obs(θ̂) − l_obs(θ)} → χ²_p, where l_obs(θ) is the observed log-likelihood and p is the dimension of θ.

5 Basic setup The observed likelihood involves integration: l_obs(θ) = log ∫ f(y; θ) dy_mis = log f_obs(y_obs; θ). The Monte Carlo approximation is valid in the neighborhood of θ = θ̂: f_obs(y_obs; θ) = E{ f(y; θ) / f(y_mis | y_obs; θ̂) | y_obs } ≈ (1/M) Σ_{j=1}^M f(y_obs, y_mis^(j); θ) / f(y_mis^(j) | y_obs; θ̂), where y_mis^(j) ∼ f(y_mis | y_obs; θ̂).
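
This importance-sampling approximation can be illustrated numerically. The sketch below uses an assumed toy model, not one from the slides: y_obs and y_mis are independent N(θ, 1), so f(y_mis | y_obs; θ̂) is just N(θ̂, 1) and f_obs(y_obs; θ) is available in closed form, which lets us check the approximation.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy model (illustrative assumption): y_obs and y_mis independent N(theta, 1),
# y_mis unobserved. Then f_obs(y_obs; theta) = N(y_obs; theta, 1) exactly.
theta_hat = 0.3          # plays the role of the MLE from the observed data
y_obs = 1.2
M = 100_000

# Draw imputed values once from the proposal f(y_mis | y_obs; theta_hat)
y_mis = rng.normal(theta_hat, 1.0, size=M)

def f_obs_approx(theta):
    # f_obs(y_obs; theta) ≈ (1/M) Σ_j f(y_obs, y_mis^(j); theta) / h(y_mis^(j))
    num = norm.pdf(y_obs, theta, 1) * norm.pdf(y_mis, theta, 1)
    den = norm.pdf(y_mis, theta_hat, 1)
    return np.mean(num / den)

exact = norm.pdf(y_obs, theta_hat, 1)
print(abs(f_obs_approx(theta_hat) - exact))  # small Monte Carlo error
```

The approximation is accurate for θ near θ̂; as θ moves away from θ̂ the importance weights become unstable, which is exactly the limitation discussed on the next slide.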

6 Basic setup Wilks inference is made using the likelihood ratio test. If l_obs(θ) is known, the Wilks confidence interval can be constructed as { θ ∈ Θ : 2{l_obs(θ̂) − l_obs(θ)} ≤ χ²_p(1 − α) }. Computing the observed log-likelihood is challenging because the integration over the random variable y_i,mis is often intractable. Monte Carlo methods that approximate the observed-data likelihood using samples from a distribution near the MLE fail to compute l_obs(θ) correctly when θ is far from the MLE.

7 Parametric fractional imputation (PFI) Main idea of the PFI EM algorithm (Kim, 2011): PFI saves the computation associated with MCEM. It uses importance sampling to compute the mean score function, with the imputed values generated only once, at the beginning of the EM iteration. PFI doesn't change the imputed values but updates the fractional weights at each EM iteration. This greatly reduces the computational burden and ensures the convergence of the EM sequence.

8 Parametric fractional imputation (PFI) (I-step) For missing y_i,mis, generate y_i,mis^(j) ∼ h(y_i,mis), for j = 1, ..., m. (W-step) Given the current parameter value θ^(t), compute the fractional weights w_ij(θ^(t)), where w_ij(θ) = { f(y_ij; θ)/h(y_i,mis^(j)) } / Σ_{k=1}^m { f(y_ik; θ)/h(y_i,mis^(k)) } and y_ij = (y_i,obs, y_i,mis^(j)), and form the mean score equation S̄^(t)(θ) ≡ Σ_{i=1}^n Σ_{j=1}^m w_ij(θ^(t)) S(θ; y_ij) = 0, (1) where S(θ; y) is the score function of θ. (M-step) Update the parameter θ̂^(t+1) by solving (1). Repeat the W-step and M-step until convergence.
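
A minimal sketch of one PFI run, under an assumed toy model that is not from the slides: y_i ∼ N(θ, 1) with a fraction of the y_i missing at random; theta0 plays the role of a preliminary estimate used to center the proposal h (importance sampling degrades if h is far from the target). For this model the M-step score equation reduces to a fractionally weighted mean.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Assumed toy model: y_i ~ N(theta, 1), roughly 30% of the y_i missing at random
n, true_theta = 200, 2.0
y = rng.normal(true_theta, 1.0, n)
observed = rng.random(n) < 0.7
y_obs = y[observed]
n_mis = n - y_obs.size

# I-step: impute once from a fixed proposal h = N(theta0, 1), theta0 a
# preliminary estimate (hypothetical value for illustration)
theta0, m = 1.5, 2000
y_imp = rng.normal(theta0, 1.0, size=(n_mis, m))   # y_imp[i, j] = y_i^(j)

theta = theta0
for _ in range(200):
    # W-step: fractional weights w_ij ∝ f(y_ij; theta) / h(y_ij),
    # normalized to sum to one within each unit
    w = norm.pdf(y_imp, theta, 1) / norm.pdf(y_imp, theta0, 1)
    w /= w.sum(axis=1, keepdims=True)
    # M-step: solve the weighted score equation; for N(theta, 1) this is
    # the fractionally weighted mean of observed and imputed values
    theta_new = (y_obs.sum() + (w * y_imp).sum()) / n
    if abs(theta_new - theta) < 1e-10:
        break
    theta = theta_new

print(theta, y_obs.mean())   # PFI estimate is close to the observed-data MLE
```

Because the imputed values are fixed, each iteration is a deterministic map of θ, so the sequence converges; only the weights are recomputed, as the slide describes.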

9 Remarks on PFI method (I-step) + (W-step) = E-step. The key part is computing the fractional weights w_ij(θ) = { f(y_ij; θ)/h(y_i,mis^(j)) } / Σ_{k=1}^m { f(y_ik; θ)/h(y_i,mis^(k)) }. Here, h is often called the proposal distribution and f the target distribution.

10 Parametric fractional imputation (PFI) Wald inference can be constructed based on θ̂ ∼ N(θ, Î_obs⁻¹), where Î_obs is the approximate information matrix. Issue with Wald inference: Wald-type confidence intervals often have poor coverage when the sampling distribution of the MLE is skewed. In such cases, Wilks-type inference is preferred.

11 PFI approximation of likelihood Using the PFI data, we can express f_obs,i(y_i,obs; θ) ≈ Σ_{j=1}^m { f(y_ij; θ)/h(y_i,mis^(j)) } / Σ_{j=1}^m { 1/h(y_i,mis^(j)) } = [ Σ_{j=1}^m { w_ij(θ)/f(y_ij; θ) } ]⁻¹. The observed log-likelihood function l_obs(θ) can therefore be approximated by l̂_obs(θ) = −Σ_{i=1}^n log Σ_{j=1}^m { w_ij(θ)/f(y_ij; θ) }. Given the imputed values {y_ij}, we only need f(y_ij; θ) and h(y_i,mis^(j)).
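
The algebraic identity behind the last display can be checked numerically. In this sketch the target f and proposal h are arbitrary normal densities (illustrative choices, not the slides' model), and w_j are the fractional weights of a single unit:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# Check: with w_j(theta) ∝ f(y_j; theta)/h(y_j), normalized to sum to one,
#   [ Σ_j w_j(theta)/f(y_j; theta) ]^{-1}
#     = Σ_j f(y_j; theta)/h(y_j)  /  Σ_j 1/h(y_j)
theta, theta_hat = 0.7, 0.4
y = rng.normal(theta_hat, 1.0, 50)        # imputed values drawn from h
f = norm.pdf(y, theta, 1.0)               # target density at theta
h = norm.pdf(y, theta_hat, 1.0)           # proposal density
w = (f / h) / (f / h).sum()               # fractional weights

lhs = 1.0 / (w / f).sum()
rhs = (f / h).sum() / (1.0 / h).sum()
print(lhs, rhs)   # equal up to floating-point error
```

The right-hand side depends on θ only through the importance-sampling sum Σ_j f/h, so l̂_obs(θ) differs from the raw importance-sampling log-likelihood by a constant in θ, which cancels in any likelihood-ratio comparison.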

12 Main Theory 1 Theorem The imputed observed likelihood ratio statistic for testing H_0: θ = θ_0 is W_1 = 2{ l̂_obs(θ̂) − l̂_obs(θ_0) }. Under the regularity conditions specified, and under the null hypothesis H_0, W_1 → χ²(p) as m → ∞ and n → ∞.

13 Main Theory 2 Theorem Let θ = (θ_1, θ_2), where θ_1 and θ_2 are q × 1 and (p − q) × 1 vectors, respectively. Under the same regularity conditions as Theorem 1, under H_0: θ_1 = θ_1,0, W_2 = 2{ l̂_obs(θ̂) − l̂_obs(θ̂^(0)) } → χ²(q) as m → ∞ and n → ∞, where θ̂^(0) = arg max_{H_0} l̂_obs(θ).

14 Remarks on Theorem 2 There are two models involved in Theorem 2. One is the full model f and the other is the reduced model f_0, the model under the null hypothesis. Recall that l̂_obs(θ) = −Σ_{i=1}^n log Σ_{j=1}^m { w_ij(θ)/f(y_ij; θ) }, with w_ij(θ) = { f(y_ij; θ)/h(y_i,mis^(j)) } / Σ_{k=1}^m { f(y_ik; θ)/h(y_i,mis^(k)) }. Thus, h remains the same but f needs to be changed to f_0 under the reduced model.

15 An example of likelihood ratio test Consider the bivariate normal distribution (y_i1, y_i2)ᵀ ∼ N( (μ_1, μ_2)ᵀ, [ σ_1², ρσ_1σ_2 ; ρσ_1σ_2, σ_2² ] ) for i = 1, ..., n. We are interested in testing H_0: μ_1 = μ_2. Table: Data structure. Set H: y_1 and y_2 both observed; no imputation needed. Set K: y_1 observed, y_2 missing; imputation {(y_2^(j), w_ij)}_{j=1}^m with proposal h_K(y_2 | y_1). Set L: y_1 missing, y_2 observed; imputation {(y_1^(j), w_ij)}_{j=1}^m with proposal h_L(y_1 | y_2). Set M: y_1 and y_2 both missing; imputation {(y_1^(j), y_2^(j), w_ij)}_{j=1}^m with proposal h_M(y_1, y_2).

26 An example of likelihood ratio test Under the full model, for i ∈ K, the fractional weights are given by w_ij(θ) = { f(y_ij; θ)/h_K(y_2^(j) | y_i1) } / Σ_{k=1}^m { f(y_ik; θ)/h_K(y_2^(k) | y_i1) }. Similarly for L and M. The MLE under the full model is computed by solving Σ_{i=1}^n Σ_{j=1}^m w_ij(θ) S(θ; y_ij) = 0. We may use the EM algorithm to obtain the solution. The maximum of the observed likelihood under the full model is l̂_obs(θ̂) = −Σ_{i=1}^n log Σ_{j=1}^m { w_ij(θ̂)/f(y_ij; θ̂) }.

27 An example of likelihood ratio test Let f_0 be the density for the reduced model under H_0: μ_1 = μ_2 = μ. For i ∈ K, the fractional weights are given by w_0,ij(θ) = { f_0(y_ij; θ)/h_K(y_2^(j) | y_i1) } / Σ_{k=1}^m { f_0(y_ik; θ)/h_K(y_2^(k) | y_i1) }. Similarly for L and M. Note that we use the same imputed values for this computation. The MLE of θ under the reduced model, denoted by θ̂_0, is obtained by solving Σ_{i=1}^n Σ_{j=1}^m w_0,ij(θ) S_0(θ; y_ij) = 0, where S_0(θ; y) is the score function derived from f_0.

28 An example of likelihood ratio test The maximum of the observed likelihood under the null model is given by l̂_0,obs(θ̂_0) = −Σ_{i=1}^n log Σ_{j=1}^m { w_0,ij(θ̂_0)/f_0(y_ij; θ̂_0) }. The test statistic for testing H_0: μ_1 = μ_2 is computed from the PFI data as W = 2{ l̂_obs(θ̂) − l̂_0,obs(θ̂_0) }. If W > χ²_1(1 − α), then we reject the null model.
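
A sketch of the whole computation of W for a simplified version of this example. To keep it short, the variances and correlation are treated as known (σ_1 = σ_2 = 1, ρ = 0.5, an assumption the slides do not make), the observed log-likelihood is maximized numerically rather than by EM, and the reduced model reuses the same imputed values with the constrained density, as in the preceding slides. The data are generated under a strong alternative, so the test should reject.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import chi2, norm

rng = np.random.default_rng(4)

# Simplified bivariate-normal example: only (mu1, mu2) unknown, test
# H0: mu1 = mu2; y2 missing for ~40% of units (set K); no L or M units.
rho, n, m = 0.5, 200, 500
cov = np.array([[1.0, rho], [rho, 1.0]])
y = rng.multivariate_normal([0.0, 2.0], cov, size=n)
miss2 = rng.random(n) < 0.4            # y2 missing for these units
yc, y1m = y[~miss2], y[miss2, 0]

# Preliminary estimates build the proposal h_K(y2 | y1); imputed once
mu1_0, mu2_0 = y[:, 0].mean(), yc[:, 1].mean()
h_mean = mu2_0 + rho * (y1m - mu1_0)
h_sd = np.sqrt(1 - rho**2)
y2_imp = h_mean[:, None] + h_sd * rng.standard_normal((y1m.size, m))
log_h = norm.logpdf(y2_imp, h_mean[:, None], h_sd)

def log_f(y1, y2, mu1, mu2):
    # bivariate normal log density with known unit variances and rho
    z1, z2 = y1 - mu1, y2 - mu2
    q = (z1**2 - 2 * rho * z1 * z2 + z2**2) / (1 - rho**2)
    return -np.log(2 * np.pi) - 0.5 * np.log(1 - rho**2) - 0.5 * q

def l_obs(mu1, mu2):
    # complete units: exact log density; incomplete units: importance-
    # sampling approximation log (1/m) Σ_j f(y1, y2^(j)) / h(y2^(j))
    lc = log_f(yc[:, 0], yc[:, 1], mu1, mu2).sum()
    li = logsumexp(log_f(y1m[:, None], y2_imp, mu1, mu2) - log_h, axis=1)
    return lc + (li - np.log(m)).sum()

full = minimize(lambda t: -l_obs(t[0], t[1]), [mu1_0, mu2_0])
red = minimize(lambda t: -l_obs(t[0], t[0]), [mu2_0])   # f0: mu1 = mu2
W = 2 * (-full.fun + red.fun)
print(W > chi2.ppf(0.95, 1))   # H0 is clearly rejected here
```

Because both maximized likelihoods are computed from the same imputed values, the proposal-dependent constant cancels in W, as noted on slide 14.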

29 Simulation One Profile likelihood confidence interval. y_i = 2 + x_i + e_i, x_i ∼ N(1, 1), e_i ∼ N(0, 1), where x_i is fully observed and y_i is subject to missingness. δ_i ∼ iid Bernoulli(0.6); y_i is observed if δ_i = 1 and missing if δ_i = 0. Monte Carlo samples were independently generated B = 2,000 times. We construct 95% confidence intervals for β_1 and σ². Two methods: the Wald method using asymptotic normality, and the Wilks method using the result of Theorem 2.

30 Simulation One Table: Monte Carlo length and coverage of the Wald and Wilks confidence intervals for β_1, with sample sizes n = 20, 50, 100 (columns: C.I. length and coverage for each method). [Numeric entries not recoverable.]

31 Simulation One Table: Monte Carlo length and coverage of the Wald and Wilks confidence intervals for σ², with sample sizes n = 20, 50, 100 (columns: C.I. length and coverage for each method). [Numeric entries not recoverable.] When n = 20, about 8% of the Monte Carlo samples yield a negative lower endpoint for σ² with the Wald confidence intervals.
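
For intuition on why the Wilks intervals behave better for a variance parameter, here is a complete-data sketch (no missing values, an assumed setup rather than the simulation above) that inverts the profile likelihood ratio; with PFI the same inversion would be applied to l̂_obs.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

rng = np.random.default_rng(3)

# Complete-data N(mu, sigma^2) sample (illustrative small n)
n = 20
y = rng.normal(0.0, 1.0, n)

mu_hat = y.mean()
s2_hat = np.mean((y - mu_hat) ** 2)          # MLE of sigma^2

def prof_loglik(s2):
    # profile log-likelihood of sigma^2 (mu profiled out at mu_hat)
    return -0.5 * n * np.log(2 * np.pi * s2) - 0.5 * n * s2_hat / s2

# Wald 95% CI: s2_hat ± 1.96 * se, with asymptotic se^2 = 2 s2_hat^2 / n
se = np.sqrt(2 * s2_hat**2 / n)
wald = (s2_hat - 1.96 * se, s2_hat + 1.96 * se)

# Wilks 95% CI: { s2 : 2[l(s2_hat) - l(s2)] <= chi2_{1}(0.95) },
# found by solving for the two crossing points of the profile likelihood
cut = prof_loglik(s2_hat) - 0.5 * chi2.ppf(0.95, 1)
lo = brentq(lambda s2: prof_loglik(s2) - cut, 1e-6, s2_hat)
hi = brentq(lambda s2: prof_loglik(s2) - cut, s2_hat, 100.0)
print(wald, (lo, hi))
```

The Wald interval is symmetric around σ̂² by construction, while the Wilks interval follows the skewed profile likelihood and always stays inside (0, ∞).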

32 Simulation One Figure: sampling distribution of β̂ (density plots for n = 20, 50, 100).

33 Simulation One Figure: sampling distribution of σ̂² (density plots for n = 20, 50, 100).

34 Simulation Two Likelihood ratio test. Samples of size n = 100 and n = 200 are generated from y_i = β_0 + β_1 x_1i + β_2 x_2i + e_i, where x_i = (x_1i, x_2i) is bivariate normal with covariance 0.1 between the two components, e_i ∼ N(0, 1), and δ_i ∼ Bernoulli(0.6). (β_0, β_1) = (2, 1), and β_2 takes the values 0, 0.1, 0.2, and 0.3. We are interested in testing the null hypothesis H_0: β_2 = 0 using the likelihood ratio test (LRT) of Fractional Imputation (FI) and the LRT of Multiple Imputation (MI). 1 1 Meng, X.-L. and Rubin, D.B. (1992). Performing Likelihood Ratio Tests with Multiply-Imputed Data Sets. Biometrika, 79, 103-111.

35 Simulation Two Table: Monte Carlo power of the likelihood ratio test (LRT) of Multiple Imputation (MI) and Fractional Imputation (FI) for continuous data with sample size n = 100 (rows: β_2 = 0, 0.1, 0.2, 0.3; columns: LRT.MI and LRT.FI at α = 0.05 and α = 0.1). [Numeric entries not recoverable.]

36 Simulation Two Table: Monte Carlo power of the likelihood ratio test (LRT) of Multiple Imputation (MI) and Fractional Imputation (FI) for continuous data with sample size n = 200 (rows: β_2 = 0, 0.1, 0.2, 0.3; columns: LRT.MI and LRT.FI at α = 0.05 and α = 0.1). [Numeric entries not recoverable.]

37 Concluding remarks Parametric fractional imputation provides a completed dataset with fractional weights, which makes it possible to compute the observed log-likelihood function from h(y_i,mis^(j)) and f(y_ij; θ). The LRT from PFI is more powerful than the Wald test based on the central limit theorem, and also than the LRT from MI proposed by Meng and Rubin (1992). Extensions will be a topic of future research: model selection criteria such as AIC or BIC can be developed, as considered by Ibrahim et al. (2008) and Garcia et al. (2010).

38 The end


More information

Data Integration for Big Data Analysis for finite population inference

Data Integration for Big Data Analysis for finite population inference for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation

More information

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions Spatial inference I will start with a simple model, using species diversity data Strong spatial dependence, Î = 0.79 what is the mean diversity? How precise is our estimate? Sampling discussion: The 64

More information

Linear Mixed Models for Longitudinal Data with Nonrandom Dropouts

Linear Mixed Models for Longitudinal Data with Nonrandom Dropouts Journal of Data Science 4(2006), 447-460 Linear Mixed Models for Longitudinal Data with Nonrandom Dropouts Ahmed M. Gad and Noha A. Youssif Cairo University Abstract: Longitudinal studies represent one

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone

More information

ST495: Survival Analysis: Hypothesis testing and confidence intervals

ST495: Survival Analysis: Hypothesis testing and confidence intervals ST495: Survival Analysis: Hypothesis testing and confidence intervals Eric B. Laber Department of Statistics, North Carolina State University April 3, 2014 I remember that one fateful day when Coach took

More information

Introduction Large Sample Testing Composite Hypotheses. Hypothesis Testing. Daniel Schmierer Econ 312. March 30, 2007

Introduction Large Sample Testing Composite Hypotheses. Hypothesis Testing. Daniel Schmierer Econ 312. March 30, 2007 Hypothesis Testing Daniel Schmierer Econ 312 March 30, 2007 Basics Parameter of interest: θ Θ Structure of the test: H 0 : θ Θ 0 H 1 : θ Θ 1 for some sets Θ 0, Θ 1 Θ where Θ 0 Θ 1 = (often Θ 1 = Θ Θ 0

More information

(θ θ ), θ θ = 2 L(θ ) θ θ θ θ θ (θ )= H θθ (θ ) 1 d θ (θ )

(θ θ ), θ θ = 2 L(θ ) θ θ θ θ θ (θ )= H θθ (θ ) 1 d θ (θ ) Setting RHS to be zero, 0= (θ )+ 2 L(θ ) (θ θ ), θ θ = 2 L(θ ) 1 (θ )= H θθ (θ ) 1 d θ (θ ) O =0 θ 1 θ 3 θ 2 θ Figure 1: The Newton-Raphson Algorithm where H is the Hessian matrix, d θ is the derivative

More information

Wald s theorem and the Asimov data set

Wald s theorem and the Asimov data set Wald s theorem and the Asimov data set Eilam Gross & Ofer Vitells ATLAS statistics forum, Dec. 009 1 Outline We have previously guessed that the median significance of many toy MC experiments could be

More information

Lecture 14 More on structural estimation

Lecture 14 More on structural estimation Lecture 14 More on structural estimation Economics 8379 George Washington University Instructor: Prof. Ben Williams traditional MLE and GMM MLE requires a full specification of a model for the distribution

More information

LECTURE 10: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING. The last equality is provided so this can look like a more familiar parametric test.

LECTURE 10: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING. The last equality is provided so this can look like a more familiar parametric test. Economics 52 Econometrics Professor N.M. Kiefer LECTURE 1: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING NEYMAN-PEARSON LEMMA: Lesson: Good tests are based on the likelihood ratio. The proof is easy in the

More information

Topic 12 Overview of Estimation

Topic 12 Overview of Estimation Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the

More information

Lecture 3. Inference about multivariate normal distribution

Lecture 3. Inference about multivariate normal distribution Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates

More information

Semiparametric Gaussian Copula Models: Progress and Problems

Semiparametric Gaussian Copula Models: Progress and Problems Semiparametric Gaussian Copula Models: Progress and Problems Jon A. Wellner University of Washington, Seattle 2015 IMS China, Kunming July 1-4, 2015 2015 IMS China Meeting, Kunming Based on joint work

More information

Advanced Quantitative Methods: maximum likelihood

Advanced Quantitative Methods: maximum likelihood Advanced Quantitative Methods: Maximum Likelihood University College Dublin 4 March 2014 1 2 3 4 5 6 Outline 1 2 3 4 5 6 of straight lines y = 1 2 x + 2 dy dx = 1 2 of curves y = x 2 4x + 5 of curves y

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

Lecture 8 Inequality Testing and Moment Inequality Models

Lecture 8 Inequality Testing and Moment Inequality Models Lecture 8 Inequality Testing and Moment Inequality Models Inequality Testing In the previous lecture, we discussed how to test the nonlinear hypothesis H 0 : h(θ 0 ) 0 when the sample information comes

More information

Some Curiosities Arising in Objective Bayesian Analysis

Some Curiosities Arising in Objective Bayesian Analysis . Some Curiosities Arising in Objective Bayesian Analysis Jim Berger Duke University Statistical and Applied Mathematical Institute Yale University May 15, 2009 1 Three vignettes related to John s work

More information

Testing for a Global Maximum of the Likelihood

Testing for a Global Maximum of the Likelihood Testing for a Global Maximum of the Likelihood Christophe Biernacki When several roots to the likelihood equation exist, the root corresponding to the global maximizer of the likelihood is generally retained

More information

Quality Control Using Inferential Statistics In Weibull Based Reliability Analyses S. F. Duffy 1 and A. Parikh 2

Quality Control Using Inferential Statistics In Weibull Based Reliability Analyses S. F. Duffy 1 and A. Parikh 2 Quality Control Using Inferential Statistics In Weibull Based Reliability Analyses S. F. Duffy 1 and A. Parikh 2 1 Cleveland State University 2 N & R Engineering www.inl.gov ASTM Symposium on Graphite

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Stat 451 Lecture Notes Monte Carlo Integration

Stat 451 Lecture Notes Monte Carlo Integration Stat 451 Lecture Notes 06 12 Monte Carlo Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 23 in Lange, and Chapters 3 4 in Robert & Casella 2 Updated:

More information

Likelihood-Based Methods

Likelihood-Based Methods Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)

More information

Basics of Modern Missing Data Analysis

Basics of Modern Missing Data Analysis Basics of Modern Missing Data Analysis Kyle M. Lang Center for Research Methods and Data Analysis University of Kansas March 8, 2013 Topics to be Covered An introduction to the missing data problem Missing

More information

(a) (3 points) Construct a 95% confidence interval for β 2 in Equation 1.

(a) (3 points) Construct a 95% confidence interval for β 2 in Equation 1. Problem 1 (21 points) An economist runs the regression y i = β 0 + x 1i β 1 + x 2i β 2 + x 3i β 3 + ε i (1) The results are summarized in the following table: Equation 1. Variable Coefficient Std. Error

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X. Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may

More information

Inference on distributions and quantiles using a finite-sample Dirichlet process

Inference on distributions and quantiles using a finite-sample Dirichlet process Dirichlet IDEAL Theory/methods Simulations Inference on distributions and quantiles using a finite-sample Dirichlet process David M. Kaplan University of Missouri Matt Goldman UC San Diego Midwest Econometrics

More information

Sequential Implementation of Monte Carlo Tests with Uniformly Bounded Resampling Risk

Sequential Implementation of Monte Carlo Tests with Uniformly Bounded Resampling Risk Sequential Implementation of Monte Carlo Tests with Uniformly Bounded Resampling Risk Axel Gandy Department of Mathematics Imperial College London a.gandy@imperial.ac.uk user! 2009, Rennes July 8-10, 2009

More information

Empirical Likelihood Inference for Two-Sample Problems

Empirical Likelihood Inference for Two-Sample Problems Empirical Likelihood Inference for Two-Sample Problems by Ying Yan A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Statistics

More information

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or

More information

2017 Financial Mathematics Orientation - Statistics

2017 Financial Mathematics Orientation - Statistics 2017 Financial Mathematics Orientation - Statistics Written by Long Wang Edited by Joshua Agterberg August 21, 2018 Contents 1 Preliminaries 5 1.1 Samples and Population............................. 5

More information

Statistics and econometrics

Statistics and econometrics 1 / 36 Slides for the course Statistics and econometrics Part 10: Asymptotic hypothesis testing European University Institute Andrea Ichino September 8, 2014 2 / 36 Outline Why do we need large sample

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information