Biostat 2065 Analysis of Incomplete Data


Gong Tang, Dept of Biostatistics, University of Pittsburgh. October 20, 2005

1. Large-sample inference based on ML

Let $\hat\theta$ be the MLE. Large-sample theory then implies that, approximately,

$$(\theta - \hat\theta) \sim N(0, C),$$

where $C = J^{-1}(\theta)$ is the inverse of the expected information. A natural estimate of $C$ is therefore $C_1 = J^{-1}(\hat\theta)$. An alternative estimate is $C_2 = I^{-1}(\hat\theta)$, the inverse of the observed information. When misspecification is plausible, a robust estimate of the variance is

$$C_3 = I^{-1}(\hat\theta)\, K(\hat\theta)\, I^{-1}(\hat\theta), \qquad \text{where} \qquad K(\theta) = \frac{\partial l(\theta)}{\partial \theta}\, \frac{\partial l(\theta)}{\partial \theta^{T}}.$$

In the Newton-Raphson (NR) algorithm, $C_2$ is computed as part of each iteration step, and the components of $C_3$ can be easily obtained from the score and $C_2$. When the EM algorithm or one of its variants is used for ML estimation, additional steps are needed to compute standard errors of the estimates. For example, a further calculation of $C_2$, or Louis's formula, may be necessary.
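As a small numerical illustration of $C_2$ and $C_3$, the sketch below fits an exponential model with rate `lam` to a made-up sample. The data, the model, and the per-observation form $K = \sum_i s_i s_i^T$ (with $s_i$ the score contribution of unit $i$) are assumptions chosen for illustration; they are not taken from the lecture.

```python
# Toy data (hypothetical), assumed exponential with rate lam.
x = [0.8, 1.9, 0.4, 2.7, 1.1, 0.3, 3.5, 0.9]
n = len(x)

# Loglikelihood: l(lam) = n*log(lam) - lam*sum(x); the MLE solves the score equation.
lam_hat = n / sum(x)

# Per-observation score contributions at the MLE: d/dlam [log(lam) - lam*x_i].
scores = [1.0 / lam_hat - xi for xi in x]

# Observed information I(lam) = -d^2 l / dlam^2 = n / lam^2, evaluated at the MLE.
obs_info = n / lam_hat ** 2

# C2: inverse observed information. For this model the observed and expected
# information coincide, so C1 = C2 here.
c2 = 1.0 / obs_info

# K: sum of squared per-observation scores (assumed empirical form of K).
k_hat = sum(s * s for s in scores)

# C3: sandwich estimate I^{-1} K I^{-1}.
c3 = c2 * k_hat * c2

print(f"lam_hat={lam_hat:.4f}  C2={c2:.5f}  C3={c3:.5f}")
```

If the exponential model holds, `c2` and `c3` should be close in large samples; a marked difference is a hint of misspecification.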

2. Supplemented EM algorithm

Supplemented EM (SEM) (Meng & Rubin, 1991) is a way to calculate the large-sample covariance matrix of $\hat\theta$ using only:

1. code for the E and M steps of EM;
2. code for the large-sample complete-data variance-covariance matrix, $V_{com}$;
3. standard matrix operations.

Recall that $DM = I - i_{obs}\, i_{com}^{-1}$, where $DM$ is the derivative of the EM mapping, $i_{com}$ is the complete information, $i_{obs} = I(\theta \mid Y_{obs})$ is the observed information, and the $I$ in $I - DM$ is the identity matrix. Therefore,

$$I^{-1}(\theta \mid Y_{obs}) = i_{obs}^{-1} = i_{com}^{-1}(I - DM)^{-1} = V_{com}(I - DM)^{-1}.$$

Denote $V_{obs} = I^{-1}(\theta \mid Y_{obs})$. Then

$$V_{obs} = V_{com}(I - DM + DM)(I - DM)^{-1} = V_{com}\{I + DM(I - DM)^{-1}\} := V_{com} + \Delta V,$$

where $\Delta V = V_{com}\, DM\, (I - DM)^{-1}$ is the increase in variance due to missing data. Though the map $M$ has no explicit form, its derivative $DM$ can be effectively approximated using some extra EM steps.
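The decomposition can be checked numerically in a scalar case. The sketch assumes a normal mean with known variance `sigma2` and `n` units of which only `m` are observed (missing completely at random), so that both information terms are available in closed form. All names and numbers are illustrative.

```python
# Hypothetical setup: normal mean, known variance, n units, m observed.
n, m, sigma2 = 10, 4, 1.0

i_com = n / sigma2   # complete-data information for the mean
i_obs = m / sigma2   # observed-data information for the mean

# DM = I - i_obs * i_com^{-1}; in the scalar case, the fraction of missing information.
dm = 1.0 - i_obs / i_com

v_com = 1.0 / i_com                  # complete-data variance
delta_v = v_com * dm / (1.0 - dm)    # increase in variance due to missing data
v_obs = v_com + delta_v              # should equal 1 / i_obs = sigma2 / m

print(dm, v_com, delta_v, v_obs)
```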

3. Implementation of SEM

First obtain the MLE $\hat\theta$, and then run a sequence of SEM iterations as follows. The initial value is close to $\hat\theta$, and at step $t$ the current estimate is $\theta^{(t)}$.

1. Run the usual E and M steps to obtain $\theta^{(t+1)}$.
2. Fix $i = 1$ and calculate $\theta^{(t)}(i) = (\hat\theta_1, \ldots, \hat\theta_{i-1}, \theta_i^{(t)}, \hat\theta_{i+1}, \ldots, \hat\theta_d)$.
3. Treating $\theta^{(t)}(i)$ as the current estimate of $\theta$, run one iteration of EM to obtain $\tilde\theta^{(t+1)}(i)$.
4. Obtain the ratio
   $$r_{ij}^{(t)} = \frac{\tilde\theta_j^{(t+1)}(i) - \hat\theta_j}{\theta_i^{(t)} - \hat\theta_i}, \qquad j = 1, \ldots, d.$$
5. Repeat steps 2 to 4 for $i = 2, \ldots, d$.

The output is $\theta^{(t+1)}$ and $\{r_{ij}^{(t)} : i, j = 1, \ldots, d\}$. As $t \to \infty$, the elements $r_{ij}^{(t)}$ converge, and their limit is an approximation of $DM$.
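The steps above can be sketched in plain Python as follows. The function name `sem_rate_matrix`, the stopping rule (stop when the ratios change by less than `tol` between successive $t$), and the toy EM map are all assumptions made for illustration. The toy model treats two independent normal means with known variance, where the E step imputes each missing value by the current mean.

```python
def sem_rate_matrix(em_step, theta_hat, theta_start, tol=1e-8, max_iter=100):
    """Approximate DM by the SEM ratios r_ij (row i = perturbed component)."""
    d = len(theta_hat)
    theta_t = list(theta_start)
    r_prev = None
    for _ in range(max_iter):
        r = []
        for i in range(d):
            # Step 2: theta_hat with its i-th component replaced by theta_t[i].
            theta_i = list(theta_hat)
            theta_i[i] = theta_t[i]
            # Step 3: one EM iteration started from theta_i.
            theta_next_i = em_step(theta_i)
            # Step 4: ratios r_ij (requires theta_t[i] != theta_hat[i]).
            denom = theta_t[i] - theta_hat[i]
            r.append([(theta_next_i[j] - theta_hat[j]) / denom for j in range(d)])
        # Stop once the ratios have stabilised between successive values of t.
        if r_prev is not None and all(
            abs(r[i][j] - r_prev[i][j]) < tol for i in range(d) for j in range(d)
        ):
            return r
        r_prev = r
        # Step 1: ordinary EM update of the running estimate.
        theta_t = em_step(theta_t)
    return r


# Toy EM map (hypothetical): n units per variable, only the listed values observed.
n = 10
y1 = [1.0, 2.0, 3.0, 2.0]                        # 4 observed, 6 missing
y2 = [0.5, 1.5, 1.0, 2.0, 1.0, 0.0, 1.0, 0.0]    # 8 observed, 2 missing


def em_step(theta):
    # E step: impute each missing value by the current mean; M step: average.
    return [
        (sum(y1) + (n - len(y1)) * theta[0]) / n,
        (sum(y2) + (n - len(y2)) * theta[1]) / n,
    ]


theta_hat = [sum(y1) / len(y1), sum(y2) / len(y2)]   # MLE: observed-data means
dm_approx = sem_rate_matrix(em_step, theta_hat, theta_start=[0.0, 0.0])
print(dm_approx)
```

Because this toy EM map is linear, the ratios equal the fractions of missing information (0.6 and 0.2 on the diagonal, 0 elsewhere) from the first step. For a nonlinear map they would only stabilise as $t$ grows, and the starting value must differ from $\hat\theta$ in every component to keep the denominators nonzero.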

4. Other methods

1. Bootstrapping the observed data.
2. Bayesian methods: using the posterior variance under a flat prior.

5. The ECM algorithm

There are situations where the M step does not have an explicit solution even when the complete data are from an exponential family, so that an iterative procedure is required within each M step. Sometimes the M step can be replaced by several conditional maximization (CM) steps in order to avoid iterative M steps. This modification is called the ECM algorithm.

Suppose the parameter is $\theta = \{\theta_1, \theta_2, \ldots, \theta_S\}$ and the current estimate is $\theta^{(t)} = \{\theta_1^{(t)}, \ldots, \theta_S^{(t)}\}$. Then the M step consists of $S$ CM-steps:

1. At the $(t + 1/S)$th CM-step, let $\theta = \{\theta_1, \theta_2^{(t)}, \ldots, \theta_S^{(t)}\}$ and maximize $Q(\theta; \theta^{(t)})$ with respect to $\theta_1$. Denote the maximizer by $\theta_1^{(t+1/S)}$.
2. Similarly, at the $(t + 2/S)$th CM-step, let $\theta = \{\theta_1^{(t+1/S)}, \theta_2, \theta_3^{(t)}, \ldots, \theta_S^{(t)}\}$ and maximize $Q(\theta; \theta^{(t)})$ with respect to $\theta_2$. Denote the maximizer by $\theta_2^{(t+2/S)}$.
3. Repeat sequentially, maximizing the Q function with respect to $\theta_s$ with all other parameters fixed at their most recent values, $s = 1, 2, \ldots, S$.

Finally, $\theta^{(t+1)} = \{\theta_1^{(t+1/S)}, \ldots, \theta_S^{(t+S/S)}\}$ is the updated estimate of $\theta$ for the subsequent E step. Since each CM step increases $Q$, ECM is a GEM (generalized EM) algorithm and monotonically increases the likelihood of $\theta$. Under the same conditions that guarantee the convergence of EM, ECM converges to a stationary point of the likelihood, i.e., a solution to the score equation of $\theta$.
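To see the CM-step mechanics in isolation, the sketch below cycles two closed-form CM-steps on a toy concave function `q` that stands in for $Q(\theta; \theta^{(t)})$ with $S = 2$. The function is hypothetical and not derived from any data model, and $Q$ is held fixed here, whereas in ECM a new E step recomputes $Q$ after each full cycle of CM-steps.

```python
def q(theta1, theta2):
    # Hypothetical concave stand-in for Q(theta; theta^(t)).
    return -theta1 ** 2 - theta2 ** 2 - theta1 * theta2 + 2.0 * theta1 + 3.0 * theta2


def cm1(theta2):
    # Maximise q over theta1 with theta2 fixed: -2*theta1 - theta2 + 2 = 0.
    return (2.0 - theta2) / 2.0


def cm2(theta1):
    # Maximise q over theta2 with theta1 fixed: -2*theta2 - theta1 + 3 = 0.
    return (3.0 - theta1) / 2.0


theta1, theta2 = 0.0, 0.0
q_path = [q(theta1, theta2)]
for _ in range(20):
    theta1 = cm1(theta2)              # CM-step for theta1
    q_path.append(q(theta1, theta2))
    theta2 = cm2(theta1)              # CM-step for theta2, using the updated theta1
    q_path.append(q(theta1, theta2))

print(theta1, theta2)
```

The recorded values in `q_path` never decrease, which is the property that makes ECM a GEM algorithm, and the iterates approach the joint maximizer $(1/3, 4/3)$ of this toy function.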

Example 8.6. A multivariate normal regression model with incomplete data.

Model: $y_i \sim N_K(X_i \beta, \Sigma)$, $i = 1, 2, \ldots, n$.

6. Univariate t with unknown degrees of freedom

Suppose that the observed data consist of a random sample $X = (x_1, x_2, \ldots, x_n)$ from a Student's t distribution with center $\mu$, scale parameter $\sigma$, and unknown degrees of freedom $\nu$, with density

$$f(x_i; \theta) = \frac{\Gamma(\nu/2 + 1/2)}{(\pi \nu \sigma^2)^{1/2}\, \Gamma(\nu/2)\, \{1 + (x_i - \mu)^2 / (\nu \sigma^2)\}^{(\nu+1)/2}}.$$

An augmented complete dataset can be defined as $Y = (Y_{obs}, Y_{mis})$, where $Y_{obs} = X$ and $Y_{mis} = W = (w_1, w_2, \ldots, w_n)$ is a vector of unobserved positive quantities, such that the pairs $(w_i, x_i)$ are independent across units $i$, with distribution

$$(x_i \mid w_i; \theta) \sim N(\mu, \sigma^2 / w_i), \qquad (w_i; \theta) \sim \chi^2_\nu / \nu.$$

The M step is complicated by the estimation of $\nu$. It can be replaced by two CM-steps:

1. CM1: For current parameters $\theta^{(t)} = (\mu^{(t)}, \sigma^{(t)}, \nu^{(t)})$, maximize $Q$ with respect to $(\mu, \sigma)$ with $\nu = \nu^{(t)}$ fixed. This yields $(\mu, \sigma) = (\mu^{(t+1)}, \sigma^{(t+1)})$.
2. CM2: Maximize $Q$ with respect to $\nu$ with $(\mu, \sigma) = (\mu^{(t+1)}, \sigma^{(t+1)})$.

The complete-data loglikelihood, ignoring terms that do not involve the parameters, is

$$l(\mu, \sigma^2, \nu; Y) = -\frac{n}{2} \log \sigma^2 - \frac{1}{2} \sum_{i=1}^{n} w_i (x_i - \mu)^2 / \sigma^2 + \frac{n\nu}{2} \log(\nu/2) - n \log \Gamma(\nu/2) + (\nu/2 - 1) \sum_{i=1}^{n} \log w_i - \frac{\nu}{2} \sum_{i=1}^{n} w_i.$$

Sufficient statistics are... Since $\nu$ is a scalar, the maximizer $\nu^{(t+1)}$ can be found by an iterative one-dimensional search.
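A minimal sketch of this ECM scheme for the t model follows. It relies on the standard conditional results $w_i \mid x_i \sim \mathrm{Gamma}\big((\nu+1)/2,\ \text{rate } (\nu + d_i^2)/2\big)$ with $d_i^2 = (x_i - \mu)^2/\sigma^2$, which give $E(w_i \mid x_i)$ and $E(\log w_i \mid x_i)$ for the E step. The data, starting values, search bounds for $\nu$, the golden-section search, and the series-based `digamma` helper are all assumptions made for illustration.

```python
import math


def digamma(x):
    # Recurrence psi(x) = psi(x + 1) - 1/x, then an asymptotic series for x >= 6.
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def t_loglik(x, mu, sigma2, nu):
    # Observed-data loglikelihood of the t model (used only to monitor progress).
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * nu * sigma2))
    return sum(const - (nu + 1) / 2 * math.log(1 + (xi - mu) ** 2 / (nu * sigma2))
               for xi in x)


def golden_max(f, lo, hi, tol=1e-7):
    # Golden-section search for the maximiser of a unimodal f on [lo, hi].
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def ecm_t(x, mu, sigma2, nu, n_iter=30, nu_bounds=(0.1, 100.0)):
    n = len(x)
    path = [t_loglik(x, mu, sigma2, nu)]
    for _ in range(n_iter):
        # E step at theta^(t): E(w_i | x_i) and the sum of E(log w_i | x_i).
        d2 = [(xi - mu) ** 2 / sigma2 for xi in x]
        w = [(nu + 1) / (nu + d) for d in d2]
        s_w = sum(w)
        s_logw = sum(digamma((nu + 1) / 2) - math.log((nu + d) / 2) for d in d2)
        # CM1: weighted mean and scale with nu fixed at nu^(t).
        mu = sum(wi * xi for wi, xi in zip(w, x)) / s_w
        sigma2 = sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, x)) / n

        # CM2: one-dimensional search over the nu-dependent part of Q.
        def q_nu(v):
            return (n * v / 2 * math.log(v / 2) - n * math.lgamma(v / 2)
                    + (v / 2 - 1) * s_logw - v / 2 * s_w)

        nu = golden_max(q_nu, *nu_bounds)
        path.append(t_loglik(x, mu, sigma2, nu))
    return mu, sigma2, nu, path


# Hypothetical sample with two outlying values.
x = [-0.5, 0.1, 0.3, -0.2, 0.4, 0.0, -0.1, 0.2, 5.0, -4.0, 0.6, -0.7]
mu0 = sum(x) / len(x)
s2_0 = sum((xi - mu0) ** 2 for xi in x) / len(x)
mu_hat, sigma2_hat, nu_hat, ll_path = ecm_t(x, mu0, s2_0, nu=4.0)
print(mu_hat, sigma2_hat, nu_hat)
```

The observed loglikelihood recorded in `ll_path` should be non-decreasing up to numerical tolerance, reflecting the GEM property of ECM.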

7. ECME algorithm

The ECME (Expectation/Conditional Maximization Either) algorithm replaces some of the CM steps of ECM, which maximize the constrained expected complete-data loglikelihood function (the Q-function), with steps that maximize the correspondingly constrained actual likelihood function.

1. ECME shares the stable monotone convergence and simplicity of implementation of EM and ECM.
2. ECME can have a substantially faster convergence rate than EM or ECM:
   (a) In some of ECME's M-steps, the actual likelihood (rather than an approximation of it) is being conditionally maximized.
   (b) ECME allows faster computation with constrained maximization.

Example 8.9. Univariate t with unknown degrees of freedom. An ECME algorithm is obtained by retaining the E and CM1 steps of the previous example, but replacing the CM2 step by maximizing the observed loglikelihood with respect to $\nu$.
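A minimal sketch of this ECME variant for the t model: the E step uses the standard weights $E(w_i \mid x_i) = (\nu + 1)/(\nu + d_i^2)$ with $d_i^2 = (x_i - \mu)^2/\sigma^2$, CM1 updates $(\mu, \sigma^2)$, and the $\nu$ update searches the observed loglikelihood directly. The data, starting values, search bounds, golden-section search, and the rule of accepting a candidate $\nu$ only if it improves the loglikelihood are assumptions made for illustration.

```python
import math


def t_loglik(x, mu, sigma2, nu):
    # Observed-data loglikelihood of the t model.
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * nu * sigma2))
    return sum(const - (nu + 1) / 2 * math.log(1 + (xi - mu) ** 2 / (nu * sigma2))
               for xi in x)


def golden_max(f, lo, hi, tol=1e-7):
    # Golden-section search for the maximiser of a unimodal f on [lo, hi].
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def ecme_t(x, mu, sigma2, nu, n_iter=30, nu_bounds=(0.1, 100.0)):
    n = len(x)
    path = [t_loglik(x, mu, sigma2, nu)]
    for _ in range(n_iter):
        # E step: conditional expectations of the latent weights w_i.
        w = [(nu + 1) / (nu + (xi - mu) ** 2 / sigma2) for xi in x]
        # CM1: weighted mean and scale with nu held fixed.
        mu = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        sigma2 = sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, x)) / n
        # Replacement for CM2: maximise the observed loglikelihood over nu.
        cand = golden_max(lambda v: t_loglik(x, mu, sigma2, v), *nu_bounds)
        # Safeguard (an assumption): keep the old nu unless the candidate improves.
        if t_loglik(x, mu, sigma2, cand) > t_loglik(x, mu, sigma2, nu):
            nu = cand
        path.append(t_loglik(x, mu, sigma2, nu))
    return mu, sigma2, nu, path


# Hypothetical sample with two outlying values.
x = [-0.5, 0.1, 0.3, -0.2, 0.4, 0.0, -0.1, 0.2, 5.0, -4.0, 0.6, -0.7]
mu0 = sum(x) / len(x)
s2_0 = sum((xi - mu0) ** 2 for xi in x) / len(x)
mu_hat, sigma2_hat, nu_hat, ll_path = ecme_t(x, mu0, s2_0, nu=4.0)
print(mu_hat, sigma2_hat, nu_hat)
```

Because the $\nu$ update works on the observed loglikelihood, no expectation of $\log w_i$ is needed, which is one reason this step is simpler to code than the corresponding CM2 step of ECM.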
