Biostat 2065: Analysis of Incomplete Data. Gong Tang, Department of Biostatistics, University of Pittsburgh. October 20, 2005.
1. Large-sample inference based on ML

Let $\hat\theta$ be the MLE. Large-sample theory implies that $(\hat\theta - \theta) \sim N(0, C)$ asymptotically, where $C = J^{-1}(\theta)$ is the inverse of the expected information. Therefore a natural estimate of $C$ is $C_1 = J^{-1}(\hat\theta)$. An alternative estimate is $C_2 = I^{-1}(\hat\theta)$, based on the observed information. When model misspecification is plausible, a robust (sandwich) estimate of the variance is
$$C_3 = I^{-1}(\hat\theta)\, K(\hat\theta)\, I^{-1}(\hat\theta), \qquad K(\theta) = \sum_{i=1}^n \frac{\partial l_i(\theta)}{\partial \theta} \frac{\partial l_i(\theta)}{\partial \theta^T},$$
where $l_i(\theta)$ is the loglikelihood contribution of unit $i$. In the Newton-Raphson algorithm, $C_2$ is computed as part of an iteration step, and the components of $C_3$ are easily obtained from the scores and $C_2$. When the EM algorithm or one of its variants is used for ML estimation, additional steps are needed to compute standard errors of the estimates; for example, further calculation of $C_2$ or Louis's formula may be necessary.
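The following minimal numerical sketch (not part of the notes) illustrates $C_2$ and the sandwich estimate $C_3$ for a one-parameter exponential model; the model choice, the numerical differencing, and all function names are illustrative assumptions, not the course's code.

# A minimal numerical sketch of the variance estimates C2 = I^{-1}(theta_hat)
# and the sandwich estimate C3, assuming only that the per-unit loglikelihood
# l_i(theta) can be evaluated.  Illustrated with an exponential(rate) model,
# where the answers can be checked analytically.
import numpy as np

def loglik_i(theta, x):
    """Per-unit loglikelihood of an Exponential(rate = theta) observation."""
    return np.log(theta) - theta * x

def score_i(theta, x, eps=1e-6):
    """Numerical derivative of l_i with respect to theta."""
    return (loglik_i(theta + eps, x) - loglik_i(theta - eps, x)) / (2 * eps)

def observed_information(theta, xs, eps=1e-4):
    """I(theta) = -d^2 l/d theta^2, by central differences on the total loglikelihood."""
    l = lambda t: sum(loglik_i(t, x) for x in xs)
    return -(l(theta + eps) - 2 * l(theta) + l(theta - eps)) / eps**2

rng = np.random.default_rng(0)
xs = rng.exponential(scale=1 / 2.0, size=500)        # true rate 2.0
theta_hat = 1 / xs.mean()                            # MLE of the rate

I_hat = observed_information(theta_hat, xs)
C2 = 1 / I_hat                                       # inverse observed information
K_hat = sum(score_i(theta_hat, x) ** 2 for x in xs)  # sum of squared unit scores
C3 = C2 * K_hat * C2                                 # sandwich estimate

print(C2, C3)   # both close to theta_hat^2 / n when the model is correctly specified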
2. Supplemented EM algorithm

Supplemented EM (SEM) (Meng & Rubin, 1991) is a way to calculate the large-sample covariance matrix of $\hat\theta$ using only
1. code for the E and M steps of EM,
2. code for the large-sample complete-data variance-covariance matrix $V_{com}$, and
3. standard matrix operations.
Recall that $DM = I - i_{obs}\, i_{com}^{-1}$, where $DM$ is the derivative of the EM mapping, $i_{com}$ is the complete information, and $i_{obs} = I(\theta \mid Y_{obs})$ is the observed information. Therefore
$$I^{-1}(\theta \mid Y_{obs}) = i_{obs}^{-1} = i_{com}^{-1}(I - DM)^{-1} = V_{com}(I - DM)^{-1}.$$
Denote $V_{obs} = I^{-1}(\theta \mid Y_{obs})$; then
$$V_{obs} = V_{com}(I - DM + DM)(I - DM)^{-1} = V_{com}\{I + DM(I - DM)^{-1}\} := V_{com} + \Delta V,$$
where $\Delta V = V_{com}\, DM\, (I - DM)^{-1}$ is the increase in variance due to missing data. Though the map $M$ has no explicit form, its derivative $DM$ can be effectively approximated using some extra EM steps.
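As a check on the algebra, here is a small numerical sketch (not part of the notes) of assembling $V_{obs}$ from $V_{com}$ and $DM$; the two input matrices below are arbitrary placeholders rather than output from a real model.

# Combining V_com and DM into V_obs = V_com (I - DM)^{-1},
# and splitting off the increase in variance Delta_V due to missing data.
import numpy as np

V_com = np.array([[0.50, 0.10],
                  [0.10, 0.30]])          # complete-data variance-covariance (placeholder)
DM = np.array([[0.40, 0.05],
               [0.02, 0.25]])             # derivative of the EM map (placeholder)

I = np.eye(2)
V_obs = V_com @ np.linalg.inv(I - DM)               # observed-data variance-covariance
Delta_V = V_com @ DM @ np.linalg.inv(I - DM)        # increase due to missing data

assert np.allclose(V_obs, V_com + Delta_V)          # the decomposition V_obs = V_com + Delta_V
print(V_obs)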
3. Implementation of SEM

First obtain the MLE $\hat\theta$, then run a sequence of SEM iterations as follows. The initial value is chosen close to $\hat\theta$, and at step $t$ the current estimate is $\theta^{(t)}$.
1. Run the usual E and M steps to obtain $\theta^{(t+1)}$.
2. Fix $i = 1$ and calculate $\theta^{(t)}(i) = (\hat\theta_1, \ldots, \hat\theta_{i-1}, \theta_i^{(t)}, \hat\theta_{i+1}, \ldots, \hat\theta_d)$, which equals $\hat\theta$ except in the $i$th component.
3. Treating $\theta^{(t)}(i)$ as the current estimate of $\theta$, run one iteration of EM to obtain $\theta^{(t+1)}(i)$.
4. Obtain the ratios
$$r_{ij}^{(t)} = \frac{\theta_j^{(t+1)}(i) - \hat\theta_j}{\theta_i^{(t)} - \hat\theta_i}, \qquad j = 1, \ldots, d.$$
5. Repeat steps 2 to 4 for $i = 2, \ldots, d$.
The output is $\theta^{(t+1)}$ and $\{r_{ij}^{(t)} : i, j = 1, \ldots, d\}$. As $t \to \infty$, the elements $r_{ij}^{(t)}$ converge, and the limits approximate the corresponding elements of $DM$.
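A minimal sketch (not part of the notes) of one SEM iteration in code; em_step, theta_hat, and theta_t are assumed to be supplied by the user and are purely illustrative names.

# One SEM iteration: approximate row i of DM by perturbing only component i of
# theta away from the MLE and running a single EM step from the perturbed point.
import numpy as np

def sem_ratios(em_step, theta_hat, theta_t):
    """Return theta^{(t+1)} and the d x d matrix of ratios r_ij^{(t)}."""
    d = len(theta_hat)
    theta_next = em_step(theta_t)              # usual EM step (step 1)
    r = np.empty((d, d))
    for i in range(d):
        theta_i = theta_hat.copy()
        theta_i[i] = theta_t[i]                # theta^{(t)}(i): differs from the MLE in component i only
        theta_i_next = em_step(theta_i)        # one EM step from theta^{(t)}(i)
        r[i, :] = (theta_i_next - theta_hat) / (theta_t[i] - theta_hat[i])
    return theta_next, r

# Iterating sem_ratios and monitoring r for convergence gives the approximation to DM;
# V_obs then follows from V_obs = V_com (I - DM)^{-1} as on the previous slide.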
4. Other methods

Bootstrapping the observed data.

Bayesian methods: using the posterior variance under a flat prior.
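A minimal sketch (not part of the notes) of the bootstrap approach: resample units of the observed data, refit by the user's ML routine (fit_mle below is a placeholder name), and take the empirical covariance of the bootstrap estimates.

# Bootstrapping the observed data to estimate Var(theta_hat).
# `fit_mle` stands in for whatever routine (e.g. EM) computes the MLE
# from an observed-data matrix that may contain missing values.
import numpy as np

def bootstrap_variance(y_obs, fit_mle, B=200, seed=0):
    rng = np.random.default_rng(seed)
    n = y_obs.shape[0]
    estimates = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)       # resample units with replacement
        estimates.append(fit_mle(y_obs[idx]))  # refit on the bootstrap sample
    estimates = np.asarray(estimates)
    return np.cov(estimates, rowvar=False)     # bootstrap variance-covariance matrix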
5. The ECM algorithm

There are situations where the M step has no explicit solution even when the complete data are from an exponential family, so iterative procedures are usually required within each M step. Sometimes the M step can be modified into several conditional maximization (CM) steps in order to avoid an iterative M step; this modification is called the ECM algorithm. Suppose the parameter is partitioned as $\theta = \{\theta_1, \theta_2, \ldots, \theta_S\}$ and the current estimate is $\theta^{(t)} = \{\theta_1^{(t)}, \ldots, \theta_S^{(t)}\}$. Then the M step consists of $S$ CM steps:
1. At the $(t + 1/S)$th CM step, maximize $Q(\theta_1, \theta_2^{(t)}, \ldots, \theta_S^{(t)}; \theta^{(t)})$ with respect to $\theta_1$. Denote the maximizer by $\theta_1^{(t+1/S)}$.
2. Similarly, at the $(t + 2/S)$th CM step, maximize $Q(\theta_1^{(t+1/S)}, \theta_2, \theta_3^{(t)}, \ldots, \theta_S^{(t)}; \theta^{(t)})$ with respect to $\theta_2$. Denote the maximizer by $\theta_2^{(t+2/S)}$.
3. Continue sequentially, maximizing the Q-function with respect to $\theta_s$ with all other parameters fixed at their most recent values, $s = 1, 2, \ldots, S$.
After all $S$ CM steps, $\theta^{(t+1)} = \{\theta_1^{(t+1/S)}, \ldots, \theta_S^{(t+S/S)}\}$ is the updated estimate of $\theta$ for the subsequent E step. Since each CM step increases $Q$, ECM is a GEM algorithm and monotonically increases the likelihood of $\theta$. Under the same conditions that guarantee the convergence of EM, ECM converges to a stationary point of the likelihood, i.e., a solution to the score equation of $\theta$.
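A generic skeleton (not part of the notes) of the E/CM cycle; e_step and the entries of cm_steps are user-supplied placeholder functions. The expected statistics are computed once per cycle and held fixed through all $S$ CM steps, as ECM requires.

# Generic ECM skeleton.  `e_step(theta, data)` returns the expected
# complete-data statistics needed by the Q-function; `cm_steps[s](stats, theta)`
# returns the updated block s of theta, given the current (partially updated) theta.
def ecm(theta, data, e_step, cm_steps, n_iter=100):
    """theta is a list of parameter blocks [theta_1, ..., theta_S]."""
    for _ in range(n_iter):
        stats = e_step(theta, data)            # E step: expectations under the current theta
        for s, cm in enumerate(cm_steps):      # CM steps all share the same Q (same stats)
            # block s is updated; blocks before s already hold their new values,
            # blocks after s still hold the values from the previous cycle
            theta = theta[:s] + [cm(stats, theta)] + theta[s + 1:]
    return theta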
Example 8.6. A multivariate normal regression model with incomplete data. Model: $y_i \sim N_K(X_i\beta, \Sigma)$, $i = 1, 2, \ldots, n$.
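A minimal sketch (not part of the notes) of the two complete-data CM steps that make this example tractable: generalized least squares for $\beta$ with $\Sigma$ fixed, and the average residual cross-product for $\Sigma$ with $\beta$ fixed. The E step that replaces the missing components of each $y_i$ by conditional expectations is omitted, so this is only the maximization skeleton.

# CM steps for y_i ~ N_K(X_i beta, Sigma) with complete data.
# In the ECM algorithm these two updates alternate after an E step that
# fills in the missing parts of each y_i (omitted in this sketch).
import numpy as np

def cm_beta(X, Y, Sigma):
    """GLS update of beta with Sigma fixed; X[i] is (K, p), Y[i] is (K,)."""
    Sinv = np.linalg.inv(Sigma)
    A = sum(Xi.T @ Sinv @ Xi for Xi in X)
    b = sum(Xi.T @ Sinv @ yi for Xi, yi in zip(X, Y))
    return np.linalg.solve(A, b)

def cm_sigma(X, Y, beta):
    """Update of Sigma with beta fixed: average outer product of residuals."""
    resid = np.array([yi - Xi @ beta for Xi, yi in zip(X, Y)])
    return resid.T @ resid / len(Y)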
6. Univariate t with unknown degrees of freedom

Suppose that the observed data consist of a random sample $X = (x_1, x_2, \ldots, x_n)$ from a Student's t distribution with center $\mu$, scale parameter $\sigma$, and unknown degrees of freedom $\nu$, with density
$$f(x_i; \theta) = \frac{\Gamma(\nu/2 + 1/2)}{(\pi\nu\sigma^2)^{1/2}\,\Gamma(\nu/2)\,\{1 + (x_i - \mu)^2/(\nu\sigma^2)\}^{(\nu+1)/2}}.$$
An augmented complete dataset can be defined as $Y = (Y_{obs}, Y_{mis})$, where $Y_{obs} = X$ and $Y_{mis} = W = (w_1, w_2, \ldots, w_n)$ is a vector of unobserved positive quantities, such that the pairs $(w_i, x_i)$ are independent across units $i$, with distributions
$$(x_i \mid w_i; \theta) \sim N(\mu, \sigma^2/w_i), \qquad (w_i; \theta) \sim \chi^2_\nu/\nu.$$
The M step is complicated by the estimation of $\nu$. It can be replaced by two CM steps:
1. CM1: For current parameters $\theta^{(t)} = (\mu^{(t)}, \sigma^{(t)}, \nu^{(t)})$, maximize $Q$ with respect to $(\mu, \sigma)$ with $\nu = \nu^{(t)}$ fixed, yielding $(\mu, \sigma) = (\mu^{(t+1)}, \sigma^{(t+1)})$.
2. CM2: Maximize $Q$ with respect to $\nu$ with $(\mu, \sigma) = (\mu^{(t+1)}, \sigma^{(t+1)})$ fixed.
The complete-data loglikelihood is
$$l(\mu, \sigma^2, \nu; Y) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2}\sum_{i=1}^n \frac{w_i(x_i - \mu)^2}{\sigma^2} + \frac{n\nu}{2}\log(\nu/2) - n\log\Gamma(\nu/2) + \left(\frac{\nu}{2} - 1\right)\sum_{i=1}^n \log w_i - \frac{\nu}{2}\sum_{i=1}^n w_i.$$
The sufficient statistics are $\sum_i w_i$, $\sum_i w_i x_i$, $\sum_i w_i x_i^2$, and $\sum_i \log w_i$. Since $\nu$ is a scalar, the maximizer $\nu^{(t+1)}$ can be found by an iterative one-dimensional search.
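A minimal numerical sketch (not part of the notes) of the resulting ECM iteration. It uses the standard conditional expectations for this model, $E[w_i \mid x_i, \theta] = (\nu+1)/(\nu + d_i^2)$ and $E[\log w_i \mid x_i, \theta] = \psi\{(\nu+1)/2\} - \log\{(\nu + d_i^2)/2\}$ with $d_i^2 = (x_i - \mu)^2/\sigma^2$; CM1 reduces to a weighted mean and variance, and CM2 to the one-dimensional search in $\nu$. Starting values and iteration counts are illustrative.

# ECM for the univariate t model with unknown degrees of freedom.
# E step: expected weights E[w_i | x_i] and E[log w_i | x_i] under the current theta.
# CM1: weighted mean/variance update of (mu, sigma^2) with nu fixed.
# CM2: one-dimensional search over nu maximizing the nu-part of Q.
import numpy as np
from scipy.special import gammaln, digamma
from scipy.optimize import minimize_scalar

def e_step(x, mu, sigma2, nu):
    d2 = (x - mu) ** 2 / sigma2
    w = (nu + 1) / (nu + d2)                              # E[w_i | x_i]
    logw = digamma((nu + 1) / 2) - np.log((nu + d2) / 2)  # E[log w_i | x_i]
    return w, logw

def cm1(x, w):
    mu = np.sum(w * x) / np.sum(w)                 # weighted mean
    sigma2 = np.mean(w * (x - mu) ** 2)            # weighted variance
    return mu, sigma2

def cm2(w, logw):
    n = len(w)
    def neg_q(nu):                                 # nu-dependent part of Q, negated
        return -(n * nu / 2 * np.log(nu / 2) - n * gammaln(nu / 2)
                 + (nu / 2 - 1) * np.sum(logw) - nu / 2 * np.sum(w))
    return minimize_scalar(neg_q, bounds=(0.1, 200), method="bounded").x

def ecm_t(x, n_iter=200):
    mu, sigma2, nu = np.mean(x), np.var(x), 10.0   # crude starting values
    for _ in range(n_iter):
        w, logw = e_step(x, mu, sigma2, nu)        # E step (once per cycle)
        mu, sigma2 = cm1(x, w)                     # CM1: nu fixed at nu^{(t)}
        nu = cm2(w, logw)                          # CM2: one-dimensional search in nu
    return mu, sigma2, nu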
7. ECME algorithm

The ECME (Expectation/Conditional Maximization Either) algorithm replaces some of the CM steps of ECM, which maximize the constrained expected complete-data loglikelihood (the Q-function), with steps that maximize the correspondingly constrained actual likelihood function.
1. ECME shares the stable monotone convergence and simplicity of implementation of EM and ECM.
2. ECME can have a substantially faster convergence rate than EM or ECM, because:
(a) in some of ECME's M steps, the actual likelihood (rather than an approximation of it) is being conditionally maximized;
(b) ECME allows faster computation with constrained maximization.
Example 8.9. Univariate t with unknown degrees of freedom. An ECME algorithm is obtained by retaining the E and CM1 steps of the previous example, but replacing the CM2 step by maximizing the observed-data loglikelihood with respect to $\nu$.
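A minimal sketch (not part of the notes) of the modified CM2 step for Example 8.9: the expected complete-data criterion is swapped for the actual observed-data t loglikelihood in $\nu$, maximized by the same kind of one-dimensional search, with scipy.stats.t supplying the density.

# ECME variant of the CM2 step for the univariate t example: maximize the
# observed-data loglikelihood over nu, with (mu, sigma) held at their latest values.
import numpy as np
from scipy.stats import t
from scipy.optimize import minimize_scalar

def cm2_ecme(x, mu, sigma2, nu_bounds=(0.1, 200)):
    sigma = np.sqrt(sigma2)
    neg_loglik = lambda nu: -np.sum(t.logpdf(x, df=nu, loc=mu, scale=sigma))
    return minimize_scalar(neg_loglik, bounds=nu_bounds, method="bounded").x

# Replacing cm2 with cm2_ecme in the ECM sketch above gives the ECME algorithm;
# the E and CM1 steps are unchanged.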