
Parametric fractional imputation for missing data analysis

BY JAE KWANG KIM

Department of Statistics, Iowa State University, Ames, Iowa 50011, U.S.A.

jkim@iastate.edu

SUMMARY

Imputation is a popular method of handling missing data in practice. Imputation, when carefully done, can facilitate parameter estimation by allowing complete-sample estimators to be applied to the imputed dataset. The basic idea is to generate the imputed values from the conditional distribution of the missing data given the observed data. In this article, parametric fractional imputation is proposed for generating imputed values. Using fractional weights, the observed likelihood can be approximated by the weighted mean of the imputed-data likelihood. Some computational efficiency can be achieved using the ideas of importance sampling and calibration weighting. The proposed imputation method provides efficient parameter estimates for the model parameters specified in the imputation model and also provides reasonable estimates for parameters that are not part of the imputation model. Variance estimation is covered and results from a limited simulation study are presented.

Some key words: EM algorithm; General purpose estimation; Importance sampling; Monte Carlo EM; Multiple imputation.

1. INTRODUCTION

Suppose that $y_1, y_2, \ldots, y_n$ are the $p$-dimensional observations for a probability sample selected from a finite population, where the finite population is a realization from a distribution $F_0(y)$. Suppose that, under complete response, a parameter of interest, denoted by $\eta_g = \int g(y)\, dF_0(y)$, is estimated by

$$\hat{\eta}_g = \sum_{i=1}^{n} w_i\, g(y_i) \qquad (1)$$

for some function $g(y_i)$, where $w_i$ is the sampling weight. Under simple random sampling, the sampling weight is $1/n$ and the sample can be regarded as a random sample from an infinite population with distribution $F_0(y)$.

In (1), different choices of the $g$-function lead to different parameters. For example, $g(y_i) = y_i$ can be used to estimate the population mean, while $g(y_i) = I(y_i < c)$ corresponds to the population proportion of $y$ less than $c$, where $I(y < c)$ takes the value one if $y < c$ and the value zero otherwise. The existence of an unbiased estimator (1) for any $g$-function is very attractive for general purpose estimation, where different users want to estimate different parameters from the same sample, as with public access data. That is, in general purpose estimation, the parameter $\eta_g$ is unknown at the time of data collection.
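As a minimal illustration of (1), the same weighted sum serves different parameters by swapping the $g$-function; the data, names and values in the following sketch are hypothetical, not from the paper.

```python
import numpy as np

# Minimal sketch of the general-purpose estimator (1): one weighted sum
# handles different parameters by changing the g-function.

rng = np.random.default_rng(0)
n = 200
y = rng.normal(2.0, 1.0, size=n)   # hypothetical complete sample
w = np.full(n, 1.0 / n)            # simple random sampling: w_i = 1/n

def eta_hat(g, y, w):
    """Weighted estimator (1): sum_i w_i g(y_i)."""
    return np.sum(w * g(y))

mean_hat = eta_hat(lambda y: y, y, w)                        # population mean
prop_hat = eta_hat(lambda y: (y < 3.0).astype(float), y, w)  # pr(y < 3)
print(mean_hat, prop_hat)
```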

Given missing data, we cannot directly compute the estimator (1). Instead, one can use

$$\hat{\eta}_{gR} = \sum_{i=1}^{n} w_i\, E\{g(y_i) \mid y_{i,\mathrm{obs}}\}, \qquad (2)$$

where $(y_{i,\mathrm{obs}}, y_{i,\mathrm{mis}})$ denote the observed part and the missing part of $y_i$, respectively. To simplify the presentation, we assume that the sampling mechanism and the response mechanism are ignorable in the sense of Rubin (1976).

To compute the conditional expectation in (2), we need a correct specification of the conditional distribution of $y_{i,\mathrm{mis}}$ given $y_{i,\mathrm{obs}}$. We adopt a parametric approach in the sense that we assume a fully parametric model for $y_i$, so that the conditional distribution of $y_{i,\mathrm{mis}}$ given $y_{i,\mathrm{obs}}$ can be derived from the joint distribution for an arbitrary missing pattern. Let $F_0 \in \{F_\theta;\ \theta \in \Omega\}$ be the parametric model indexed by $\theta$. Then the conditional expectation in (2) depends on $\theta_0$, where $\theta_0$ is the true parameter value corresponding to $F_0$. That is, $E\{g(y_i) \mid y_{i,\mathrm{obs}}\} = E\{g(y_i) \mid y_{i,\mathrm{obs}}, \theta_0\}$.

To compute the conditional expectation in (2), a Monte Carlo approximation based on the imputed data can be used. Thus, one can interpret imputation as a Monte Carlo approximation of the conditional expectation given the observed data. Imputation is very attractive in practice because, once the imputed data are created, the data analyst does not need to know the conditional distribution in (2). Monte Carlo methods of approximating the conditional expectation in (2) can be placed in two classes:

1. Bayesian approach: Generate the imputed values from the posterior predictive distribution of $y_{i,\mathrm{mis}}$ given $y_{i,\mathrm{obs}}$:

$$f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}) = \int f(y_{i,\mathrm{mis}} \mid \theta, y_{i,\mathrm{obs}})\, f(\theta \mid y_{i,\mathrm{obs}})\, d\theta. \qquad (3)$$

This is essentially the approach used in multiple imputation as proposed by Rubin (1987).

2. Frequentist approach: Generate the imputed values from the conditional distribution $f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat{\theta})$ with an estimated value $\hat{\theta}$.

In the Bayesian approach to imputation, convergence to a stable posterior predictive distribution (3) is difficult to check (Gelman et al., 1996). Also, the variance estimator used in multiple imputation is not consistent for some estimated parameters. For examples, see Wang & Robins (1998) and Kim et al. (2006).

The frequentist approach to imputation has received less attention than Bayesian imputation. One notable exception is Wang & Robins (1998), who studied the asymptotic properties of multiple imputation and of a parametric frequentist imputation procedure. Wang & Robins (1998) considered the estimated parameter $\hat{\theta}$ to be given, and did not discuss parameter estimation.

We consider frequentist imputation given a parametric model for the original distribution. Using the idea of importance sampling, we propose a frequentist imputation method that can be implemented as the fractional imputation discussed by Kim & Fuller (2004). The proposed fractional imputation method can be further modified to remove the Monte Carlo error without increasing the size of the Monte Carlo samples. The proposed method uses a calibration technique to obtain the exact maximum likelihood estimator with a modest imputation size.

In § 2, the parametric fractional imputation method is described. Asymptotic properties of the fractionally imputed estimator are discussed in § 3. Maximum likelihood estimation using fractional imputation is discussed in § 4. Results from a limited simulation study are presented in § 5. Concluding remarks are made in § 6.
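As a sketch of the frequentist approach, the conditional expectation in (2) can be approximated by averaging $g$ over $M$ draws from $f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat{\theta})$. The toy regression model, plug-in values and response mechanism below are assumptions for illustration only.

```python
import numpy as np

# Frequentist Monte Carlo approximation to (2) for a toy setting: y is
# missing for some units; E{g(y) | y_obs, theta_hat} is approximated by
# averaging g over draws from f(y_mis | y_obs, theta_hat).

rng = np.random.default_rng(1)
n, M = 200, 100
x = rng.normal(2.0, 1.0, n)
y = 1.0 + 0.7 * x + rng.normal(0.0, 1.0, n)
delta = rng.random(n) < 0.76           # toy response indicator

# Suppose theta_hat = (b0, b1, sigma) was estimated from the respondents.
b0, b1, sigma = 1.0, 0.7, 1.0          # plug-in values for illustration
g = lambda y: (y < 3.0).astype(float)

eta_terms = np.where(
    delta,
    g(y),                                                      # observed: g(y_i)
    [g(b0 + b1 * xi + sigma * rng.standard_normal(M)).mean()   # missing: MC average
     for xi in x],
)
eta_hat_gR = eta_terms.mean()          # w_i = 1/n
print(eta_hat_gR)
```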

2. FRACTIONAL IMPUTATION

As discussed in § 1, we consider an approximation for the conditional expectation in (2) using fractional imputation. In fractional imputation, $M (> 1)$ imputed values of $y_{i,\mathrm{mis}}$, say $y^{(1)}_{i,\mathrm{mis}}, \ldots, y^{(M)}_{i,\mathrm{mis}}$, are generated and assigned fractional weights, $w^*_{i1}, \ldots, w^*_{iM}$, so that

$$\sum_{j=1}^{M} w^*_{ij}\, g(y^*_{ij}) = E\{g(y_i) \mid y_{i,\mathrm{obs}}, \hat{\theta}\}, \qquad (4)$$

where $y^*_{ij} = (y_{i,\mathrm{obs}}, y^{(j)}_{i,\mathrm{mis}})$, holds at least asymptotically for each $i$, where $\hat{\theta}$ is a consistent estimator of $\theta_0$. A popular choice for $\hat{\theta}$ is the (pseudo) maximum likelihood estimator, which can be obtained by finding the $\theta$ that maximizes the (pseudo) log-likelihood function. That is,

$$\hat{\theta} = \arg\max_{\theta \in \Omega} \sum_{i=1}^{n} w_i \ln\{f_{\mathrm{obs}(i)}(y_{i,\mathrm{obs}}; \theta)\}, \qquad (5)$$

where $f_{\mathrm{obs}(i)}(y_{i,\mathrm{obs}}; \theta) = \int f(y_i; \theta)\, dy_{i,\mathrm{mis}}$ is the marginal density of $y_{i,\mathrm{obs}}$. The subscript $i$ is used in the marginal density because the missing pattern can be different for each $i$. A computationally simple method of finding the maximum likelihood estimator is discussed in § 4.

Condition (4) applied to $g(y_i) \equiv c$ implies that

$$\sum_{j=1}^{M} w^*_{ij} = 1 \qquad (6)$$

for all $i$. From the fractionally imputed data satisfying (4) and (6), the parameter $\eta_g$ can be estimated by

$$\hat{\eta}_{FI,g} = \sum_{i=1}^{n} w_i \sum_{j=1}^{M} w^*_{ij}\, g(y^*_{ij}). \qquad (7)$$

Note that the imputed estimator (7) is obtained by applying the formula (1) with the $y^*_{ij}$ as the observations and $w_i w^*_{ij}$ as the weights. The size of the imputed dataset can be as large as $nM$.

For a single parameter $\eta_g = E\{g(y)\}$, any fractional imputation satisfying (4) provides a consistent estimator of $\eta_g$. For general purpose estimation, the $g$-function in $\eta_g$ is unknown at the time of imputation (Fay, 1992). In this case, we cannot expect (4) to hold for every $g$-function based on a finite number of imputations, with the exception of categorical data. For categorical data, there is a finite number of possible values for $y_{i,\mathrm{mis}}$. In this case, we take the possible values as the imputed values and compute the conditional probability of $y^{(j)}_{i,\mathrm{mis}}$ by the following Bayes formula:

$$p\big(y^{(j)}_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat{\theta}\big) = \frac{f(y^*_{ij}; \hat{\theta})}{\sum_{k=1}^{M_i} f(y^*_{ik}; \hat{\theta})},$$

where $f(y_i; \theta)$ is the joint density of $y_i$ evaluated at $\theta$ and $M_i$ is the number of possible values of $y_{i,\mathrm{mis}}$. Note that the choice of $w^*_{ij} = p(y^{(j)}_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat{\theta})$ satisfies (4) and (6). Fractional imputation for categorical data using $w^*_{ij} = p(y^{(j)}_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat{\theta})$, which is close in spirit to the EM by weighting method of Ibrahim (1990), is discussed in Kim & Rao (2009).
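As an illustration of the Bayes formula above, a minimal sketch for a binary item with a made-up joint probability table; none of the numbers come from the paper.

```python
import numpy as np

# Fractional weights for categorical data: each possible value of the
# missing item is an imputed value, weighted by its conditional probability
# given the observed part. The joint pmf is a fabricated 2x2 example.

# joint[a, b] = pr(y1 = a, y2 = b; theta_hat) for binary (y1, y2)
joint = np.array([[0.30, 0.20],
                  [0.10, 0.40]])

# Unit with y1 observed (= 1) and y2 missing: imputed values are y2 in {0, 1}
y1_obs = 1
f = joint[y1_obs, :]          # f(y_ij*; theta_hat) for j = 1, ..., M_i
w_star = f / f.sum()          # w_ij* = p(y_mis^(j) | y_obs, theta_hat)
print(w_star)                 # satisfies (6): the weights sum to one
```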

For a continuous random variable $y_i$, condition (4) can be approximately satisfied using importance sampling, where $y^{(1)}_{i,\mathrm{mis}}, \ldots, y^{(M)}_{i,\mathrm{mis}}$ are independently generated from a distribution with density $h(y_{i,\mathrm{mis}})$ that has the same support as $f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \theta)$ for all $\theta \in \Omega$, and the corresponding fractional weights are computed by

$$w^*_{ij0} = w^*_{ij0}(\hat{\theta}) = C_i\, \frac{f(y^{(j)}_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}; \hat{\theta})}{h(y^{(j)}_{i,\mathrm{mis}})}, \qquad (8)$$

where $C_i$ is chosen to satisfy (6). If $h(y_{i,\mathrm{mis}}) = f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat{\theta})$ is used, then $w^*_{ij0} = M^{-1}$.

REMARK 1. Under mild conditions, $\bar{g}^*_i = \sum_{j=1}^{M} w^*_{ij0}\, g(y^*_{ij})$ with $w^*_{ij0}$ in (8) converges to $\bar{g}_i(\hat{\theta}) \equiv E\{g(y_i) \mid y_{i,\mathrm{obs}}, \hat{\theta}\}$ with probability 1 as $M \to \infty$, and has approximate variance $\sigma^2_i/M$, where

$$\sigma^2_i = E\left[\big\{g(y_i) - \bar{g}_i(\hat{\theta})\big\}^2\, \frac{f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat{\theta})}{h(y_{i,\mathrm{mis}})} \,\Big|\, y_{i,\mathrm{obs}}, \hat{\theta}\right].$$

The optimal choice of $h(y_{i,\mathrm{mis}})$ that minimizes $\sigma^2_i$ is

$$h^*(y_{i,\mathrm{mis}}) = f\big(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat{\theta}\big)\, \frac{\big|g(y_i) - \bar{g}_i(\hat{\theta})\big|}{E\big\{\big|g(y_i) - \bar{g}_i(\hat{\theta})\big| \mid y_{i,\mathrm{obs}}, \hat{\theta}\big\}}. \qquad (9)$$

When the $g$-function is unknown, $h(y_{i,\mathrm{mis}}) = f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat{\theta})$ is a reasonable choice in terms of statistical efficiency. However, other choices of $h(y_{i,\mathrm{mis}})$ can have better computational efficiency in some situations.

For public access data, creating the fractional imputation with a large imputation size is not feasible. We propose an approximation with a moderate imputation size, say $M = 10$. To formally describe the procedure, let $y^{(1)}_{i,\mathrm{mis}}, \ldots, y^{(M)}_{i,\mathrm{mis}}$ be independently generated from a distribution with density $h(y_{i,\mathrm{mis}})$. Given the imputed values, it remains to compute fractional weights that satisfy (4) and (6) as closely as possible. The proposed fractional weights are computed in two steps. In the first step, the initial fractional weights are computed by (8). In the second step, the initial fractional weights are adjusted to satisfy (6) and

$$\sum_{i=1}^{n} w_i \sum_{j=1}^{M} w^*_{ij}\, s(\hat{\theta}; y^*_{ij}) = 0, \qquad (10)$$

where $s(\theta; y) = \partial \ln f(y; \theta)/\partial\theta$ is the score function of $\theta$. Adjusting initial weights to satisfy a constraint is often called calibration. As can be seen in § 3, constraint (10) is a calibration constraint that makes the resulting imputed estimator $\hat{\eta}_{FI,g}$ in (7) fully efficient for $\eta_g$ when $\eta_g$ is a linear function of $\theta$. To construct fractional weights satisfying (6) and (10), the regression weighting technique or the empirical likelihood technique can be used. For example, in the regression weighting technique, the fractional weights are constructed by

$$w^*_{ij} = w^*_{ij0} - w^*_{ij0} \left( \sum_{i=1}^{n} w_i\, \bar{s}^*_i \right)^{T} \left\{ \sum_{i=1}^{n} w_i \sum_{j=1}^{M} w^*_{ij0} \big( s^*_{ij} - \bar{s}^*_i \big)^{\otimes 2} \right\}^{-1} \big( s^*_{ij} - \bar{s}^*_i \big), \qquad (11)$$

where $w^*_{ij0}$ is the initial fractional weight (8) from importance sampling, $\bar{s}^*_i = \sum_{j=1}^{M} w^*_{ij0}\, s^*_{ij}$, $B^{\otimes 2} = BB^{T}$, and $s^*_{ij} = s(\hat{\theta}; y^*_{ij})$. Here, $M$ need not be large.
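A sketch of the two-step construction: initial importance-sampling weights (8) followed by the regression-weighting adjustment (11), written for a scalar score to keep the linear algebra transparent. All densities and values are illustrative assumptions.

```python
import numpy as np

# Two-step fractional weights: (8) then the calibration adjustment (11).
# Working model y ~ N(theta, 1), proposal h = N(0, 1.5^2); both are made up.

rng = np.random.default_rng(2)
n, M = 5, 10
w = np.full(n, 1.0 / n)                       # sampling weights
y_imp = rng.normal(0.0, 1.5, (n, M))          # draws from the proposal h

log_f = lambda y, theta: -0.5 * (y - theta) ** 2   # log f up to constants
log_h = lambda y: -0.5 * (y / 1.5) ** 2            # log h up to constants

theta_hat = 0.3
# Step 1: initial weights (8), normalized within each unit to satisfy (6)
w0 = np.exp(log_f(y_imp, theta_hat) - log_h(y_imp))
w0 /= w0.sum(axis=1, keepdims=True)

# Step 2: regression weighting (11) with score s(theta; y) = y - theta
s = y_imp - theta_hat
s_bar = (w0 * s).sum(axis=1, keepdims=True)        # s_bar_i*
V = (w[:, None] * w0 * (s - s_bar) ** 2).sum()     # scalar score variance
lam = (w * s_bar.ravel()).sum() / V
w_star = w0 * (1.0 - lam * (s - s_bar))            # calibrated weights

print(w_star.sum(axis=1))                 # check (6): each row sums to one
print((w[:, None] * w_star * s).sum())    # check (10): weighted score ~ 0
```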

If the distribution belongs to the exponential family of the form

$$f(y; \theta) = \exp\{t(y)^{T}\theta + \phi(\theta) + A(y)\}, \qquad (12)$$

then (10) reduces to $\sum_{i=1}^{n} \sum_{j=1}^{M} w_i w^*_{ij} \{t(y^*_{ij}) + \dot{\phi}(\hat{\theta})\} = 0$, where $\dot{\phi}(\theta) = \partial \phi(\theta)/\partial\theta$. In this case, the calibration needs to be applied only to the complete sufficient statistics.

3. ASYMPTOTIC RESULTS

In this section, we discuss some asymptotic properties of the fractionally imputed estimator (7). We consider two types of fractionally imputed estimators: one obtained by using the initial fractional weights in (8), and the other obtained by using calibration fractional weights such as (11). Note that the imputed estimator $\hat{\eta}_{FI,g}$ in (7) is a function of $n$ and $M$, where $n$ is the sample size and $M$ is the imputation size for fractional imputation. Thus, we use $\hat{\eta}_{g0,n,M}$ and $\hat{\eta}_{g1,n,M}$ to denote the fractionally imputed estimator (7) using the initial fractional weights in (8) and the fractionally imputed estimator (7) using the calibration fractional weights in (11), respectively. The following theorem presents some asymptotic properties of the fractionally imputed estimators. The proof is presented in Appendix A.

THEOREM 1. Let $\hat{\eta}_{g0,n,M}$ and $\hat{\eta}_{g1,n,M}$ be fractionally imputed estimators of $\eta_g$ of the form (7) using $M$ fractions, where the fractional weights are constructed by (8) and by (11), respectively, and the imputed values are generated from $h(y_{i,\mathrm{mis}})$. Under some regularity conditions,

$$(\hat{\eta}_{g0,n,M} - \eta_g)/\sigma_{g0,n,M} \to_{L} N(0, 1) \qquad (13)$$

and

$$(\hat{\eta}_{g1,n,M} - \eta_g)/\sigma_{g1,n,M} \to_{L} N(0, 1) \qquad (14)$$

for each $M > 1$, where

$$\sigma^2_{g0,n,M} = \mathrm{var}\left[ \sum_{i=1}^{n} w_i\, \big\{\bar{g}_i(\theta_0) + K_1^{T}\, \bar{s}^*_i(\theta_0)\big\} \right],$$

$$\sigma^2_{g1,n,M} = \mathrm{var}\left[ \sum_{i=1}^{n} w_i\, \big\{\bar{g}_i(\theta_0) + K_1^{T}\, \bar{s}^*_i(\theta_0) + B^{T}\big(\bar{s}_i(\theta_0) - \bar{s}^*_i(\theta_0)\big)\big\} \right],$$

$\bar{g}_i(\theta) = \sum_{j=1}^{M} w^*_{ij0}(\theta)\, g(y^*_{ij})$, $\bar{s}_i(\theta) = E_\theta\{s(\theta; y_i) \mid y_{i,\mathrm{obs}}\}$, $\bar{s}^*_i(\theta) = \sum_{j=1}^{M} w^*_{ij0}(\theta)\, s(\theta; y^*_{ij})$,

$$K_1 = \{I_{\mathrm{obs}}(\theta_0)\}^{-1} I_{g,\mathrm{mis}}(\theta_0), \qquad B = \{I_{\mathrm{mis}}(\theta_0)\}^{-1} I_{g,\mathrm{mis}}(\theta_0), \qquad (15)$$

$I_{\mathrm{obs}}(\theta) = -E\{\sum_{i=1}^{n} w_i\, \partial\bar{s}_i(\theta)/\partial\theta\}$, $I_{g,\mathrm{mis}}(\theta) = E[\sum_{i=1}^{n} w_i \{s(\theta; y_i) - \bar{s}_i(\theta)\}\, g(y_i)]$ and $I_{\mathrm{mis}}(\theta) = E[\sum_{i=1}^{n} w_i \{s(\theta; y_i) - \bar{s}_i(\theta)\}^{\otimes 2}]$.

In Theorem 1, note that

$$\sigma^2_{g0,n,M} = \sigma^2_{g1,n,M} + B^{T}\, \mathrm{var}\left\{ \sum_{i=1}^{n} w_i\, \big(\bar{s}_i - \bar{s}^*_i\big) \right\} B,$$

and the last term represents the reduction in the variance of the fractional imputation estimator of $\eta_g$ due to the calibration in (10). Thus, $\sigma^2_{g0,n,M} \geq \sigma^2_{g1,n,M}$, with equality for $M = \infty$.
Clayton et al. (1998) and Robins & Wang (2000) also proved results similar to (13) for the special case of $M = \infty$.

For variance estimation of the fractional imputation estimator, let $\hat{V}(\hat{\eta}_g) = \sum_{i=1}^{n}\sum_{j=1}^{n} \Omega_{ij}\, g(y_i)\, g(y_j)$ be a consistent estimator for the variance of $\hat{\eta}_g = \sum_{i=1}^{n} w_i\, g(y_i)$ under complete response, where the $\Omega_{ij}$ are coefficients. Under simple random sampling, $\Omega_{ij} = -1/\{n^2(n-1)\}$ for $i \neq j$ and $\Omega_{ii} = 1/n^2$. For sufficiently large $M$, using the results in Theorem 1, a consistent estimator for the variance of $\hat{\eta}_{FI,g}$ in (7) is

$$\hat{V}(\hat{\eta}_{FI,g}) = \sum_{i=1}^{n}\sum_{j=1}^{n} \Omega_{ij}\, \bar{e}_i\, \bar{e}_j, \qquad (16)$$

where $\bar{e}_i = \bar{g}_i(\hat{\theta}) + \hat{K}_1^{T}\, \bar{s}^*_i(\hat{\theta}) = \sum_{j=1}^{M} w^*_{ij0}\, \hat{e}_{ij}$, $\hat{e}_{ij} = g(y^*_{ij}) + \hat{K}_1^{T}\, s(\hat{\theta}; y^*_{ij})$ and

$$\hat{K}_1 = \left[ \sum_{i=1}^{n} w_i\, \big\{\bar{s}^*_i(\hat{\theta})\big\}^{\otimes 2} \right]^{-1} \sum_{i=1}^{n} w_i \sum_{j=1}^{M} w^*_{ij}\, \big\{s(\hat{\theta}; y^*_{ij}) - \bar{s}^*_i\big\}\, g(y^*_{ij}).$$

For moderate $M$, the expected value of the variance estimator (16) can be written

$$E\big\{\hat{V}(\hat{\eta}_{FI,g})\big\} = E\left\{ \sum_{i=1}^{n}\sum_{j=1}^{n} \Omega_{ij}\, \bar{e}_i'\, \bar{e}_j' \right\} + E\left\{ \sum_{i=1}^{n}\sum_{j=1}^{n} \Omega_{ij}\, \mathrm{cov}_I\big(\bar{e}_i, \bar{e}_j\big) \right\},$$

where $\bar{e}_i' = E_I(\bar{e}_i)$ and the subscript $I$ denotes the expectation over the imputation mechanism generating $y^{(j)}_{i,\mathrm{mis}}$ from $h(y_{i,\mathrm{mis}})$. If the imputed values are generated independently, we have $\mathrm{cov}_I(\bar{e}_i, \bar{e}_j) = 0$ if $i \neq j$ and $\mathrm{cov}_I(\bar{e}_i, \bar{e}_j) = \mathrm{var}_I(\bar{e}_i)$ if $i = j$, and, using the argument of Remark 1, $\mathrm{var}_I(\bar{e}_i)$ can be estimated by $\hat{V}_{Ii,e} = \sum_{j=1}^{M} (w^*_{ij0})^2 (\hat{e}_{ij} - \bar{e}_i)^2$. Thus, an unbiased estimator for $\sigma^2_{g0,n,M}$ is

$$\hat{\sigma}^2_{g0,n,M} = \sum_{i=1}^{n}\sum_{j=1}^{n} \Omega_{ij}\, \bar{e}_i\, \bar{e}_j - \sum_{i=1}^{n} \Omega_{ii}\, \hat{V}_{Ii,e} + \sum_{i=1}^{n} w_i^2\, \hat{V}_{Ii,g}, \qquad (17)$$

where $\hat{V}_{Ii,g} = \sum_{j=1}^{M} (w^*_{ij0})^2 (g^*_{ij} - \bar{g}_i)^2$. Using an argument similar to (17), it can be shown that

$$\hat{\sigma}^2_{g1,n,M} = \sum_{i=1}^{n}\sum_{j=1}^{n} \Omega_{ij}\, \bar{e}_i\, \bar{e}_j - \sum_{i=1}^{n} \Omega_{ii}\, \hat{V}_{Ii,e} + \sum_{i=1}^{n} w_i^2\, \hat{V}_{Ii,u} \qquad (18)$$

is an unbiased estimator of $\sigma^2_{g1,n,M}$, where $\hat{V}_{Ii,u} = \sum_{j=1}^{M} (w^*_{ij0})^2 (u^*_{ij} - \bar{u}_i)^2$ is an unbiased estimator of $V_I(\bar{u}_i)$, $u^*_{ij} = g(y^*_{ij}) - \hat{B}^{T}\, s(\hat{\theta}; y^*_{ij})$, $\bar{u}_i = \sum_{j=1}^{M} w^*_{ij0}\, u^*_{ij}$, and

$$\hat{B} = \left\{ \sum_{i=1}^{n} w_i \sum_{j=1}^{M} w^*_{ij0} \big( s^*_{ij} - \bar{s}^*_i \big)^{\otimes 2} \right\}^{-1} \sum_{i=1}^{n} w_i \sum_{j=1}^{M} w^*_{ij0} \big( s^*_{ij} - \bar{s}^*_i \big)\, g(y^*_{ij}).$$
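A sketch of the linearized variance estimator (16) under simple random sampling with a scalar $\theta$; with $\Omega_{ii} = 1/n^2$ and $\Omega_{ij} = -1/\{n^2(n-1)\}$, the double sum collapses to the usual variance-of-a-mean formula applied to the $\bar{e}_i$. All inputs below are synthetic assumptions, not the paper's data.

```python
import numpy as np

# Linearized variance estimator (16) under SRS, scalar theta, M fractions.

rng = np.random.default_rng(3)
n, M = 200, 10
w = np.full(n, 1.0 / n)
theta_hat = 0.3
y_imp = theta_hat + rng.standard_normal((n, M))  # pretend h = f(.|y_obs, theta_hat),
w0 = np.full((n, M), 1.0 / M)                    # so w_ij0* = 1/M

g_vals = (y_imp < 3.0).astype(float)             # g(y_ij*)
s_vals = y_imp - theta_hat                       # score for y ~ N(theta, 1)
s_bar = (w0 * s_vals).sum(axis=1)                # s_bar_i*
g_bar = (w0 * g_vals).sum(axis=1)                # g_bar_i

# K1_hat as in (16), scalar case
num = (w[:, None] * w0 * (s_vals - s_bar[:, None]) * g_vals).sum()
den = (w * s_bar**2).sum()
K1_hat = num / den if den > 0 else 0.0

e_bar = g_bar + K1_hat * s_bar
# Under SRS, (16) equals the variance-of-the-mean formula applied to e_bar
V_hat = e_bar.var(ddof=1) / n
print(V_hat)
```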

Variance estimation with fractionally imputed data can also be performed using the replication method described in Appendix B.

4. MAXIMUM LIKELIHOOD ESTIMATION

In this section, we propose a computationally simple method of obtaining the pseudo maximum likelihood estimator in (5). The pseudo maximum likelihood estimator reduces to the usual maximum likelihood estimator if the sampling design is simple random sampling with $w_i = 1/n$. Instead of (5), the pseudo maximum likelihood estimator of $\theta_0$ can be obtained as the solution to

$$\hat{\theta} = \arg\max_{\theta \in \Omega} \sum_{i=1}^{n} w_i\, E\{\ln f(y_i; \theta) \mid y_{i,\mathrm{obs}}, \hat{\theta}\}. \qquad (19)$$

For $w_i = 1/n$, Dempster et al. (1977) proved that the maximum likelihood estimator in (19) maximizes (5), and proposed the EM algorithm, which computes the solution iteratively by defining $\hat{\theta}^{(t+1)}$ to be

$$\hat{\theta}^{(t+1)} = \arg\max_{\theta \in \Omega} \sum_{i=1}^{n} w_i\, E\{\ln f(y_i; \theta) \mid y_{i,\mathrm{obs}}, \hat{\theta}^{(t)}\}, \qquad (20)$$

where $\hat{\theta}^{(t)}$ is the estimate of $\theta$ obtained at the $t$-th iteration. To compute the conditional expectation in (20), the Monte Carlo implementation of the EM algorithm of Wei & Tanner (1990) can be used. The Monte Carlo EM method avoids the analytic computation of the conditional expectation in (20) by using a Monte Carlo approximation based on the imputed data. In the Monte Carlo EM method, independent draws of $y_{i,\mathrm{mis}}$ are generated from the conditional distribution $f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat{\theta}^{(t)})$ at each $t$ to approximate the conditional expectation in (20). The Monte Carlo EM method requires heavy computation because the imputed values are re-generated at each iteration $t$. Also, generating imputed values from $f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat{\theta}^{(t)})$ can be computationally challenging, since it often requires an iterative algorithm such as the Metropolis-Hastings algorithm at each EM iteration.

To avoid re-generating values from the conditional distribution at each step, we propose the following algorithm for parametric fractional imputation using importance sampling:

[Step 0] Obtain an initial estimator $\hat{\theta}^{(0)}$ of $\theta$ and set $h(y_{i,\mathrm{mis}}) = f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat{\theta}^{(0)})$.

[Step 1] Generate $M$ imputed values, $y^{(1)}_{i,\mathrm{mis}}, \ldots, y^{(M)}_{i,\mathrm{mis}}$, from $h(y_{i,\mathrm{mis}})$.

[Step 2] With the current estimate of $\theta$, denoted by $\hat{\theta}^{(t)}$, compute the fractional weights by $w^*_{ij(t)} = w^*_{ij0}(\hat{\theta}^{(t)})$, where $w^*_{ij0}(\hat{\theta})$ is defined in (8).

[Step 3] (Optional) If $w^*_{ij(t)} > C/M$ for some $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, M$, then set $h(y_{i,\mathrm{mis}}) = f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat{\theta}^{(t)})$ and go to Step 1. Increase $M$ if necessary.

[Step 4] Solve the weighted score equation for $\theta$ to obtain $\hat{\theta}^{(t+1)}$, where $\hat{\theta}^{(t+1)}$ maximizes

$$Q^*\big(\theta \mid \hat{\theta}^{(t)}\big) = \sum_{i=1}^{n} w_i \sum_{j=1}^{M} w^*_{ij(t)} \ln f\big(y^*_{ij}; \theta\big) \qquad (21)$$

over $\theta \in \Omega$.

[Step 5] Set $t = t + 1$ and go to Step 2. Stop if $\hat{\theta}^{(t)}$ meets the convergence criterion.
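To make Steps 0-5 concrete, here is a minimal sketch for a toy model $y_i \sim N(\theta, 1)$ with ignorable missingness: the imputed values are drawn once, and only the fractional weights are recomputed across iterations, using joint densities as in (23) below. The model, constants and names are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Parametric fractional imputation EM (Steps 0-5) for y ~ N(theta, 1).

rng = np.random.default_rng(4)
n, M, C = 100, 50, 5.0
y = rng.normal(1.5, 1.0, n)
obs = rng.random(n) < 0.7                 # response indicator (toy mechanism)
w = np.full(n, 1.0 / n)

# Step 0: initial estimator from respondents; proposal h = f(.; theta^(0))
theta = y[obs].mean()
theta0 = theta
# Step 1: generate M imputed values per nonrespondent, once
y_imp = theta0 + rng.standard_normal(((~obs).sum(), M))

for t in range(200):
    # Step 2: fractional weights (8) via f/h in log form, then normalize
    log_w = -0.5 * (y_imp - theta) ** 2 + 0.5 * (y_imp - theta0) ** 2
    fw = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    fw /= fw.sum(axis=1, keepdims=True)
    # Step 3 (check only): a full version would re-draw from f(.; theta^(t))
    if (fw > C / M).any():
        pass
    # Step 4: the weighted score equation (22) has a closed form here
    theta_new = (w[obs] @ y[obs] + w[~obs] @ (fw * y_imp).sum(axis=1)) / w.sum()
    # Step 5: convergence check
    if abs(theta_new - theta) < 1e-9:
        theta = theta_new
        break
    theta = theta_new

print(theta)
```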

In Step 0, the initial estimator $\hat{\theta}^{(0)}$ can be the maximum likelihood estimator obtained by using only the respondents. Step 1 and Step 2 correspond to the E-step of the EM algorithm. Step 3 can be used to control the variation of the fractional weights and to avoid extremely large fractional weights. The threshold $C/M$ in Step 3 guarantees that no individual fractional weight is bigger than $C$ times the average of the fractional weights. In Step 4, the value of $\theta$ that maximizes $Q^*(\theta \mid \hat{\theta}^{(t)})$ in (21) can often be obtained by solving

$$\sum_{i=1}^{n} w_i \sum_{j=1}^{M} w^*_{ij(t)}\, s\big(\theta; y^*_{ij}\big) = 0, \qquad (22)$$

where $s(\theta; y) = \partial \ln f(y; \theta)/\partial\theta$ is the score function of $\theta$. Thus, the solution can be obtained by applying the complete-sample score equation to the fractionally imputed data. Equation (22) can be called the imputed score equation using fractional imputation. Unlike the Monte Carlo EM method, the imputed values are not changed at each iteration; only the fractional weights are changed.

REMARK 2. In Step 2, the fractional weights can be computed by using the joint density with the current parameter estimate $\hat{\theta}^{(t)}$. Note that $w^*_{ij0}(\theta)$ in (8) can be written

$$w^*_{ij0}(\theta) = \frac{f(y^{(j)}_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \theta)/h(y^{(j)}_{i,\mathrm{mis}})}{\sum_{k=1}^{M} f(y^{(k)}_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \theta)/h(y^{(k)}_{i,\mathrm{mis}})} = \frac{f(y^*_{ij}; \theta)/h(y^{(j)}_{i,\mathrm{mis}})}{\sum_{k=1}^{M} f(y^*_{ik}; \theta)/h(y^{(k)}_{i,\mathrm{mis}})}, \qquad (23)$$

which does not require the marginal density in computing the conditional distribution. Only the joint density is needed.

Given the $M$ imputed values, $y^{(1)}_{i,\mathrm{mis}}, \ldots, y^{(M)}_{i,\mathrm{mis}}$, generated from $h(y_{i,\mathrm{mis}})$, the sequence of estimators $\{\hat{\theta}^{(0)}, \hat{\theta}^{(1)}, \ldots\}$ can be constructed from the parametric fractional imputation using importance sampling. The following theorem presents some convergence properties of this sequence of estimators.

THEOREM 2. Assume that the $M$ imputed values are generated from $h(y_{i,\mathrm{mis}})$. Let $Q^*(\theta \mid \hat{\theta}^{(t)})$ be the weighted log-likelihood function (21) based on the fractional imputation. If

$$Q^*\big(\hat{\theta}^{(t+1)} \mid \hat{\theta}^{(t)}\big) \geq Q^*\big(\hat{\theta}^{(t)} \mid \hat{\theta}^{(t)}\big), \qquad (24)$$

then

$$\bar{l}^*_{\mathrm{obs}}\big(\hat{\theta}^{(t+1)}\big) \geq \bar{l}^*_{\mathrm{obs}}\big(\hat{\theta}^{(t)}\big), \qquad (25)$$

where $\bar{l}^*_{\mathrm{obs}}(\theta) = \sum_{i=1}^{n} w_i \ln\{\bar{f}_{\mathrm{obs}(i)}(y_{i,\mathrm{obs}}; \theta)\}$ and

$$\bar{f}_{\mathrm{obs}(i)}(y_{i,\mathrm{obs}}; \theta) = \frac{\sum_{j=1}^{M} f(y^*_{ij}; \theta)/h(y^{(j)}_{i,\mathrm{mis}})}{\sum_{j=1}^{M} 1/h(y^{(j)}_{i,\mathrm{mis}})}.$$

Proof. By (23) and Jensen's inequality,

$$\bar{l}^*_{\mathrm{obs}}\big(\hat{\theta}^{(t+1)}\big) - \bar{l}^*_{\mathrm{obs}}\big(\hat{\theta}^{(t)}\big) = \sum_{i=1}^{n} w_i \ln\left\{ \sum_{j=1}^{M} w^*_{ij(t)}\, \frac{f(y^*_{ij}; \hat{\theta}^{(t+1)})}{f(y^*_{ij}; \hat{\theta}^{(t)})} \right\} \geq \sum_{i=1}^{n} w_i \sum_{j=1}^{M} w^*_{ij(t)} \ln \frac{f(y^*_{ij}; \hat{\theta}^{(t+1)})}{f(y^*_{ij}; \hat{\theta}^{(t)})} = Q^*\big(\hat{\theta}^{(t+1)} \mid \hat{\theta}^{(t)}\big) - Q^*\big(\hat{\theta}^{(t)} \mid \hat{\theta}^{(t)}\big).$$

Therefore, (24) implies (25).

Note that $\bar{l}^*_{\mathrm{obs}}(\theta)$ is an imputed version of the observed pseudo log-likelihood based on the $M$ imputed values, $y^{(1)}_{i,\mathrm{mis}}, \ldots, y^{(M)}_{i,\mathrm{mis}}$. Thus, by Theorem 2, the sequence $\bar{l}^*_{\mathrm{obs}}(\hat{\theta}^{(t)})$ is monotonically increasing and, under the conditions stated in Wu (1983), the convergence of $\hat{\theta}^{(t)}$ follows for fixed $M$. Theorem 2 does not hold for the sequence obtained from the Monte Carlo EM method for fixed $M$, because the imputed values are re-generated at each E-step of the Monte Carlo EM method, and convergence is very hard to check for the Monte Carlo EM (Booth & Hobert, 1999).

REMARK 3. Sung & Geyer (2007) considered a Monte Carlo maximum likelihood method that directly maximizes $\bar{l}^*_{\mathrm{obs}}(\theta)$. Computing the value of $\theta$ that maximizes $Q^*(\theta \mid \hat{\theta}^{(t)})$ is easier than computing the value of $\theta$ that maximizes $\bar{l}^*_{\mathrm{obs}}(\theta)$.

5. SIMULATION STUDY

In a simulation study, $B = 2000$ Monte Carlo samples of size $n = 200$ were independently generated from an infinite population specified by $x_i \sim N(2, 1)$; $y_i \mid x_i \sim N(\beta_0 + \beta_1 x_i, \sigma_{ee})$ with $(\beta_0, \beta_1, \sigma_{ee}) = (1, 0.7, 1)$; $z_i \mid (x_i, y_i) \sim \mathrm{Ber}(p_i)$, where $\log\{p_i/(1 - p_i)\} = \psi_0 + \psi_1 y_i$ with $(\psi_0, \psi_1) = (-3, 1)$; and $\delta_i \mid (x_i, y_i, z_i) \sim \mathrm{Ber}(\pi_i)$, where $\log\{\pi_i/(1 - \pi_i)\} = 0.5 x_i + 0.8 z_i$. Variable $y_i$ is observed if $\delta_i = 1$ and is not observed if $\delta_i = 0$. The overall response rate for $y$ is about 76%. Variable $x$ is the covariate for the regression of $y$ on $x$ and is always observed. Variable $z$ is a surrogate variable for $y$ in the sense that interest is focused on estimation of the parameter $\beta$ in the regression model $P_\beta(y \mid x)$, but incorporating the surrogate variable $z$ can improve the efficiency of the resulting estimators (Pepe, 1992). Also, note that, ignoring the surrogate variable $z$, the response mechanism is not ignorable. By incorporating the surrogate variable into the estimation procedure, we can safely ignore the response mechanism.

We are interested in estimating three parameters: $\theta_1 = E(y)$, the marginal mean of $y$; $\theta_2 = \beta_1$, the slope in the regression of $y$ on $x$; and $\theta_3 = \mathrm{pr}(y < 3)$, the proportion of $y$ less than 3. Under complete response, $\theta_1$ and $\theta_2$ are computed by the maximum likelihood method and the proportion $\theta_3$ is estimated by

$$\hat{\theta}_{3,n} = \frac{1}{n} \sum_{i=1}^{n} I(y_i < 3), \qquad (26)$$

which is different from the maximum likelihood estimator

$$\int_{-\infty}^{3} \frac{1}{\sqrt{\hat{\sigma}_{yy}}}\, \phi\left(\frac{y - \hat{\mu}_y}{\sqrt{\hat{\sigma}_{yy}}}\right) dy, \qquad (27)$$
where $\phi(y)$ is the density of the standard normal distribution and $(\hat{\mu}_y, \hat{\sigma}_{yy})$ is the maximum likelihood estimator of $(\mu_y, \sigma_{yy})$.

Under nonresponse, five estimators were computed:

1. The proposed parametric fractional imputation estimator using $w^*_{ij0}$ in (8), with $M = 100$ and $M = 10$.
2. The calibration fractional imputation estimator using the regression weighting method in (11), with $M = 10$.
3. Multiple imputation, with $M = 100$ and $M = 10$.

In addition to the five imputed estimators, we also computed the complete-sample estimator for comparison. In fractional imputation, $M$ imputed values were independently generated from $N(\hat{\beta}_{0(0)} + \hat{\beta}_{1(0)} x_i, \hat{\sigma}_{ee(0)})$, where $(\hat{\beta}_{0(0)}, \hat{\beta}_{1(0)}, \hat{\sigma}_{ee(0)})$ are the regression parameter estimates computed from the respondents only. For each imputed value, we assign the fractional weight

$$w^*_{ij(t)} \propto \frac{\exp\big\{-(y^{(j)}_i - \hat{\beta}_{0(t)} - \hat{\beta}_{1(t)} x_i)^2/(2\hat{\sigma}_{ee(t)})\big\}\, \exp\big(\hat{\psi}_{0(t)} z_i + \hat{\psi}_{1(t)} z_i\, y^{(j)}_i\big)}{\exp\big\{-(y^{(j)}_i - \hat{\beta}_{0(0)} - \hat{\beta}_{1(0)} x_i)^2/(2\hat{\sigma}_{ee(0)})\big\}\, \big\{1 + \exp\big(\hat{\psi}_{0(t)} + \hat{\psi}_{1(t)}\, y^{(j)}_i\big)\big\}},$$

where $(\hat{\beta}_{0(t)}, \hat{\beta}_{1(t)}, \hat{\sigma}_{ee(t)}, \hat{\psi}_{0(t)}, \hat{\psi}_{1(t)})$ is obtained by the maximum likelihood method using the fractionally imputed data with fractional weights $w^*_{ij(t-1)}$. In Step 3 of the fractional imputation algorithm for maximum likelihood estimation in § 4, $C = 5$ was used. In the calibration fractional imputation method, $M = 10$ values were randomly selected from $M_1 = 100$ initial fractionally imputed values by systematic sampling with selection probability proportional to $w^*_{ij0}$ in (8); the regression fractional weights were then computed by (11). In Step 5, the convergence criterion was $|\hat{\theta}^{(t+1)} - \hat{\theta}^{(t)}| < \epsilon = 10^{-9}$. The average numbers of iterations until convergence were 45 and 35 for $M = 100$ and $M = 10$, respectively. In multiple imputation, the imputed values were generated from the posterior predictive distribution iteratively using Gibbs sampling with 100 iterations.

The Monte Carlo means and the Monte Carlo standardised variances of the complete-sample estimator and the five imputed estimators are presented in Table 1. The standardised variance in Table 1 was computed by dividing the variance of each estimator by that of the full-sample estimator. The simulation results in Table 1 show that fractional imputation is generally more efficient than multiple imputation, because multiple imputation has additional variability due to generating the parameter values. For estimation of the proportion, it is possible for the multiple imputation estimator with $M = 100$ to be more efficient than the calibration fractional imputation estimator with $M = 10$; the differences in efficiency between these two estimators are less than one percent.

In addition to point estimators, variance estimates were also computed from each Monte Carlo sample. For variance estimation of fractional imputation, we used the linearized variance estimator discussed in § 3. For variance estimation of the calibration fractional imputation estimator, we used the one-step jackknife variance estimator discussed in Appendix B. For multiple imputation variance estimation, we used the variance formula of Rubin (1987).
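For concreteness, a sketch of one Monte Carlo replicate under the stated design follows; the parameter values are taken from the text, and the rest is an assumed encoding.

```python
import numpy as np

# One replicate of the simulation design in Section 5.

rng = np.random.default_rng(5)
n = 200
beta0, beta1, sig_ee = 1.0, 0.7, 1.0
psi0, psi1 = -3.0, 1.0

x = rng.normal(2.0, 1.0, n)
y = beta0 + beta1 * x + rng.normal(0.0, np.sqrt(sig_ee), n)

expit = lambda a: 1.0 / (1.0 + np.exp(-a))
z = rng.random(n) < expit(psi0 + psi1 * y)        # surrogate variable
delta = rng.random(n) < expit(0.5 * x + 0.8 * z)  # response indicator

print(delta.mean())   # overall response rate, roughly 0.76
theta1, theta2, theta3 = y.mean(), beta1, (y < 3.0).mean()
```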
Table 2 presents the Monte Carlo relative biases and t-statistics for the variance estimators; the t-statistic tests for zero bias in the variance estimator. The simulation errors for the relative biases of the variance estimators reported in Table 2 are less than 1%. Table 2 shows that, for the fractional imputation estimators, both the linearization and replication methods provide nearly unbiased estimates of the variances of the parameter estimates, with t-statistics mostly less than 2 in absolute value.
The multiple imputation variance estimators are also unbiased for the variances of the estimators of $\theta_1$ and $\theta_2$, which were explicitly considered in the imputation model. For variance estimation of the proportion, the multiple imputation variance estimator shows significant biases (15.53% for $M = 10$ and 15.97% for $M = 100$). The congeniality condition of Meng (1994) does not hold for the proportion based on the estimator (26); for more details, see Appendix C. Thus, the multiple imputation method in this simulation is congenial for $\theta_1$ and $\theta_2$, but it is not congenial for the proportion parameter $\theta_3$.

Table 1. Monte Carlo means and standardised variances of the imputed estimators

  Parameter    Method             Mean    Std Var
  $\theta_1$   Complete sample    2.4     100
               FI (M = 100)       2.4     121
               FI (M = 10)        2.4     124
               CFI (M = 10)       2.4     121
               MI (M = 100)       2.4     128
               MI (M = 10)        2.4     130
  $\theta_2$   Complete sample    0.7     100
               FI (M = 100)       0.7     137
               FI (M = 10)        0.7     141
               CFI (M = 10)       0.7     136
               MI (M = 100)       0.7     155
               MI (M = 10)        0.7     159
  $\theta_3$   Complete sample    0.69    100
               FI (M = 100)       0.69    102
               FI (M = 10)        0.69    103
               CFI (M = 10)       0.69    102
               MI (M = 100)       0.69    101
               MI (M = 10)        0.69    102

FI, fractional imputation; CFI, calibration fractional imputation; MI, multiple imputation; Std Var, standardised variance.

Table 2. Monte Carlo relative biases and t-statistics of the variance estimators

  Parameter               Method                          Relative bias (%)   t-statistic
  var($\hat{\theta}_1$)   Linearization (FI, M = 100)       3.93                1.23
                          Linearization (FI, M = 10)        2.68                0.85
                          One-step JK (CFI, M = 10)         5.88                1.84
                          MI (M = 100)                      2.83                0.90
                          MI (M = 10)                       2.93                0.93
  var($\hat{\theta}_2$)   Linearization (FI, M = 100)      -2.00               -0.58
                          Linearization (FI, M = 10)       -7.85               -2.29
                          One-step JK (CFI, M = 10)         6.38                1.84
                          MI (M = 100)                     -4.75               -1.36
                          MI (M = 10)                      -5.40               -1.54
  var($\hat{\theta}_3$)   Linearization (FI, M = 100)       3.10                0.96
                          Linearization (FI, M = 10)        4.17                1.29
                          One-step JK (CFI, M = 10)        -1.14               -0.35
                          MI (M = 100)                     15.97                4.94
                          MI (M = 10)                      15.53                4.82

FI, fractional imputation; CFI, calibration fractional imputation; MI, multiple imputation; JK, jackknife.

6. CONCLUDING REMARKS

Parametric fractional imputation is proposed as a tool for parameter estimation with missing data. Analyses using the fractionally imputed data $y^*_{ij}$ with weights $w_i w^*_{ij}$ can be performed using existing complete-sample software, treating the imputed values as if observed. Generating the imputed values and computing the fractional weights may require careful modeling and parameter estimation, but the users do not need to know these details for their analyses. The data provider, who has the best available information for the imputation model, can use the imputation model to construct the fractionally imputed data with fractional weights, and possibly with replicated fractional weights for variance estimation. No other information is necessary for the users.

If parametric fractional imputation is used to construct the score function, the solution to the imputed score equation is very close to the maximum likelihood estimator of the parameters in the model. Some computational efficiency relative to the Monte Carlo EM method can be achieved with parametric fractional imputation using the ideas of importance sampling and calibration weighting. Parametric fractional imputation yields direct, consistent estimates for parameters that are not part of the imputation model. For example, in the simulation study, the parametric fractional imputation computed from a normal model provides direct estimates of the cumulative distribution function. Thus, the proposed imputation method is a useful tool for general-purpose data analysis, where the parameters of interest are unknown at the time of imputation. Variance estimation can be performed using a linearization method or a replication method. The validity of variance estimation for parametric fractional imputation, unlike that of multiple imputation, does not require the congeniality condition of Meng (1994).

ACKNOWLEDGEMENT

The author wishes to thank Professor Wayne Fuller, the referees, and the Editor for their very helpful comments.

APPENDIX A

Proof of Theorem 1

Define a class of estimators

$$\bar{\eta}_{g0,n,K}(\theta) = \sum_{i=1}^{n} w_i \sum_{j=1}^{M} w^*_{ij0}(\theta)\, g(y^*_{ij}) + K^{T} \sum_{i=1}^{n} w_i\, E\{s(\theta; y_i) \mid y_{i,\mathrm{obs}}, \theta\}$$

indexed by $K$. Note that, by (5), we have $\sum_{i=1}^{n} w_i\, E\{s(\hat{\theta}; y_i) \mid y_{i,\mathrm{obs}}, \hat{\theta}\} = 0$ and $\bar{\eta}_{g0,n,K}(\hat{\theta}) = \hat{\eta}_{g0,n,M}$ for any $K$. According to Randles (1982), we have

$$n^{1/2}\big\{\bar{\eta}_{g0,n,K}(\hat{\theta}) - \bar{\eta}_{g0,n,K}(\theta_0)\big\} = o_p(1)$$
if

$$E\big\{\partial\, \bar{\eta}_{g0,n,K}(\theta_0)/\partial\theta\big\} = 0 \qquad (A.1)$$

is satisfied. Differentiating $\bar{\eta}_{g0,n,K}(\theta)$ with respect to $\theta$ and taking the expectation at $\theta_0$, the choice of $K = K_1$ in (15) satisfies (A.1), and thus (13) follows.

To show (14), consider

$$\bar{\eta}_{g1,n,K}(\theta) = \sum_{i=1}^{n} w_i \sum_{j=1}^{M} w^*_{ij}(\theta)\, g(y^*_{ij}) + K^{T} \sum_{i=1}^{n} w_i\, E\{s(\theta; y_i) \mid y_{i,\mathrm{obs}}, \theta\}.$$

Using (11),

$$\sum_{i=1}^{n} w_i \sum_{j=1}^{M} w^*_{ij}(\theta)\, g(y^*_{ij}) = \sum_{i=1}^{n} w_i \sum_{j=1}^{M} w^*_{ij0}(\theta)\, g(y^*_{ij}) - \left\{ \sum_{i=1}^{n} w_i \sum_{j=1}^{M} w^*_{ij0}(\theta)\, s(\theta; y^*_{ij}) \right\}^{T} \hat{B}_g(\theta),$$

where

$$\hat{B}_g(\theta) = \left\{ \sum_{i=1}^{n} w_i \sum_{j=1}^{M} w^*_{ij0}(\theta) \big( s^*_{ij}(\theta) - \bar{s}^*_i(\theta) \big)^{\otimes 2} \right\}^{-1} \sum_{i=1}^{n} w_i \sum_{j=1}^{M} w^*_{ij0}(\theta) \big( s^*_{ij}(\theta) - \bar{s}^*_i(\theta) \big)\, g(y^*_{ij}).$$

After some algebra, it can be shown that the choice of $K = K_1$ in (15) also satisfies $E\{\partial\, \bar{\eta}_{g1,n,K}(\theta_0)/\partial\theta\} = 0$ and, by Randles (1982) again,

$$n^{1/2}\big\{\bar{\eta}_{g1,n,K}(\hat{\theta}) - \bar{\eta}_{g1,n,K}(\theta_0)\big\} = o_p(1).$$

APPENDIX B

Replication variance estimation

Under complete response, let $w^{[k]}_i$ be the $k$-th replication weight for unit $i$. Assume that the replication variance estimator

$$\hat{V}_n = \sum_{k=1}^{L} c_k \big( \hat{\eta}^{[k]}_g - \hat{\eta}_g \big)^2,$$

where $\hat{\eta}_g = \sum_{i=1}^{n} w_i\, g(y_i)$ and $\hat{\eta}^{[k]}_g = \sum_{i=1}^{n} w^{[k]}_i\, g(y_i)$, is consistent for the variance of $\hat{\eta}_g$. For replication with the calibration fractional weights computed from (10), we consider the following steps for creating replicated fractional weights:

[Step 1] Compute $\hat{\theta}^{[k]}$, the $k$-th replicate of $\hat{\theta}$, using the fractional weights.

[Step 2] Using the $\hat{\theta}^{[k]}$ computed from Step 1, compute the replicated fractional weights $w^{*[k]}_{ij}$ satisfying

$$\sum_{i=1}^{n} w^{[k]}_i \sum_{j=1}^{M} w^{*[k]}_{ij}\, s\big(\hat{\theta}^{[k]}; y^*_{ij}\big) = 0 \qquad (B.1)$$

by using the regression weighting technique.

Equation (B.1) is the calibration equation for the replicated fractional weights. For any estimator of the form (7), the replication variance estimator is constructed as

$$\hat{V}(\hat{\eta}_{FI,g}) = \sum_{k=1}^{L} c_k \big( \hat{\eta}^{[k]}_{FI,g} - \hat{\eta}_{FI,g} \big)^2,$$

where $\hat{\eta}^{[k]}_{FI,g} = \sum_{i=1}^{n} \sum_{j=1}^{M} w^{[k]}_i w^{*[k]}_{ij}\, g(y^*_{ij})$ and $w^{*[k]}_{ij}$ is computed from (B.1).

In general, Step 1 can be computationally problematic, since $\hat{\theta}^{[k]}$ is often computed from the iterative algorithm (21) for each replication. Thus, we consider an approximation for $\hat{\theta}^{[k]}$ using a Taylor linearization of $0 = S^{[k]}(\hat{\theta}^{[k]})$ around $\hat{\theta}$. The one-step approximation can be implemented as

$$\hat{\theta}^{[k]} = \hat{\theta} + \big[\hat{I}^{[k]}_{\mathrm{obs}}(\hat{\theta})\big]^{-1} S^{[k]}(\hat{\theta}),$$

where

$$\hat{I}^{[k]}_{\mathrm{obs}}(\theta) = -\sum_{i=1}^{n} w^{[k]}_i \sum_{j=1}^{M} w^*_{ij}\, \frac{\partial}{\partial\theta}\, s\big(\theta; y^*_{ij}\big) + \sum_{i=1}^{n} w^{[k]}_i\, \big\{\bar{s}^*_i(\theta)\big\}^{\otimes 2} - \sum_{i=1}^{n} w^{[k]}_i \sum_{j=1}^{M} w^*_{ij}\, \big\{s\big(\theta; y^*_{ij}\big)\big\}^{\otimes 2}$$

and $S^{[k]}(\theta) = \sum_{i=1}^{n} w^{[k]}_i\, \bar{s}^*_i(\theta)$. The $\hat{I}^{[k]}_{\mathrm{obs}}(\theta)$ is a replicated version of the observed information matrix proposed by Louis (1982).

APPENDIX C

A note on multiple imputation for a proportion

Assume that we have a random sample of size $n$ with observations $(x_i, y_i)$ obtained from a bivariate normal distribution. The parameter of interest is a proportion, for example $\eta = \mathrm{pr}(y < 0.8)$. An unbiased estimator of $\eta$ is

$$\hat{\eta}_n = \frac{1}{n} \sum_{i=1}^{n} I(y_i < 0.8). \qquad (C.1)$$

Note that $\hat{\eta}_n$ is unbiased but has bigger variance than the maximum likelihood estimator of the form (27). For simplicity, assume that the first $r (< n)$ elements have both $x_i$ and $y_i$ observed, while the last $n - r$ elements have only $x_i$ observed and $y_i$ missing. In this situation, an efficient imputation method such as

$$y^*_i \sim N\big( \hat{\beta}_0 + x_i \hat{\beta}_1,\, \hat{\sigma}^2_e \big) \qquad (C.2)$$

can be used, where $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\sigma}^2_e$ can be computed from the respondents. In multiple imputation, the parameter estimates are generated from a posterior distribution given the observations. Under the imputation mechanism (C.2), the imputed estimator of $\mu_2$ of the form $\hat{\mu}_{2,I} = n^{-1}\{\sum_{i=1}^{r} y_i + \sum_{i=r+1}^{n} y^*_i\}$ satisfies

$$\mathrm{var}(\hat{\mu}_{2,FE}) = \mathrm{var}(\bar{y}_n) + \mathrm{var}(\hat{\mu}_{2,FE} - \bar{y}_n), \qquad (C.3)$$

where $\hat{\mu}_{2,FE} = E_I(\hat{\mu}_{2,I})$. Condition (C.3) is the congeniality condition of Meng (1994).

Now, for $\eta = \mathrm{pr}(y < 3)$, the imputed estimator of $\eta$ based on $\hat{\eta}_n$ in (C.1) is

$$\hat{\eta}_I = \frac{1}{n} \left\{ \sum_{i=1}^{r} I(y_i < 3) + \sum_{i=r+1}^{n} I(y^*_i < 3) \right\}. \qquad (C.4)$$

The expected value of $\hat{\eta}_I$ over the imputation mechanism is

$$E_I(\hat{\eta}_I) = \frac{1}{n} \left\{ \sum_{i=1}^{r} I(y_i < 3) + \sum_{i=r+1}^{n} \mathrm{pr}\big(y_i < 3 \mid x_i, \hat{\theta}\big) \right\} = \hat{\eta}_{FE} + \frac{1}{n} \sum_{i=1}^{r} \big\{ I(y_i < 3) - \mathrm{pr}\big(y_i < 3 \mid x_i, \hat{\theta}\big) \big\},$$
where $\hat{\eta}_{FE} = n^{-1} \sum_{i=1}^{n} \mathrm{pr}(y_i < 3 \mid x_i, \hat{\theta})$. For the proportion, $\hat{\eta}_{FE} \neq E_I(\hat{\eta}_I)$, and so the congeniality condition does not hold. In fact,

$$\mathrm{var}\{E_I(\hat{\eta}_I)\} < \mathrm{var}(\hat{\eta}_n) + \mathrm{var}\{E_I(\hat{\eta}_I) - \hat{\eta}_n\}$$

and the multiple imputation variance estimator overestimates the variance of $\hat{\eta}_I$ in (C.4). Instead, we have $\mathrm{var}(\hat{\eta}_{FE}) = \mathrm{var}(\hat{\eta}_n) + \mathrm{var}(\hat{\eta}_{FE} - \hat{\eta}_n)$, and the multiple imputation variance estimator estimates the variance of $\hat{\eta}_{FE}$ for sufficiently large $M$.

REFERENCES

BOOTH, J. G. & HOBERT, J. P. (1999). Maximizing generalized linear mixed models with an automated Monte Carlo EM algorithm. J. R. Statist. Soc. B 61, 265-85.

CLAYTON, D., SPIEGELHALTER, D., DUNN, G. & PICKLES, A. (1998). Analysis of longitudinal binary data from multiphase sampling. J. R. Statist. Soc. B 60, 71-87.

DEMPSTER, A. P., LAIRD, N. M. & RUBIN, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. B 39, 1-38.

FAY, R. E. (1992). When are inferences from multiple imputation valid? In Proc. Survey Res. Meth. Sect., Am. Statist. Assoc., pp. 227-32. Washington, DC: American Statistical Association.

GELMAN, A., MENG, X.-L. & STERN, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statist. Sinica 6, 733-807.

HENMI, M., YOSHIDA, R. & EGUCHI, S. (2007). Importance sampling via the estimated sampler. Biometrika 94, 985-91.

IBRAHIM, J. G. (1990). Incomplete data in generalized linear models. J. Am. Statist. Assoc. 85, 765-69.

KIM, J. K., BRICK, M. J., FULLER, W. A. & KALTON, G. (2006). On the bias of the multiple imputation variance estimator in survey sampling. J. R. Statist. Soc. B 68, 509-21.

KIM, J. K. & FULLER, W. A. (2004). Fractional hot deck imputation. Biometrika 91, 559-78.

KIM, J. K. & RAO, J. N. K. (2009). Unified approach to linearization variance estimation from survey data after imputation for item nonresponse. Biometrika, in press. doi:10.1093/biomet/asp041.

LOUIS, T. A. (1982). Finding the observed information matrix when using the EM algorithm. J. R. Statist. Soc. B 44, 226-33.

MENG, X.-L. (1994). Multiple-imputation inferences with uncongenial sources of input (with discussion). Statist. Sci. 9, 538-73.

PEPE, M. S. (1992). Inference using surrogate outcome data and a validation sample. Biometrika 79, 355-65.

RANDLES, R. H. (1982). On the asymptotic normality of statistics with estimated parameters. Ann. Statist. 10, 462-74.

ROBINS, J. M. & WANG, N. (2000). Inference for imputation estimators. Biometrika 87, 113-24.

RUBIN, D. B. (1976). Inference and missing data. Biometrika 63, 581-90.

RUBIN, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.

SUNG, Y. J. & GEYER, C. J. (2007). Monte Carlo likelihood inference for missing data models. Ann. Statist. 35, 990-1011.

WANG, N. & ROBINS, J. M. (1998). Large-sample theory for parametric multiple imputation procedures. Biometrika 85, 935-48.

WEI, G. C. G. & TANNER, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. J. Am. Statist. Assoc. 85, 699-704.

WU, C. F. J. (1983). On the convergence properties of the EM algorithm. Ann. Statist. 11, 95-103.