Parametric fractional imputation for missing data analysis


Section on Survey Research Methods, JSM 2008

Jae Kwang Kim, Wayne Fuller
Department of Statistics, Iowa State University, Ames, IA 50011, U.S.A.

Abstract

Under a parametric model for missing data, the EM algorithm is a popular tool for finding the maximum likelihood estimates (MLE) of the parameters of the model. Imputation, when carefully done, can be used to facilitate parameter estimation by applying the complete-sample estimators to the imputed dataset. The basic idea is to generate the imputed values from the conditional distribution of the missing data given the observed data. Multiple imputation is a Bayesian approach to generating the imputed values from the conditional distribution. In this article, parametric fractional imputation is proposed as a parametric approach for generating imputed values. Using fractional weights, the E-step of the EM algorithm can be approximated by the weighted mean of the imputed data likelihood, where the fractional weights are computed from the current value of the parameter estimates. Some computational efficiency can be achieved using the idea of importance sampling in the Monte Carlo approximation of the conditional expectation. The resulting estimator of the specified parameters will be identical to the MLE under missing data if the fractional weights are adjusted using a calibration step. The proposed imputation method provides efficient parameter estimates for the model parameters specified and also provides reasonable estimates for parameters that are not part of the imputation model, for example domain means. Thus, the proposed imputation method is a useful tool for general-purpose data analysis. Variance estimation is covered and results from a limited simulation study are presented.

Key Words: EM algorithm, Importance sampling, Monte Carlo EM, Multiple imputation, Observed likelihood, Observed information.

1. INTRODUCTION

Suppose that $y_1, y_2, \ldots, y_n$ are independent observations of a $p$-dimensional random variable $y$ from a parametric distribution with density $f(y; \theta_0)$, with $\theta_0 \in \Omega$. The MLE of $\theta_0$ can be obtained as a solution to the following score equation:

$$S_n(\theta) \equiv \sum_{i=1}^{n} s_i(\theta) = 0, \qquad (1)$$

where $s_i(\theta) = \partial \ln f(y_i; \theta)/\partial\theta$ and $S_n(\theta)$ is the score function. Given missing data, let $(y_{i,obs}, y_{i,mis})$ denote the observed part and missing part of $y_i$, respectively. To simplify the presentation, we assume the response mechanism is Missing-At-Random (MAR) in the sense of Rubin (1976). Under MAR, the likelihood function is a marginal likelihood obtained by integrating out over the missing part. Thus, we can write the observed likelihood as

$$L_{obs}(\theta) = \prod_{i=1}^{n} f_{obs(i)}(y_{i,obs}; \theta), \qquad (2)$$

where $f_{obs(i)}(y_{i,obs}; \theta) = \int f(y_{i,obs}, y_{i,mis}; \theta)\, dy_{i,mis}$ is the marginal density of $y_{i,obs}$ and the subscript $i$ is used in $f_{obs(i)}$ because the missing pattern can differ from observation to observation. To compute the MLE that maximizes the observed likelihood (2), we need to solve the observed score equation for $\theta$, where the observed score equation is

$$S_{obs}(\theta) \equiv \sum_{i=1}^{n} s_{i,obs}(\theta) \equiv \frac{\partial}{\partial\theta} \sum_{i=1}^{n} \ln f_{obs(i)}(y_{i,obs}; \theta) = 0. \qquad (3)$$

Instead of solving (3), the MLE of $\theta_0$ can be obtained by solving

$$\bar{S}(\theta) \equiv E\{S_n(\theta) \mid Y_{obs}\} = \sum_{i=1}^{n} E\{s_i(\theta) \mid y_{i,obs}\} = 0, \qquad (4)$$

where $Y_{obs} = (y_{1,obs}, y_{2,obs}, \ldots, y_{n,obs})$ and $\bar{S}(\theta)$ is called the mean score function. The equivalence of the observed score function and the mean score function was first proved by Fisher (1925).
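The equivalence of the observed score and the mean score can be checked numerically in a small case. The following is a minimal sketch, assuming a bivariate normal model with known covariance matrix and unknown mean, and a single unit whose second component is missing. The parameter values and the function names (`complete_score`, `observed_score`, `mean_score`) are illustrative assumptions, not part of the paper.

```python
import numpy as np

# Assumed illustrative values: unknown mean mu, known covariance Sigma.
mu = np.array([0.0, 2.0])
Sigma = np.array([[1.0, 1.0],
                  [1.0, 2.0]])
Sigma_inv = np.linalg.inv(Sigma)


def complete_score(y, mu):
    """Score of the complete-data log density with respect to mu."""
    return Sigma_inv @ (y - mu)


def observed_score(y1, mu):
    """Score of the marginal density of y1 ~ N(mu_1, sigma_11).

    The marginal density does not involve mu_2, so that component is zero.
    """
    return np.array([(y1 - mu[0]) / Sigma[0, 0], 0.0])


def mean_score(y1, mu):
    """Conditional expectation of the complete score given y1.

    The complete score is linear in y2, so the expectation is obtained by
    plugging in E(y2 | y1) = mu_2 + (sigma_12 / sigma_11) (y1 - mu_1).
    """
    e_y2 = mu[1] + Sigma[0, 1] / Sigma[0, 0] * (y1 - mu[0])
    return complete_score(np.array([y1, e_y2]), mu)


y1_observed = 1.3
scores_agree = np.allclose(mean_score(y1_observed, mu),
                           observed_score(y1_observed, mu))
```

In this linear case the conditional expectation is available in closed form, so no Monte Carlo step is needed; the imputation methods discussed below target models where it is not.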

Strictly speaking, the conditional expectation in (4) is evaluated at $\theta$ and we should write the mean score equation as

$$\bar{S}(\theta) \equiv \sum_{i=1}^{n} E\{s_i(\theta) \mid y_{i,obs}, \theta\} = 0. \qquad (5)$$

The EM algorithm, proposed by Dempster et al. (1977), computes the solution iteratively by defining $\hat\theta_{(t+1)}$ to be the solution to

$$\sum_{i=1}^{n} E\{s_i(\theta) \mid y_{i,obs}, \hat\theta_{(t)}\} = 0, \qquad (6)$$

where $\hat\theta_{(t)}$ is the estimate of $\theta$ obtained at the $t$-th iteration. To compute the conditional expectation in (6), the Monte Carlo implementation of the EM (MCEM) algorithm of Wei and Tanner (1990) can be used. The MCEM method avoids the analytic computation of the conditional expectation (6) by using a Monte Carlo approximation based on the imputed data. Thus, one can interpret imputation as a Monte Carlo approximation of the conditional expectation given the observed data.

The Monte Carlo methods of approximating the conditional expectation in (4) can be placed in two classes:

1. Bayesian approach: Generate the imputed values from the posterior predictive distribution of $y_{i,mis}$ given $y_{i,obs}$:

$$f(y_{i,mis} \mid y_{i,obs}) = \int f(y_{i,mis} \mid \theta, y_{i,obs})\, f(\theta \mid y_{i,obs})\, d\theta. \qquad (7)$$

This is essentially the approach used in multiple imputation as proposed by Rubin (1987).

2. Frequentist approach: Generate the imputed values from the conditional distribution $f(y_{i,mis} \mid y_{i,obs}, \hat\theta)$ with an estimated value $\hat\theta$.

The Bayesian approach to imputation has been proposed as a general method of handling missing data because of the feasibility of Bayesian computational methods and the simplicity of variance estimation. However, the convergence to a stable posterior predictive distribution (7) is difficult to check and often requires huge computation (Gelman et al., 1996). Also, the variance estimator used in multiple imputation is not always consistent. For examples, see Fay (1992), Wang and Robins (1998), and Kim et al. (2006).

In the frequentist approach to imputation, the imputed values are generated from the conditional distribution $f(y_{i,mis} \mid y_{i,obs}, \hat\theta)$ with a particular value $\hat\theta$, often the MLE of $\theta$. However, the frequentist approach to imputation has received less attention than Bayesian imputation.
One notable exception is Wang and Robins (1998), who studied the asymptotic properties of multiple imputation and a parametric frequentist imputation procedure. Wang and Robins (1998) considered the estimated parameter $\hat\theta$ to be given, and did not discuss parameter estimation.

We consider frequentist imputation given a parametric model for the original distribution. We propose an alternative implementation of the MCEM method using parametric fractional imputation that does not require regeneration of the imputed values at each iteration. Only the fractional weights are re-computed at each iteration, and we propose a simple method of computing the fractional weights without increasing the size of the Monte Carlo samples. The proposed method uses the calibration technique to obtain the MLE and is computationally very attractive in many cases.

In Section 2, the parametric fractional imputation method is proposed. Variance estimation is discussed in Section 3 and the proposed method is extended to general-purpose estimation in Section 4. Calibration fractional imputation is derived in Section 5. Results from a limited simulation study are presented in Section 6.

2. Proposed method

As discussed in Section 1, solving the mean score equation (4) requires an iterative method because the conditional distribution of $y_{i,mis}$ given $y_{i,obs}$, denoted by $f(y_{i,mis} \mid y_{i,obs}, \theta)$, is a function of $\theta$. Thus, since we cannot generate imputed values from the conditional distribution with unknown $\theta$, the iterative procedure generates imputed values from the conditional distribution with the current value of $\theta$ and then updates $\theta$ based on the imputed score equation.

To avoid re-generating values from the conditional distribution at each step, we first generate $M$ imputed values from some known distribution $q(y_{i,mis})$ whose support includes that of $f(y_{i,mis} \mid y_{i,obs}, \theta)$. Let the generated values be $y_{i,mis}^{*(1)}, \ldots, y_{i,mis}^{*(M)}$. Because

$$E\{s_i(\theta) \mid y_{i,obs}, \hat\theta_{(t)}\} = \int s_i(\theta)\, \frac{f(y_{i,mis} \mid y_{i,obs}, \hat\theta_{(t)})}{q(y_{i,mis})}\, q(y_{i,mis})\, dy_{i,mis}, \qquad (8)$$

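we can approximate the conditional expectation by

$$E\{s_i(\theta) \mid y_{i,obs}, \hat\theta_{(t)}\} \doteq \frac{1}{M} \sum_{j=1}^{M} s_{ij}^{*}(\theta)\, \frac{f(y_{i,mis}^{*(j)} \mid y_{i,obs}, \hat\theta_{(t)})}{q(y_{i,mis}^{*(j)})},$$

where $s_{ij}^{*}(\theta) = s(\theta; y_{i,obs}, y_{i,mis}^{*(j)})$ is the score evaluated at the $j$-th imputed value. Thus, we propose the following algorithm for parametric fractional imputation using importance sampling:

[Step 1] Obtain an initial estimator $\hat\theta_{(0)}$ of $\theta$. Also, generate $M$ imputed values, $y_{i,mis}^{*(1)}, \ldots, y_{i,mis}^{*(M)}$, from some density $q(y_{i,mis})$. Often, $q(y_{i,mis}) = f(y_{i,mis} \mid y_{i,obs}, \hat\theta_{(0)})$.

[Step 2] With the current estimate of $\theta$, denoted by $\hat\theta_{(t)}$, compute the fractional weights as

$$w_{ij(t)}^{*} = C_{it}\, \frac{f(y_{i,mis}^{*(j)} \mid y_{i,obs}; \hat\theta_{(t)})}{q(y_{i,mis}^{*(j)})}, \qquad (9)$$

where $C_{it}$ is chosen to satisfy $\sum_{j=1}^{M} w_{ij(t)}^{*} = 1$.

[Step 3] Using the fractional weights obtained from Step 2, solve the weighted score equation:

$$\hat\theta_{(t+1)}: \text{ solution to } \sum_{i=1}^{n} \sum_{j=1}^{M} w_{ij(t)}^{*}\, s_{ij}^{*}(\theta) = 0. \qquad (10)$$

[Step 4] Go to Step 2. Stop if $\hat\theta_{(t)}$ meets the convergence criterion.

The proposed method is computationally attractive because we use a weighted score equation to compute the parameter estimates. Unlike the MCEM method, the imputed values are not changed at each iteration; only the fractional weights are changed.

Remark 1. In Step 2, the fractional weights can be computed by using the joint density with the current parameter estimate $\hat\theta_{(t)}$. Note that

$$\frac{f(y_{i,mis}^{*(j)} \mid y_{i,obs}, \hat\theta_{(t)})/q(y_{i,mis}^{*(j)})}{\sum_{k=1}^{M} f(y_{i,mis}^{*(k)} \mid y_{i,obs}, \hat\theta_{(t)})/q(y_{i,mis}^{*(k)})} = \frac{f(y_{i,obs}, y_{i,mis}^{*(j)}; \hat\theta_{(t)})/q(y_{i,mis}^{*(j)})}{\sum_{k=1}^{M} f(y_{i,obs}, y_{i,mis}^{*(k)}; \hat\theta_{(t)})/q(y_{i,mis}^{*(k)})}.$$

Thus, the fractional weights (9) can be computed as

$$w_{ij(t)}^{*} = C_{it}\, \frac{f(y_{i,obs}, y_{i,mis}^{*(j)}; \hat\theta_{(t)})}{q(y_{i,mis}^{*(j)})},$$

which does not require the density of the conditional distribution. Only the joint density is needed.

Remark 2. The choice of the initial density $q(y_{i,mis})$ is somewhat arbitrary. If we choose $q(y_{i,mis}) = f(y_{i,mis} \mid y_{i,obs}, \hat\theta_{(0)})$, where $\hat\theta_{(0)}$ is an initial parameter estimate of $\theta$, the fractional weight with current parameter estimate $\hat\theta_{(t)}$ is of the form

$$w_{ij(t)}^{*} = C_{it}\, \frac{f(y_{i,obs}, y_{i,mis}^{*(j)}; \hat\theta_{(t)})}{f(y_{i,obs}, y_{i,mis}^{*(j)}; \hat\theta_{(0)})}, \qquad (11)$$

where $C_{it}$ is a normalizing constant. The initial estimate $\hat\theta_{(0)}$ is not necessarily $\sqrt{n}$-consistent.

Given the $M$ imputed values, $y_{i,mis}^{*(1)}, \ldots, y_{i,mis}^{*(M)}$, generated from $q(y_{i,mis})$, the sequence of estimators $\hat\theta_{(0)}, \hat\theta_{(1)}, \ldots$ can be constructed from the parametric fractional imputation using importance sampling. Theorem 1 presents some convergence properties of this sequence of estimators.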
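Steps 1 to 4 can be sketched in code for a toy model. This is a minimal sketch under stated assumptions, not the authors' implementation: the model is a normal linear regression $y \mid x \sim N(\beta_0 + \beta_1 x, \sigma^2)$ with $x$ always observed and $y$ missing for some units, the proposal is $q = f(y \mid x; \hat\theta_{(0)})$ with $\hat\theta_{(0)}$ taken from the complete cases, and all function names and data values are hypothetical. For this model the MLE under MAR coincides with the complete-case fit, which gives a convenient reference value.

```python
import numpy as np


def log_normal_pdf(y, mean, var):
    """Log density of N(mean, var) evaluated at y (broadcasts)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)


def weighted_fit(x, y, w):
    """Weighted normal-regression MLE; returns array (b0, b1, sigma2)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    resid = y - X @ beta
    return np.array([beta[0], beta[1], np.sum(w * resid ** 2) / np.sum(w)])


def pfi(x_obs, y_obs, x_mis, M=100, max_iter=200, tol=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initial estimate from complete cases, then M draws per missing
    # unit from q(y) = f(y | x_i; theta_0). Values are drawn once only.
    theta = weighted_fit(x_obs, y_obs, np.ones_like(y_obs))
    q_mean = (theta[0] + theta[1] * x_mis)[:, None]
    y_star = q_mean + np.sqrt(theta[2]) * rng.standard_normal((x_mis.size, M))
    log_q = log_normal_pdf(y_star, q_mean, theta[2])
    x_all = np.concatenate([x_obs, np.repeat(x_mis, M)])
    y_all = np.concatenate([y_obs, y_star.ravel()])
    loglik = []
    for _ in range(max_iter):
        # Step 2: fractional weights proportional to f / q, as in (9).
        mean_mis = (theta[0] + theta[1] * x_mis)[:, None]
        log_ratio = log_normal_pdf(y_star, mean_mis, theta[2]) - log_q
        log_norm = np.logaddexp.reduce(log_ratio, axis=1, keepdims=True)
        w = np.exp(log_ratio - log_norm)          # each row sums to one
        # Imputed observed log-likelihood at the current estimate.
        ll_obs = log_normal_pdf(y_obs, theta[0] + theta[1] * x_obs, theta[2]).sum()
        ll_mis = (log_norm[:, 0] - np.logaddexp.reduce(-log_q, axis=1)).sum()
        loglik.append(ll_obs + ll_mis)
        # Step 3: solve the weighted score equation (10) in closed form.
        w_all = np.concatenate([np.ones_like(y_obs), w.ravel()])
        new_theta = weighted_fit(x_all, y_all, w_all)
        # Step 4: stop at convergence, otherwise return to Step 2.
        done = np.max(np.abs(new_theta - theta)) < tol
        theta = new_theta
        if done:
            break
    return theta, np.array(loglik)


# Usage on simulated data (all values are illustrative assumptions).
rng = np.random.default_rng(1)
n = 300
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)
observed = rng.random(n) < 0.7                    # MCAR, a special case of MAR
theta_pfi, loglik = pfi(x[observed], y[observed], x[~observed])
theta_cc = weighted_fit(x[observed], y[observed], np.ones(observed.sum()))
```

The weights are computed in log space to avoid underflow when the current estimate drifts away from the proposal. Because the imputed values are fixed, `loglik` should be nondecreasing across iterations, and `theta_pfi` should stay close to `theta_cc` up to Monte Carlo error.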

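Theorem 1. Assume that the $M$ imputed values are generated from $q(y_{i,mis})$. Let $w_{ij(t)}^{*} = w_{ij}^{*}(\hat\theta_{(t)})$ and

$$Q^{*}(\theta \mid \hat\theta_{(t)}) = \sum_{i=1}^{n} \sum_{j=1}^{M} w_{ij(t)}^{*} \ln f(y_{i,obs}, y_{i,mis}^{*(j)}; \theta). \qquad (12)$$

If

$$Q^{*}(\hat\theta_{(t+1)} \mid \hat\theta_{(t)}) \ge Q^{*}(\hat\theta_{(t)} \mid \hat\theta_{(t)}), \qquad (13)$$

then

$$L_{obs}^{*}(\hat\theta_{(t+1)}) \ge L_{obs}^{*}(\hat\theta_{(t)}), \qquad (14)$$

where $L_{obs}^{*}(\theta) = \prod_{i=1}^{n} f_{obs(i)}^{*}(y_{i,obs}; \theta)$ with

$$f_{obs(i)}^{*}(y_{i,obs}; \theta) = \frac{\sum_{j=1}^{M} f(y_{i,obs}, y_{i,mis}^{*(j)}; \theta)/q(y_{i,mis}^{*(j)})}{\sum_{j=1}^{M} 1/q(y_{i,mis}^{*(j)})}.$$

Proof. By Jensen's inequality,

$$\ln L_{obs}^{*}(\hat\theta_{(t+1)}) - \ln L_{obs}^{*}(\hat\theta_{(t)}) = \sum_{i=1}^{n} \ln \left\{ \sum_{j=1}^{M} w_{ij(t)}^{*}\, \frac{f(y_{i,obs}, y_{i,mis}^{*(j)}; \hat\theta_{(t+1)})}{f(y_{i,obs}, y_{i,mis}^{*(j)}; \hat\theta_{(t)})} \right\}$$
$$\ge \sum_{i=1}^{n} \sum_{j=1}^{M} w_{ij(t)}^{*} \ln \left\{ \frac{f(y_{i,obs}, y_{i,mis}^{*(j)}; \hat\theta_{(t+1)})}{f(y_{i,obs}, y_{i,mis}^{*(j)}; \hat\theta_{(t)})} \right\} = Q^{*}(\hat\theta_{(t+1)} \mid \hat\theta_{(t)}) - Q^{*}(\hat\theta_{(t)} \mid \hat\theta_{(t)}).$$

Therefore, (13) implies (14).

Note that $L_{obs}^{*}(\theta)$ is an imputed version of the observed likelihood based on the $M$ imputed values, $y_{i,mis}^{*(1)}, \ldots, y_{i,mis}^{*(M)}$, generated from $q(y_{i,mis})$. Under fairly general conditions, the solution to the imputed score equation (10) satisfies (13). Thus, by Theorem 1, the sequence $L_{obs}^{*}(\hat\theta_{(t)})$ is monotonically increasing. Also, under the fairly general conditions stated in Wu (1983), the convergence of $\hat\theta_{(t)}$ follows for fixed $M$. Theorem 1 does not hold for the sequence obtained from the MCEM method for fixed $M$.

3. Variance estimation

To discuss variance estimation, note that

$$-\frac{\partial}{\partial\theta'} \bar{S}(\theta) = I_{obs}(\theta), \qquad (15)$$

where

$$I_{obs}(\theta) = -E\left\{ \frac{\partial}{\partial\theta'} S_n(\theta) \,\Big|\, Y_{obs}, \theta \right\} + \bar{S}(\theta)^{\otimes 2} - E\left\{ S_n(\theta)^{\otimes 2} \mid Y_{obs}, \theta \right\}, \qquad (16)$$

with $S_n(\theta) = \sum_{i=1}^{n} s_i(\theta)$ and $S^{\otimes 2} = SS'$. Louis (1982) first proved (15) to estimate the variance of the MLE obtained by the EM algorithm.

Let $\hat\theta^{*}$ be the solution to the approximate mean score equation

$$\bar{S}^{*}(\theta) \equiv \sum_{i=1}^{n} \sum_{j=1}^{M} w_{ij}^{*}(\theta)\, s_{ij}^{*}(\theta) = 0, \qquad (17)$$

where $s_{ij}^{*}(\theta) = s(\theta; y_{i,obs}, y_{i,mis}^{*(j)})$ and

$$w_{ij}^{*}(\theta) = \frac{f(y_{i,obs}, y_{i,mis}^{*(j)}; \theta)/q(y_{i,mis}^{*(j)})}{\sum_{k=1}^{M} f(y_{i,obs}, y_{i,mis}^{*(k)}; \theta)/q(y_{i,mis}^{*(k)})}. \qquad (18)$$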

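Note that

$$E^{*}\{\bar{S}^{*}(\theta) \mid Y_{obs}\} = \bar{S}(\theta), \qquad (19)$$

where $\bar{S}(\theta)$ is defined in (5) and the expectation in (19) is over the imputation mechanism. Here, the superscript $*$ is used in $\hat\theta^{*}$ to emphasize that the solution is obtained from the approximate mean score equation (17), not from the exact mean score equation (5). An EM-type algorithm such as (10) can be used to find a solution $\hat\theta^{*}$ to (17).

Using Taylor linearization,

$$\hat\theta^{*} - \theta_0 \doteq \left[ -E\left\{ \frac{\partial}{\partial\theta'} \bar{S}^{*}(\theta_0) \right\} \right]^{-1} \bar{S}^{*}(\theta_0).$$

Thus, we can use the sandwich formula to compute the variance of $\hat\theta^{*}$, the solution to (17). Note that, by (19),

$$Var\{\bar{S}^{*}(\theta_0)\} = Var\{\bar{S}(\theta_0)\} + Var\{\bar{S}^{*}(\theta_0) - \bar{S}(\theta_0)\}. \qquad (20)$$

The first term on the right side of (20) can be estimated using the observed information $I_{obs}(\hat\theta^{*})$, as suggested by Louis (1982). The observed information (16) can be easily computed from fractional imputation. That is, we use $\hat{I}_{obs}(\hat\theta^{*})$ as an estimator of $I_{obs}(\theta_0)$, where

$$\hat{I}_{obs}(\theta) = -\sum_{i=1}^{n} \sum_{j=1}^{M} w_{ij}^{*}\, \frac{\partial}{\partial\theta'} s(\theta; y_{i,obs}, y_{i,mis}^{*(j)}) + \sum_{i=1}^{n} \{\bar{s}_i^{*}(\theta)\}^{\otimes 2} - \sum_{i=1}^{n} \sum_{j=1}^{M} w_{ij}^{*}\, \{s_{ij}^{*}(\theta)\}^{\otimes 2}, \qquad (21)$$

$\bar{s}_i^{*}(\theta) = \sum_{j=1}^{M} w_{ij}^{*}\, s_{ij}^{*}(\theta)$ and $w_{ij}^{*} = w_{ij}^{*}(\hat\theta^{*})$. Thus, the estimator in (21) is based on the Monte Carlo approximation of the conditional expectation (16) using fractional imputation, where the fractional weight corresponds to the importance weight of importance sampling. Because the Monte Carlo expectation not only approximates the mean score equation (5) but also approximates the observed information (16), the fractional imputation (FI) method provides consistent variance estimation for sufficiently large $M$.

To estimate the second term of (20), we consider the case when $y_{i,mis}^{*(1)}, \ldots, y_{i,mis}^{*(M)}$ are independent samples from $q(y_{i,mis})$. In this case, we can express

$$\bar{S}^{*}(\theta) = \frac{1}{M} \sum_{j=1}^{M} \bar{S}_j^{*}(\theta),$$

where $\bar{S}_j^{*}(\theta) = M \sum_{i=1}^{n} w_{ij}^{*}\, s_{ij}^{*}(\theta)$, and we have

$$\frac{1}{M} B^{*}(\theta) = \frac{1}{M} \frac{1}{M-1} \sum_{j=1}^{M} \{\bar{S}_j^{*}(\theta) - \bar{S}^{*}(\theta)\}^{\otimes 2}$$

to be unbiased for the second term of (20). Therefore, the proposed variance estimator is

$$\hat{V}(\hat\theta^{*}) = [I_{obs}(\hat\theta^{*})]^{-1} + [I_{obs}(\hat\theta^{*})]^{-1}\, \frac{1}{M} B^{*}(\hat\theta^{*})\, [I_{obs}(\hat\theta^{*})]^{-1}. \qquad (22)$$

Often, the second term in (20) is very small for large $M$ or for an efficient imputation method. In this case, the second term of (22) can be safely omitted in the variance estimation.

4. Extensions

So far, we have considered the case where the parameter of interest is estimated by the maximum likelihood method.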
We consider an extension where the parameter of interest is not necessarily estimated by the maximum likelihood method, but is estimated by solving an estimating equation. Suppose that, under complete response, a parameter of interest, denoted by $\eta$, is estimated as the unique solution to the estimating equation

$$U(\eta) \equiv \sum_{i=1}^{n} u(\eta; y_i) = 0, \qquad (23)$$

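for some function $u(\eta; y_i)$ of $\eta$ with continuous partial derivatives. Let $\hat\eta$ be the solution to (23). Under some regularity conditions,

$$\sqrt{n}\,(\hat\eta - \eta_0) \to N\left[0,\; g(\eta_0)^{-1}\, V\{u(\eta_0; y)\}\, \{g(\eta_0)^{-1}\}'\right],$$

where $g(\eta) = E\{\partial u(\eta; y)/\partial\eta'\}$ and $\eta_0$ is a unique solution to $E\{U(\eta)\} = 0$.

Under nonresponse, a consistent estimator of $\eta_0$ can be obtained as a solution to the following estimating equation:

$$\bar{U}(\eta \mid \hat\theta) \equiv \sum_{i=1}^{n} E\{u(\eta; y_i) \mid y_{i,obs}, \hat\theta\} = 0, \qquad (24)$$

where $\hat\theta$ is the solution to (5). The estimating equation (24) is called the expected estimating equation. The use of an expected estimating equation has been discussed by, among others, Wang and Pepe (2000) and Robins and Wang (2000).

Using the fractional imputation approach discussed in Section 2, we can construct a Monte Carlo approximation to the estimating equation:

$$\hat\eta^{*}: \text{ solution to } \sum_{i=1}^{n} \sum_{j=1}^{M} w_{ij}^{*}(\hat\theta^{*})\, u_{ij}^{*}(\eta) = 0, \qquad (25)$$

where $u_{ij}^{*}(\eta) = u(\eta; y_{i,obs}, y_{i,mis}^{*(j)})$, $w_{ij}^{*}(\theta)$ is defined in (18), and $\hat\theta^{*}$ is the solution to (17). Note that we do not have to update the solution $\hat\theta^{*}$ iteratively in (25); only the final estimate $\hat\theta^{*}$ is needed. The following theorem presents some asymptotic properties of the estimator that is the solution to (24), or the solution to (25).

Theorem 2. Let $\hat\theta^{*}$ be the Monte Carlo approximation of the MLE of $\theta$ that is computed by solving the approximate mean score equation (17). Under some regularity conditions, the solution $\hat\eta^{*}$ to (25) satisfies

$$\sqrt{n}\,(\hat\eta^{*} - \tilde\eta^{*}) = o_p(1) \qquad (26)$$

and

$$E(\tilde\eta^{*}) = \eta_0, \qquad Var(\tilde\eta^{*}) = g(\eta_0)^{-1}\, Var\{\tilde{U}^{*}(\eta_0, \theta_0)\}\, \{g(\eta_0)^{-1}\}'. \qquad (27)$$

Here, $g(\eta) = E\{\sum_{i=1}^{n} \partial u(\eta; y_i)/\partial\eta'\}$ and

$$\tilde{U}^{*}(\eta, \theta) = \bar{U}^{*}(\eta, \theta) + K' \bar{S}^{*}(\theta), \qquad (28)$$

where

$$\bar{U}^{*}(\eta, \theta) = \sum_{i=1}^{n} \sum_{j=1}^{M} w_{ij}^{*}(\theta)\, u_{ij}^{*}(\eta), \qquad \bar{S}^{*}(\theta) = \sum_{i=1}^{n} \sum_{j=1}^{M} w_{ij}^{*}(\theta)\, s_{ij}^{*}(\theta),$$

and

$$K = [I_{obs}(\theta_0)]^{-1}\, E[S_{mis}(\theta_0)\, U(\eta_0)']. \qquad (29)$$

Here, $I_{obs}(\theta) = -E\{\partial S_{obs}(\theta)/\partial\theta'\}$ and $S_{mis}(\theta) = S_n(\theta) - S_{obs}(\theta)$.

The result in Theorem 2 can be used to derive a variance estimator for $\hat\eta^{*}$, the solution to (25). The crucial part is to estimate the variance of the linearized term (28). Note that we can write

$$Var\{\tilde{U}^{*}(\eta_0, \theta_0)\} = Var\{\tilde{U}(\eta_0, \theta_0)\} + Var\{\tilde{U}^{*}(\eta_0, \theta_0) - \tilde{U}(\eta_0, \theta_0)\}, \qquad (30)$$

where $\tilde{U}(\eta, \theta) = p\lim_{M \to \infty} \tilde{U}^{*}(\eta, \theta)$.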

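If we write

$$\tilde{U}(\eta, \theta) = \bar{U}(\eta, \theta) + K' \bar{S}(\theta) = \sum_{i=1}^{n} \{\bar{u}_i(\eta, \theta) + K' \bar{s}_i(\theta)\} \equiv \sum_{i=1}^{n} \tilde{u}_i,$$

a plug-in estimator of $Var\{\tilde{U}(\eta_0, \theta_0)\}$ is

$$\frac{n}{n-1} \sum_{i=1}^{n} (\hat{u}_i - \bar{\hat{u}})(\hat{u}_i - \bar{\hat{u}})',$$

where $\hat{u}_i = \bar{u}_i^{*}(\hat\eta^{*}, \hat\theta^{*}) + \hat{K}' \bar{s}_i^{*}(\hat\theta^{*})$ and $\bar{\hat{u}} = n^{-1} \sum_{i=1}^{n} \hat{u}_i$. The terms $\bar{u}_i^{*}(\hat\eta^{*}, \hat\theta^{*})$ and $\bar{s}_i^{*}(\hat\theta^{*})$ are easily computed from the fractional imputation with fractional weights.

To estimate the second term of (30), write

$$\tilde{U}^{*}(\eta, \theta) = \frac{1}{M} \sum_{j=1}^{M} \tilde{U}_j^{*}(\eta, \theta),$$

where $\tilde{U}_j^{*}(\eta, \theta) = M \sum_{i=1}^{n} w_{ij}^{*}(\theta)\, \{u_{ij}^{*}(\eta) + K' s_{ij}^{*}(\theta)\}$. The second term in (30) can be consistently estimated by

$$\frac{1}{M} \frac{1}{M-1} \sum_{j=1}^{M} \{\tilde{U}_j^{*}(\hat\eta^{*}, \hat\theta^{*}) - \tilde{U}^{*}(\hat\eta^{*}, \hat\theta^{*})\}^{\otimes 2}.$$

To estimate $K$ in (29), we need to estimate the two terms in (29) separately. The first term, $I_{obs}(\theta)$, can be computed using (21), the estimated observed information based on the Louis formula. To estimate the second term in $K$, we use

$$E\{U(\eta, \theta)\, S_{mis}(\theta)' \mid Y_{obs}, \hat\theta\} = E\{U(\eta, \theta)\, S_n(\theta)' \mid Y_{obs}, \hat\theta\} - \bar{U}(\eta, \theta)\, \bar{S}(\theta)'.$$

The first expectation can be estimated by fractional imputation. That is, we can estimate $E\{U(\eta, \theta)\, S_n(\theta)' \mid Y_{obs}, \hat\theta\}$ by

$$\sum_{i=1}^{n} \sum_{j=1}^{M} w_{ij}^{*}\, u_{ij}^{*}(\hat\eta, \hat\theta)\, s_{ij}^{*}(\hat\theta)',$$

with $u_{ij}^{*}(\eta, \theta) = u(\eta, \theta; y_{i,obs}, y_{i,mis}^{*(j)})$ and $s_{ij}^{*}(\theta) = s(\theta; y_{i,obs}, y_{i,mis}^{*(j)})$.

5. Calibration

The proposed estimation method can be viewed as a method of implementing an MCEM algorithm using importance sampling. The MCEM method is subject to sampling error when approximating the conditional expectation by a summation. In general, the size $M$ of the Monte Carlo sample needs to be very large for a satisfactory approximation. For moderate size $M$, there are two situations in which the approximation is accurate.

The first situation is when there are only a finite number of possible values for $y_{i,mis}$. In this case, we take the possible values as the imputed values and compute the conditional probability of $y_{i,mis}$ by the following Bayes formula:

$$p(y_{i,mis}^{*(j)} \mid y_{i,obs}, \hat\theta) = \frac{f(y_{i,obs}, y_{i,mis}^{*(j)}; \hat\theta)}{\sum_{k=1}^{M_i} f(y_{i,obs}, y_{i,mis}^{*(k)}; \hat\theta)},$$

where $M_i$ is the number of possible values of $y_{i,mis}$ and $\hat\theta$ is the MLE of $\theta$. The conditional expectation in (6) can be written

$$E\{s_i(\theta) \mid y_{i,obs}, \hat\theta_{(t)}\} = \sum_{j=1}^{M_i} s_{ij}^{*}(\theta)\, p(y_{i,mis}^{*(j)} \mid y_{i,obs}, \hat\theta_{(t)}). \qquad (31)$$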

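Here, the estimated probability $p(y_{i,mis}^{*(j)} \mid y_{i,obs}, \hat\theta_{(t)})$ takes the role of the fractional weight. Ibrahim (1990) proposed using (31) in the E-step of the EM algorithm for discrete data.

The second situation, in which the approximation can be made exact, is when the distribution belongs to the exponential family of the form

$$f(y; \theta) = \exp\{t(y)'\theta + \phi(\theta) + A(y)\}. \qquad (32)$$

Under the model (32), the score equation (1) under complete response is equal to

$$\sum_{i=1}^{n} \left\{ t(y_i) + \frac{\partial\phi(\theta)}{\partial\theta} \right\} = 0$$

and the mean score equation (4) can be written

$$\sum_{i=1}^{n} \left\{ E[t(y_i) \mid y_{i,obs}, \theta] + \frac{\partial\phi(\theta)}{\partial\theta} \right\} = 0.$$

Thus, the integration problem in (6) reduces to the problem of computing $E\{t(y_i) \mid y_{i,obs}, \theta\}$, which is often a known function of $y_{i,obs}$ and $\theta$. In this case, the implementation of the EM algorithm simplifies. Define

$$g(y_{i,obs}, \theta) = E\{t(y_i) \mid y_{i,obs}, \theta\}. \qquad (33)$$

Recall that, in the fractional imputation approach, we can express the conditional expectation by a weighted summation

$$E\{t(y_i) \mid y_{i,obs}, \hat\theta_{(t)}\} \doteq \sum_{j=1}^{M} w_{ij(t)}^{*}\, t(y_{i,obs}, y_{i,mis}^{*(j)}), \qquad (34)$$

where $y_{i,mis}^{*(j)}$ is the $j$-th imputed value of $y_{i,mis}$ and $w_{ij(t)}^{*}$ is the fractional weight, which is the conditional probability of $y_{i,mis} = y_{i,mis}^{*(j)}$ given $y_{i,obs}$, using the current parameter value $\hat\theta_{(t)}$. Thus, it is proposed that

$$\sum_{j=1}^{M} w_{ij(t)}^{*}\, t(y_{i,obs}, y_{i,mis}^{*(j)}) = g(y_{i,obs}, \hat\theta_{(t)}) \qquad (35)$$

be used as a constraint for finding the fractional weights. We can use the regression weighting technique or the empirical likelihood technique to find a solution to (35). Here, $M$ need not be large.

Example 1. Suppose that $y_i = (y_{1i}, y_{2i})$ has a bivariate normal distribution:

$$\begin{pmatrix} y_{1i} \\ y_{2i} \end{pmatrix} \overset{i.i.d.}{\sim} N\left[ \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \right].$$

Under the bivariate normal distribution, a set of sufficient statistics for the parameter $\theta = (\mu_1, \mu_2, \sigma_{11}, \sigma_{12}, \sigma_{22})$ is $\left(\sum_i y_{1i}, \sum_i y_{2i}, \sum_i y_{1i}^2, \sum_i y_{1i} y_{2i}, \sum_i y_{2i}^2\right)$. Therefore, constraint (35) can be satisfied if

$$\sum_{j=1}^{M} w_{ij(t)}^{*} \left(1,\; y_{1i}^{*(j)},\; y_{1i}^{*(j)2}\right) = \left(1,\; E(y_{1i} \mid y_{2i}, \hat\theta_{(t)}),\; E(y_{1i} \mid y_{2i}, \hat\theta_{(t)})^2 + \hat\sigma_{11 \cdot 2(t)}\right), \quad \text{for } i \in A_{MR},$$

and

$$\sum_{j=1}^{M} w_{ij(t)}^{*} \left(1,\; y_{2i}^{*(j)},\; y_{2i}^{*(j)2}\right) = \left(1,\; E(y_{2i} \mid y_{1i}, \hat\theta_{(t)}),\; E(y_{2i} \mid y_{1i}, \hat\theta_{(t)})^2 + \hat\sigma_{22 \cdot 1(t)}\right), \quad \text{for } i \in A_{RM},$$

where $A_{MR}$ is the set of units with $y_{1i}$ missing and $y_{2i}$ observed, $A_{RM}$ is the set of units with $y_{1i}$ observed and $y_{2i}$ missing,

$$E(y_{1i} \mid y_{2i}, \hat\theta) = \hat\mu_1 + \frac{\hat\sigma_{12}}{\hat\sigma_{22}} (y_{2i} - \hat\mu_2), \qquad E(y_{2i} \mid y_{1i}, \hat\theta) = \hat\mu_2 + \frac{\hat\sigma_{12}}{\hat\sigma_{11}} (y_{1i} - \hat\mu_1),$$

$\sigma_{11 \cdot 2} = \sigma_{11} - \sigma_{12}^2/\sigma_{22}$, and $\sigma_{22 \cdot 1} = \sigma_{22} - \sigma_{12}^2/\sigma_{11}$.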
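The constraints in Example 1 can be imposed with a regression-type weight adjustment. The sketch below is one possible reading of "the regression weighting technique", not necessarily the authors' exact procedure: it starts from initial weights `w0` and applies a linear correction so that the weighted first and second moments of the imputed values match their conditional targets. The function name `calibrate_weights` and all numeric values are illustrative assumptions.

```python
import numpy as np


def calibrate_weights(y_star, w0, target_mean, target_var):
    """Regression-calibrated fractional weights for one unit.

    Returns weights w with sum(w) = 1, sum(w * y_star) = target_mean and
    sum(w * y_star**2) = target_mean**2 + target_var. The initial weights
    w0 must sum to one. Calibrated weights can be negative when the
    imputed values are far from the targets.
    """
    z = np.column_stack([y_star, y_star ** 2])      # calibration variables
    target = np.array([target_mean, target_mean ** 2 + target_var])
    z_bar = w0 @ z                                   # current weighted moments
    d = z - z_bar                                    # centered variables
    A = d.T @ (w0[:, None] * d)                      # weighted cross-products
    lam = np.linalg.solve(A, target - z_bar)
    return w0 * (1.0 + d @ lam)


# Assumed current estimate: mu = (0, 2), sigma11 = 1, sigma12 = 1, sigma22 = 2.
mu1, mu2, s11, s12, s22 = 0.0, 2.0, 1.0, 1.0, 2.0
y2_observed = 3.0                                    # unit with y1 missing
cond_mean = mu1 + s12 / s22 * (y2_observed - mu2)    # E(y1 | y2) = 0.5
cond_var = s11 - s12 ** 2 / s22                      # sigma_{11.2} = 0.5

rng = np.random.default_rng(0)
M = 10
y1_star = cond_mean + np.sqrt(cond_var) * rng.standard_normal(M)
w_cal = calibrate_weights(y1_star, np.full(M, 1.0 / M), cond_mean, cond_var)
```

Because the correction is centered at the initial weighted moments, the weights still sum to one after adjustment, which is the first component of the constraint in Example 1.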

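In practice, instead of (35), the fractional weights are computed from

$$\sum_{i \in A_c} \sum_{j=1}^{M} w_{ij(t)}^{*}\, t(y_{i,obs}, y_{i,mis}^{*(j)}) = \sum_{i \in A_c} g(y_{i,obs}, \hat\theta_{(t)}), \qquad (36)$$

where $A_c$ is the set of sample indices in a cell $c$. Imposing fractional weighting constraints in each cell rather than for each unit reduces the chance of extreme weights.

Variance estimation with fractionally imputed data can be performed using linearization or replication. The plug-in method discussed in Section 3 is essentially the linearization method. Under complete response, let $w_i^{[k]}$ be the $k$-th replication weight for unit $i$. Assume that the replication variance estimator

$$\hat{V}_n = \sum_{k=1}^{L} c_k \left(\hat\theta_n^{[k]} - \hat\theta_n\right)^2, \qquad (37)$$

where $\hat\theta_n = \sum_{i=1}^{n} w_i y_i$ and $\hat\theta_n^{[k]} = \sum_{i=1}^{n} w_i^{[k]} y_i$, is consistent for the variance of $\hat\theta_n$.

For replication with the calibration fractional imputation method, we consider the following steps for creating replicated fractional weights. Here, we assume that the calibration fractional weights are computed from (36).

[Step 1] Compute $\hat\theta^{[k]}$, the $k$-th replicate of $\hat\theta$, using fractional weights.

[Step 2] Using the $\hat\theta^{[k]}$ computed from Step 1, compute the replicated fractional weights by

$$\sum_{i \in A_c} \sum_{j=1}^{M} w_{ij}^{*[k]}\, t(y_{i,obs}, y_{i,mis}^{*(j)}) = \sum_{i \in A_c} g(y_{i,obs}, \hat\theta^{[k]}), \qquad (38)$$

using the regression weighting technique.

Equation (38) is the calibration equation for the replicated fractional weights. In general, Step 1 can be computationally problematic since $\hat\theta^{[k]}$ is computed from the iterative algorithm (10) for each replication. Thus, we consider an approximation for $\hat\theta^{[k]}$ using Taylor linearization. Let

$$\bar{S}^{[k]}(\theta) = \sum_{i=1}^{n} w_i^{[k]}\, \bar{s}_i(\theta),$$

where $\bar{s}_i(\theta) = E\{s_i(\theta) \mid y_{i,obs}, \theta\}$. Using (15) and (21), the approximation formula can be implemented as

$$\hat\theta^{[k]} = \hat\theta + \left[\hat{I}_{obs}^{[k]}(\hat\theta)\right]^{-1} \bar{S}^{[k]}(\hat\theta), \qquad (39)$$

where

$$\hat{I}_{obs}^{[k]}(\theta) = -\sum_{i=1}^{n} w_i^{[k]} \sum_{j=1}^{M} w_{ij}^{*}\, \frac{\partial}{\partial\theta'} s(\theta; y_{i,obs}, y_{i,mis}^{*(j)}) + \sum_{i=1}^{n} w_i^{[k]} \{\bar{s}_i^{*}(\theta)\}^{\otimes 2} - \sum_{i=1}^{n} w_i^{[k]} \sum_{j=1}^{M} w_{ij}^{*}\, \{s_{ij}^{*}(\theta)\}^{\otimes 2} \qquad (40)$$

and

$$\bar{S}^{[k]}(\theta) = \sum_{i=1}^{n} w_i^{[k]} \sum_{j=1}^{M} w_{ij}^{*}\, s_{ij}^{*}(\theta).$$

6. Simulation Study

In a limited simulation study, we generated $B = 5{,}000$ Monte Carlo samples of size $n = 200$ from a bivariate normal distribution with $\mu_1 = 0$, $\mu_2 = 2$, $\sigma_{11} = 1$, $\sigma_{12} = 1$, and $\sigma_{22} = 2$.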
The probability of both responding is 0.42, the probability of only $y_1$ responding is 0.18, and the probability of only $y_2$ responding is […]. We considered the following seven parameters:

1. Five parameters in the bivariate normal distribution: $\mu_1$, $\mu_2$, $\sigma_{11}$, $\sigma_{12}$, $\sigma_{22}$.
2. Proportion of $y_1$ less than 0.8.
3. Domain mean, where the probability of being in the domain is 0.4. The probability of being in the domain does not depend on $y_1$ or $y_2$.

For each parameter, we have computed four estimators:

1. The MLE using the EM algorithm.
2. The fractional imputation estimator proposed in Section 2 with $M = 100$ and $M =$ […].
3. The calibration fractional imputation estimator proposed in Section 5 with $M = 10$ using the regression weighting method.
4. Multiple imputation (MI) with $M = 10$ imputations.

In fractional imputation, imputed values are generated by a systematic sampling method described in Appendix B, with $M_0 = 1{,}000$. The basic idea is to generate $M_0$ initial imputed values and then use a version of systematic sampling to get the final $M$ imputed values. In the calibration fractional imputation method, the regression fractional weights are computed by (35). In multiple imputation, the imputed values are generated from the posterior predictive distribution iteratively using Gibbs sampling.

For variance estimation, we considered the FI estimator without calibration, the calibration FI estimator, and multiple imputation. For variance estimation of the fractional imputation estimator, we used the plug-in estimator discussed in Section 3 and Section 4. For variance estimation of the calibration FI estimator, we used the one-step jackknife variance estimator discussed in Section 5. For variance estimation of multiple imputation, we used the variance formula of Rubin (1987).

Table 1 presents the Monte Carlo means and variances of the four estimators. Table 2 presents the Monte Carlo relative biases and t-statistics for the variance estimators. The t-statistic is the statistic for testing zero bias in the variance estimator. For point estimation, the calibration FI estimator and the EM method give the same values for the parameters specified in the model.
The uncalibrated fractional imputation estimator shows fairly good efficiency for many parameters, which suggests that the systematic sampling method used in the fractional imputation is already quite efficient. Multiple imputation shows less efficiency than the FI estimators for all parameters. For estimation of the proportion and the domain mean, it is possible for the FI estimator with $M = 100$ to be more efficient than the calibration FI estimator with $M = 10$ because these parameters are not directly considered in the calibration step. The differences in efficiencies for these two parameters are less than one percent.

For variance estimation of the FI estimators, both linearization and replication methods provide consistent estimates for the variance of the parameter estimates. Variance estimation for domain estimation is biased under multiple imputation, as was identified by Kim and Fuller.

REFERENCES

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977), Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Ser. B, 39.
Fay, R. E. (1992), When are inferences from multiple imputation valid? In Proceedings of the Survey Research Methods Section, Washington, DC: American Statistical Association.
Fisher, R.A. (1925), Theory of statistical estimation, Proceedings of the Cambridge Philosophical Society, 22.
Gelman, A., Meng, X.-L., and Stern, H. (1996), Posterior predictive assessment of model fitness via realized discrepancies (with discussion), Statistica Sinica, 6.
Ibrahim, J. G. (1990), Incomplete data in generalized linear models, Journal of the American Statistical Association, 85.
Kim, J.K., Brick, M.J., Fuller, W.A., and Kalton, G. (2006), On the bias of the multiple imputation variance estimator in survey sampling, Journal of the Royal Statistical Society, Ser. B, 68.
Kim, J.K. and Fuller, W.A., Fractional hot deck imputation, Biometrika, 91.
Louis, T. A. (1982), Finding the observed information matrix when using the EM algorithm, Journal of the Royal Statistical Society, Ser. B, 44.
Robins, J.M. and Wang, N. (2000), Inference for imputation estimators, Biometrika, 87.
Rubin, D. B. (1976), Inference and missing data, Biometrika, 63.
Rubin, D.B. (1987), Multiple imputation for nonresponse in surveys, New York: Wiley.
Wang, C.-Y. and Pepe, M. S. (2000), Expected estimating equations to accommodate covariate measurement error, Journal of the Royal Statistical Society, Ser. B, 62.
Wang, N. and Robins, J.M. (1998), Large-sample theory for parametric multiple imputation procedures, Biometrika, 85.
Wei, G.C.G. and Tanner, M.A. (1990), A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithm, Journal of the American Statistical Association, 85.
Wu, C.F.J. (1983), On the convergence properties of the EM algorithm, The Annals of Statistics, 11.

Table 1: Monte Carlo means and variances of the imputed estimators, based on 5,000 samples.
Columns: Parameter, Method, Mean, Variance.
Rows: for each of $\mu_1$, $\mu_2$, $\sigma_{11}$, $\sigma_{12}$, $\sigma_{22}$, the methods EM, FI (two values of $M$), Calib. FI, and MI; for Proportion and Domain Mean, the methods FI (two values of $M$), Calib. FI, and MI.
[Table entries not recoverable from the source.]

Table 2: Monte Carlo relative biases and t-statistics of the variance estimators, based on 5,000 samples.
Columns: Parameter, Method, Rel. Bias (%), t-statistic.
Rows: for each of $Var(\hat\mu_1)$, $Var(\hat\mu_2)$, $Var(\hat\sigma_{11})$, $Var(\hat\sigma_{12})$, $Var(\hat\sigma_{22})$, $Var(\hat{p})$, $Var(\hat\mu_d)$, the methods Linearization for FI (two values of $M$), One-step JK for calibration FI, and MI.
[Table entries not recoverable from the source.]
