Empirical likelihood for parametric model under imputation for missing data

Lichun Wang
Center for Statistics, Limburgs Universitair Centrum, Universitaire Campus, B-3590 Diepenbeek, Belgium
E-mail: lichun.wang@luc.ac.be

Qihua Wang
Institute of Applied Mathematics, AMSS, Chinese Academy of Sciences, Beijing 100080, China

Journal of Statistics & Management Systems, Vol. 9 (2006), No. 1, pp. 1–13. © Taru Publications

Abstract

In the present paper, we study the empirical likelihood method for a parametric model which parameterizes the conditional density of a response given a covariate. It is shown that the adjusted empirical log-likelihood ratio is asymptotically standard $\chi^2$ when missing responses are imputed using the maximum likelihood estimate.

Keywords: Confidence region, empirical likelihood, maximum likelihood estimate.

1. Introduction

It is well known that the empirical likelihood introduced by Owen [1, 2] is a nonparametric technique which is very useful for constructing confidence regions or intervals. It has many advantages over some modern and classical methods such as the bootstrap method and the normal-approximation-based method. In particular, it does not impose prior constraints on region shape and does not require a pivotal quantity; moreover, it is range-preserving and transformation-respecting. There is an excellent exposition of the empirical likelihood in Hall and La Scala [6]; some related works can be found in [12], [4], [10, 11] and [7, 8, 9], among others.

When one makes inference on the mean of a response variable, some responses may be missing for various reasons, such as those stated in the first paragraph of [8]. In fact, incomplete data are likely to be obtained in many situations, such as opinion polls, medical studies and so on. Let $X$ be a $d$-dimensional vector of factors and $Y$ be a response variable influenced by $X$. In practice, one often obtains a random sample of incomplete data

$$(X_i, Y_i, \delta_i), \quad i = 1, \ldots, n, \tag{1.1}$$

where all $X_i$ are observed and $\delta_i = 0$ if $Y_i$ is missing, otherwise $\delta_i = 1$. In such a situation, one usual method is to drop the $X_i$ with missing responses from consideration; naturally, this can result in a serious loss of efficiency when a substantial proportion of the responses is missing. Another technique is to impute a value for each missing $Y_i$ so as to obtain a complete data set and then proceed with the statistical analysis. There are many imputation methods for missing responses, such as the kernel regression imputation used by [5] and [8], the linear regression imputation adopted by [7], the ratio imputation that appears in [3], and so forth.

In this paper, we consider a parametric model which parameterizes the conditional density of $Y$ given $X$, $f(y \mid x, \theta)$, where $\theta$ is a $p$-dimensional vector of parameters. Similar to [7, 8], throughout this paper we assume that $Y$ is missing at random (MAR), that is, $P(\delta = 1 \mid Y, X) = P(\delta = 1 \mid X)$. Using the incomplete data $(X_i, Y_i, \delta_i)$, $1 \le i \le n$, we impute the missing responses by the maximum likelihood estimate and develop an adjusted empirical likelihood to make inference on the mean of $Y$.

The rest of this paper is organized as follows. In Section 2, we define an adjusted empirical log-likelihood ratio for the mean of the response, say $\mu$, and prove its asymptotic distribution. Some simulation results are presented in Section 3 to compare the empirical likelihood method with the normal method. For convenience, the proofs of the main results are deferred to Section 4.

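2. An adjusted empirical log-likelihood

Let $m(x, \theta) = E(Y \mid X = x)$. Since $E(m(X_i, \theta)) = E(Y_i)$, we can impute $Y_i$ by $m(X_i, \theta)$ when $Y_i$ is missing and estimate $\mu$ by

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \left[ \delta_i Y_i + (1 - \delta_i) m(X_i, \theta) \right] = \frac{1}{n} \sum_{i=1}^{n} \hat{Y}_i. \tag{2.1}$$

Hence, under MAR, $E(\hat{Y}_i) = \mu$ if $\mu$ is the true parameter. Thus, the problem of testing whether $\mu$ is the true parameter is equivalent to testing whether $E(\hat{Y}_i) = \mu$ for $i = 1, \ldots, n$. This motivates us to define the empirical log-likelihood ratio ([2]) as

$$l(\mu) = -2 \max \sum_{i=1}^{n} \log(n p_i), \tag{2.2}$$

where the maximum is taken over all sets of nonnegative numbers $p_1, \ldots, p_n$ satisfying $\sum_{i=1}^{n} p_i = 1$ and $\sum_{i=1}^{n} p_i \hat{Y}_i = \mu$.

Clearly, $l(\mu)$ contains not only $\mu$ but also the unknown parameter $\theta$; hence, it cannot be applied directly to inference for $\mu$. A natural way to solve this problem is to estimate $\theta$ in $l(\mu)$ from $(X_i, Y_i, \delta_i)$, $1 \le i \le n$. In this paper, we assume that there exists a maximum likelihood estimate (MLE) of the parameter $\theta$. Then we can replace $\theta$ in $l(\mu)$ by the MLE $\hat{\theta}_n$, which satisfies the equation

$$\frac{\partial \log L(\theta)}{\partial \theta} = 0, \tag{2.3}$$

where $L(\theta) = \prod_{i=1}^{n} f^{\delta_i}(Y_i \mid X_i, \theta)$. The estimated empirical log-likelihood ratio is then defined by

$$\hat{l}_n(\mu) = -2 \max_{\sum_i p_i = 1,\ \sum_i p_i \bar{Y}_i = \mu} \sum_{i=1}^{n} \log(n p_i), \tag{2.4}$$

where $\bar{Y}_i = \delta_i Y_i + (1 - \delta_i) m(X_i, \hat{\theta}_n)$. Using the Lagrange multiplier method, when $\min_{1 \le i \le n} \bar{Y}_i < \mu < \max_{1 \le i \le n} \bar{Y}_i$, we can easily get

$$p_i = \frac{1}{n \left[ 1 + \lambda (\bar{Y}_i - \mu) \right]}, \quad i = 1, \ldots, n,$$

where $\lambda$ is the solution of

$$\frac{1}{n} \sum_{i=1}^{n} \frac{\bar{Y}_i - \mu}{1 + \lambda (\bar{Y}_i - \mu)} = 0. \tag{2.5}$$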
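The Lagrange-multiplier step can be illustrated numerically. The sketch below is not taken from the paper: it assumes the completed responses are already available as a plain list, solves equation (2.5) for $\lambda$ by bisection (the left-hand side is strictly decreasing in $\lambda$ on the interval where all weights are positive), and then evaluates $2 \sum_i \log[1 + \lambda(\bar{Y}_i - \mu)]$, which is what results from substituting the weights $p_i$ into (2.4). The function names `el_lambda` and `el_log_ratio` are hypothetical.

```python
import math


def el_lambda(y, mu, tol=1e-12, max_iter=200):
    """Solve sum_i z_i / (1 + lam * z_i) = 0 for lam, with z_i = y_i - mu.

    Bisection is used as an illustrative choice; the paper does not
    prescribe a numerical method.
    """
    z = [yi - mu for yi in y]
    if not (min(z) < 0.0 < max(z)):
        raise ValueError("mu must lie strictly between min(y) and max(y)")
    # All weights are positive only for lam in (-1/max(z), -1/min(z)).
    lo, hi = -1.0 / max(z), -1.0 / min(z)
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps

    def g(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z)

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:   # g is decreasing, so the root is to the right
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def el_log_ratio(y, mu):
    """Unadjusted empirical log-likelihood ratio 2 * sum log(1 + lam * (y_i - mu))."""
    lam = el_lambda(y, mu)
    return 2.0 * sum(math.log1p(lam * (yi - mu)) for yi in y)
```

At the sample mean the multiplier is zero and the ratio vanishes; away from the mean the ratio is positive. This statistic is the unadjusted one, so it would still need the scaling factor discussed in the text before being compared with a chi-square quantile.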

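Hence, we have

$$\hat{l}_n(\mu) = 2 \sum_{i=1}^{n} \log\left[ 1 + \lambda (\bar{Y}_i - \mu) \right] \tag{2.6}$$

with $\lambda$ being the solution of equation (2.5). Since the $\bar{Y}_i$'s in (2.6) are not i.i.d., $\hat{l}_n(\mu)$ is asymptotically a non-standard chi-square variable. In fact, it can be shown that $\hat{l}_n(\mu)$, multiplied by some population quantity, follows a chi-square distribution with one degree of freedom. In other words, $r(\mu)\, \hat{l}_n(\mu) \sim \chi^2_1$ asymptotically. Hence, in order to use this result to construct a confidence interval for the mean $\mu$, one has to estimate the coefficient $r(\mu)$.

Define an adjusted empirical log-likelihood ratio as

$$\hat{l}_{n,\mathrm{ad}}(\mu) = \hat{r}(\mu)\, \hat{l}_n(\mu), \tag{2.7}$$

where

$$\hat{r}(\mu) = \frac{\hat{S}_2(\mu)}{\hat{S}_1(\mu)},$$

with

$$\hat{S}_2(\mu) = \frac{1}{n} \sum_{i=1}^{n} (\bar{Y}_i - \mu)^2 \tag{2.8}$$

and

$$\hat{S}_1(\mu) = \frac{1}{n} \sum_{i=1}^{n} \left\{ \delta_i \left[ Y_i - m(X_i, \hat{\theta}_n) \right] \right\}^2 + \left[ \frac{1}{n} \sum_{i=1}^{n} (1 - \delta_i) m^{(1)}(X_i, \hat{\theta}_n) \right]^T \Gamma_n^{-1} \left[ \frac{1}{n} \sum_{i=1}^{n} (1 + \delta_i) m^{(1)}(X_i, \hat{\theta}_n) \right] + \frac{1}{n} \sum_{i=1}^{n} \left[ m(X_i, \hat{\theta}_n) - \mu \right]^2, \tag{2.9}$$

where

$$\Gamma_n = \frac{1}{n} \sum_{i=1}^{n} \delta_i \left\{ \frac{\partial \log f(Y_i \mid X_i, \theta)}{\partial \theta} \bigg|_{\theta = \hat{\theta}_n} \right\} \left\{ \frac{\partial \log f(Y_i \mid X_i, \theta)}{\partial \theta} \bigg|_{\theta = \hat{\theta}_n} \right\}^T \tag{2.10}$$

and $m^{(1)}(x, \theta)$ denotes the first-order partial derivative of $m(x, \theta)$ with respect to $\theta$.

In what follows, we establish a theorem for the adjusted empirical log-likelihood defined in (2.7), which is a nonparametric version of Wilks's theorem. Before stating the theorem, we first make the following assumptions.

(A1) $E(Y^2) < \infty$.
(A2) $E \| m^{(1)}(X, \theta) \|^2 < \infty$.
(A3) The first-order derivative with respect to $\theta$ of the left side of $\int y f(y \mid x, \theta)\, dy = m(x, \theta)$ can be obtained by differentiating under the integral sign.
(A4) Any element of the matrix $m^{(2)}(X, \theta)$ has a finite second-order moment, where $m^{(2)}(x, \theta)$ denotes the second partial derivative of $m(x, \theta)$ with respect to $\theta$.
(A5) Any element of the matrix $\partial^2 \log f(Y \mid X, \theta) / \partial \theta\, \partial \theta^T$ has a finite second-order moment.

Now we have the following main result; its proof is given in Section 4.

Theorem 2.1. Under assumptions (A1)–(A5), if $\mu$ is the true parameter, we have

$$\hat{l}_{n,\mathrm{ad}}(\mu) \xrightarrow{L} \chi^2_1.$$

Hence, a simple approach to constructing a $(1 - \alpha)$-level confidence interval for the mean $\mu$, based on Theorem 2.1, is

$$I_\alpha = \{ \mu : \hat{l}_{n,\mathrm{ad}}(\mu) \le c_\alpha \} \tag{2.11}$$

with $P(\chi^2_1 \le c_\alpha) = 1 - \alpha$. Clearly, $I_\alpha$ will have the correct coverage probability $1 - \alpha$ asymptotically, i.e.

$$P(\mu \in I_\alpha) = 1 - \alpha + o(1). \tag{2.12}$$

3. Simulation results

In this section, we conduct some simulation studies to compare the performance of the empirical likelihood proposed in Section 2 with the normal method. For the model

$$f(y \mid x, \theta) = (\sqrt{2\pi})^{-1} \exp\left[ -(y - \theta_1 - \theta_2 x)^2 / 2 \right], \quad (\theta_1, \theta_2) = (1.0, 0.5), \quad X \sim N(1, 1),$$

we consider the following two response probability functions under the MAR assumption.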
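To make the simulation setup concrete, the following sketch draws incomplete data from the model above, computes the MLE of $\theta$, and evaluates the scaling factor $\hat{S}_2(\mu)/\hat{S}_1(\mu)$ of Section 2. It is an illustration under stated assumptions rather than the authors' code. The Case 2 response probability is read as $0.8 + 0.2|x - 1|$ for $|x - 1| \le 1$ and $0.95$ otherwise. Because the error variance is fixed at one, the MLE reduces to least squares on the complete cases. $\hat{S}_1(\mu)$ is coded in the form that matches $S_1(\mu)$ of Lemma 4.1, with $m^{(1)}(x, \theta) = (1, x)^T$ and score $(y - \theta_1 - \theta_2 x)(1, x)^T$. All function names are hypothetical.

```python
import random

THETA = (1.0, 0.5)  # (theta_1, theta_2) used in the simulation section


def response_prob(x, case):
    """P(delta = 1 | X = x); the Case 2 form is an assumed reading of the text."""
    if case == 1:
        return 0.6
    d = abs(x - 1.0)
    return 0.8 + 0.2 * d if d <= 1.0 else 0.95


def simulate(n, case, rng):
    """Draw (x, y, delta) triples; y is None when the response is missing."""
    rows = []
    for _ in range(n):
        x = rng.gauss(1.0, 1.0)
        y = THETA[0] + THETA[1] * x + rng.gauss(0.0, 1.0)
        delta = 1 if rng.random() < response_prob(x, case) else 0
        rows.append((x, y if delta else None, delta))
    return rows


def fit_mle(rows):
    """MLE of theta: ordinary least squares on the complete cases."""
    obs = [(x, y) for x, y, d in rows if d == 1]
    m = len(obs)
    if m < 2:
        raise ValueError("need at least two observed responses")
    xbar = sum(x for x, _ in obs) / m
    ybar = sum(y for _, y in obs) / m
    sxx = sum((x - xbar) ** 2 for x, _ in obs)
    if sxx == 0.0:
        raise ValueError("observed covariates are all equal")
    t2 = sum((x - xbar) * (y - ybar) for x, y in obs) / sxx
    return ybar - t2 * xbar, t2


def adjustment_factor(rows, mu):
    """Return (S2_hat / S1_hat, completed responses) for the linear normal model."""
    n = len(rows)
    t1, t2 = fit_mle(rows)
    fitted = [t1 + t2 * x for x, _, _ in rows]
    completed = [y if d == 1 else f for (_, y, d), f in zip(rows, fitted)]
    s2 = sum((v - mu) ** 2 for v in completed) / n
    g11 = g12 = g22 = resid = 0.0       # entries of Gamma_n and residual term
    b0 = b1 = c0 = c1 = 0.0             # means of (1 - d) m1 and (1 + d) m1
    for (x, y, d), f in zip(rows, fitted):
        if d == 1:
            e2 = (y - f) ** 2
            resid += e2 / n
            g11 += e2 / n
            g12 += e2 * x / n
            g22 += e2 * x * x / n
        b0 += (1 - d) / n
        b1 += (1 - d) * x / n
        c0 += (1 + d) / n
        c1 += (1 + d) * x / n
    det = g11 * g22 - g12 * g12
    u0 = (g22 * c0 - g12 * c1) / det    # first entry of Gamma_n^{-1} c
    u1 = (g11 * c1 - g12 * c0) / det    # second entry of Gamma_n^{-1} c
    s1 = resid + (b0 * u0 + b1 * u1) + sum((f - mu) ** 2 for f in fitted) / n
    return s2 / s1, completed
```

A call such as `adjustment_factor(simulate(100, 1, random.Random(0)), 1.5)` returns the factor and the completed responses. Multiplying the factor by the unadjusted empirical log-likelihood ratio of the completed responses gives the adjusted statistic, which can then be compared with a $\chi^2_1$ quantile.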

Case 1. $P(\delta = 1 \mid X = x) = 0.6$ for all $x$;

Case 2. $P(\delta = 1 \mid X = x) = 0.8 + 0.2|x - 1|$ if $|x - 1| \le 1$, and $0.95$ elsewhere.

It is easy to see that $Y \sim N(\mu, 1.25)$. To apply the empirical likelihood method, we first note that assumptions (A1)–(A5) are satisfied by the above model; then we compute the MLE $\hat{\theta}_n$ of the parameter $\theta$ and calculate $\hat{S}_1(\mu)$ and $\hat{S}_2(\mu)$, respectively. Finally, combining the proof of Theorem 2.1 with (2.7), we compute $\hat{l}_{n,\mathrm{ad}}(\mu)$. We generate, respectively, 5000 Monte Carlo random samples of size $n = 20, 40, 60, 100$ for Case 1 and Case 2. The resulting coverage probabilities for $\mu$ are presented in Table 1 and Table 2. For contrast, we also give the coverage probability of the normal-approximation-based method, which depends on Lemma 4.1.

Table 1
Case 1: Coverage probabilities for µ

          Nominal level 0.90              Nominal level 0.95
   n      Normal        Empirical         Normal        Empirical
          approx.       likelihood        approx.       likelihood
   20     0.9276        0.936             0.9726        0.9642
   40     0.898         0.8948            0.9454        0.9506
   60     0.8746        0.894             0.9350        0.9402
  100     0.8668        0.8854            0.9256        0.9364

Table 2
Case 2: Coverage probabilities for µ

          Nominal level 0.90              Nominal level 0.95
   n      Normal        Empirical         Normal        Empirical
          approx.       likelihood        approx.       likelihood
   20     0.9070        0.902             0.9642        0.9652
   40     0.8995        0.905             0.9500        0.9556
   60     0.8908        0.9042            0.9444        0.9508
  100     0.8926        0.9000            0.9430        0.9436

In Case 1, where $E[P(\delta = 1 \mid X)] = 0.6$, we find that the normal-approximation-based method is slightly better than the empirical likelihood when $n$ is small (i.e. $n = 20$). In Case 2, where $E[P(\delta = 1 \mid X)]$ is increased (to approximately 0.90), the normal-approximation-based method is inferior to the empirical likelihood method for all sample sizes.

If we drop the missing response data from our analysis, then we can give a symmetric interval estimate for $\mu$ using the normal method, namely

$$\left[ \frac{\sum_{i=1}^{n} \delta_i Y_i}{\sum_{i=1}^{n} \delta_i} - \frac{\sqrt{1.25}\, u_\alpha}{\sqrt{\sum_{i=1}^{n} \delta_i}},\ \ \frac{\sum_{i=1}^{n} \delta_i Y_i}{\sum_{i=1}^{n} \delta_i} + \frac{\sqrt{1.25}\, u_\alpha}{\sqrt{\sum_{i=1}^{n} \delta_i}} \right],$$

where $u_\alpha$ is the upper $\alpha/2$ percentile point of the standard normal distribution. We again generate, respectively, 5000 Monte Carlo random samples of size $n = 20, 40, 60, 100$ for Case 1 and Case 2. Based on the coverage probability for $\mu$, we contrast the normal method with the empirical likelihood and present the results in Table 3 and Table 4.

Table 3
Case 1: Coverage probabilities for µ

          Nominal level 0.90              Nominal level 0.95
   n      Normal        Empirical         Normal        Empirical
          method        likelihood        method        likelihood
   20     0.902         0.9228            0.9408        0.9624
   40     0.8992        0.8990            0.9544        0.9480
   60     0.8942        0.8872            0.9522        0.9484
  100     0.8970        0.8768            0.952         0.9362

Table 4
Case 2: Coverage probabilities for µ

          Nominal level 0.90              Nominal level 0.95
   n      Normal        Empirical         Normal        Empirical
          method        likelihood        method        likelihood
   20     0.8928        0.964             0.9498        0.9586
   40     0.8992        0.9026            0.9422        0.9474
   60     0.9066        0.9080            0.9538        0.9542
  100     0.904         0.8972            0.9500        0.9506
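The complete-case normal interval used for Tables 3 and 4 can be sketched as follows. This is a minimal illustration, assuming the interval is the complete-case mean plus or minus $\sqrt{1.25}\, u_\alpha / \sqrt{\sum_i \delta_i}$ with the known variance 1.25 of $Y$; the function name `complete_case_interval` and its argument layout are hypothetical.

```python
import math
from statistics import NormalDist


def complete_case_interval(ys, deltas, alpha, var_y=1.25):
    """Symmetric normal interval for mu from the observed responses only.

    ys     : responses (entries with delta = 0 are ignored)
    deltas : 1 if the response is observed, 0 if it is missing
    alpha  : 1 - nominal coverage level
    var_y  : variance of Y, treated as known (1.25 in the simulation model)
    """
    m = sum(deltas)
    if m == 0:
        raise ValueError("no observed responses")
    mean = sum(y for y, d in zip(ys, deltas) if d == 1) / m
    u = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # upper alpha/2 point
    half = math.sqrt(var_y) * u / math.sqrt(m)
    return mean - half, mean + half
```

Coverage would then be estimated as the fraction of Monte Carlo samples whose interval contains the true mean.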

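From Table 3, we find that the normal method is better than the empirical likelihood method when the sample size is medium or large. First, this is because we rely on the normal method rather than the normal-approximation-based method. Second, since $E[P(\delta = 1 \mid X)]$ is relatively small, it is possible that the missing data cause the normal method to outperform the empirical likelihood. However, from Table 4, when $E[P(\delta = 1 \mid X)]$ is increased, we see that the empirical likelihood method outperforms the normal method when the sample size is small or medium. Even when the sample size is large, the two methods are still comparable. Moreover, from Table 3 and Table 4, we notice that the coverage accuracies of both methods tend to decrease as the sample size gets larger. However, this is not always the case. The reason is that the $(x_i, y_i)$'s differ for each sample size, as does the MLE $\hat{\theta}_n$, which makes comparisons across sample sizes more difficult.

4. Proof of Theorem 2.1

In this section, in order to prove Theorem 2.1, we first need the following lemmas.

Lemma 4.1. Under assumptions (A3)–(A5), we have

$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} (\bar{Y}_i - \mu) \xrightarrow{L} N(0, S_1(\mu)),$$

where

$$S_1(\mu) = \left\{ E[(1 - P(X)) m^{(1)}(X, \theta)] \right\}^T \Gamma^{-1} \left\{ E[(1 + P(X)) m^{(1)}(X, \theta)] \right\} + E[P(X) \mathrm{Var}(Y \mid X)] + \mathrm{Var}(m(X, \theta)),$$

$P(X) = P(\delta = 1 \mid X)$, and

$$\Gamma = E\left[ P(X)\, E\left( \frac{\partial \log f(Y \mid X, \theta)}{\partial \theta} \frac{\partial \log f(Y \mid X, \theta)}{\partial \theta^T} \,\bigg|\, X \right) \right].$$

Proof. Write

$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} (\bar{Y}_i - \mu) = \sqrt{n}\, (I_1 + I_2 + I_3), \tag{4.1}$$

where

$$I_1 = \frac{1}{n} \sum_{i=1}^{n} \delta_i [Y_i - m(X_i, \theta)], \quad I_2 = \frac{1}{n} \sum_{i=1}^{n} [m(X_i, \theta) - \mu], \quad I_3 = \frac{1}{n} \sum_{i=1}^{n} (1 - \delta_i) [m(X_i, \hat{\theta}_n) - m(X_i, \theta)].$$

Since $I_1$ and $I_2$ are means of i.i.d. random variables, the main task is to consider $I_3$. Notice that $\hat{\theta}_n$ satisfies

$$0 = \sum_{i=1}^{n} \delta_i \frac{\partial \log f(Y_i \mid X_i, \theta)}{\partial \theta} \bigg|_{\theta = \hat{\theta}_n}. \tag{4.2}$$

One can show that $\sqrt{n}\, (\hat{\theta}_n - \theta)$ is asymptotically equivalent to

$$-n^{-1/2} A^{-1} \sum_{i=1}^{n} \delta_i \frac{\partial \log f(Y_i \mid X_i, \theta)}{\partial \theta}, \tag{4.3}$$

where

$$A = E\left[ \delta\, \frac{\partial^2 \log f(Y \mid X, \theta)}{\partial \theta\, \partial \theta^T} \right].$$

Hence, it is not difficult to see that $I_3$ is asymptotically equivalent to

$$-\left\{ E[(1 - \delta) m^{(1)}(X, \theta)] \right\}^T A^{-1} \frac{1}{n} \sum_{i=1}^{n} \delta_i \frac{\partial \log f(Y_i \mid X_i, \theta)}{\partial \theta} + o_p(n^{-1/2}). \tag{4.4}$$

Here we use the fact that, under assumption (A4),

$$\frac{1}{n} \sum_{i=1}^{n} (1 - \delta_i) m^{(2)}(X_i, \theta) \xrightarrow{p} E[(1 - P(X)) m^{(2)}(X, \theta)] \quad \text{and} \quad \left\| E[(1 - P(X)) m^{(2)}(X, \theta)] \right\| < \infty.$$

Furthermore, by assumptions (A5) and (A3), we have, respectively,

$$A = -\Gamma \tag{4.5}$$

and

$$E\left[ P(X)\, E\left( \frac{\partial \log f(Y \mid X, \theta)}{\partial \theta}\, Y \,\bigg|\, X \right) \right] = E[P(X) m^{(1)}(X, \theta)]. \tag{4.6}$$

Then Lemma 4.1 follows from (4.1) and (4.4)–(4.6).

Lemma 4.2. Under assumptions (A1), (A2) and (A4), we have

$$\frac{1}{n} \sum_{i=1}^{n} (\bar{Y}_i - \mu)^2 \xrightarrow{p} S_2(\mu),$$

where $S_2(\mu) = E[P(X) \mathrm{Var}(Y \mid X)] + \mathrm{Var}(m(X, \theta))$.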

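Proof. Similar to (4.1), we get

$$\begin{aligned}
\frac{1}{n} \sum_{i=1}^{n} (\bar{Y}_i - \mu)^2
&= \frac{1}{n} \sum_{i=1}^{n} \delta_i^2 [Y_i - m(X_i, \theta)]^2 + \frac{1}{n} \sum_{i=1}^{n} [m(X_i, \theta) - \mu]^2 \\
&\quad + \frac{1}{n} \sum_{i=1}^{n} (1 - \delta_i)^2 [m(X_i, \hat{\theta}_n) - m(X_i, \theta)]^2 \\
&\quad + \frac{2}{n} \sum_{i=1}^{n} \delta_i [Y_i - m(X_i, \theta)] [m(X_i, \theta) - \mu] \\
&\quad + \frac{2}{n} \sum_{i=1}^{n} \delta_i (1 - \delta_i) [Y_i - m(X_i, \theta)] [m(X_i, \hat{\theta}_n) - m(X_i, \theta)] \\
&\quad + \frac{2}{n} \sum_{i=1}^{n} (1 - \delta_i) [m(X_i, \theta) - \mu] [m(X_i, \hat{\theta}_n) - m(X_i, \theta)] \\
&= Q_1 + Q_2 + Q_3 + Q_4 + Q_5 + Q_6.
\end{aligned} \tag{4.7}$$

It is easy to see that

$$Q_1 \xrightarrow{p} E[P(X) \mathrm{Var}(Y \mid X)] \tag{4.8}$$

and

$$Q_2 \xrightarrow{p} \mathrm{Var}(m(X, \theta)). \tag{4.9}$$

Following a proof analogous to that of (4.4) and using assumptions (A2) and (A4), we obtain

$$Q_3 \xrightarrow{p} 0. \tag{4.10}$$

Obviously,

$$Q_4 \xrightarrow{p} 0. \tag{4.11}$$

By $E[\delta (1 - \delta)(Y - m(X, \theta)) m^{(k)}(X, \theta)] = 0$ $(k = 1, 2)$, one has

$$Q_5 \xrightarrow{p} 0. \tag{4.12}$$

Finally, using assumptions (A1), (A2) and (A4), we can obtain

$$E \left\| (1 - P(X)) (m(X, \theta))^i\, m^{(k)}(X, \theta) \right\| < \infty, \quad \text{for } i = 0, 1,\ k = 1, 2.$$

This proves

$$Q_6 \xrightarrow{p} 0. \tag{4.13}$$

Then, by (4.7)–(4.13), the conclusion of Lemma 4.2 is proved.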

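Lemma 4.3. Under assumptions (A1)–(A2) and (A4)–(A5), we get

$$\hat{S}_1(\mu) \xrightarrow{p} S_1(\mu),$$

where $\hat{S}_1(\mu)$ and $S_1(\mu)$ are defined by (2.9) and Lemma 4.1, respectively.

Proof. Lemma 4.3 can be proved using arguments similar to those used for Lemma 4.2.

Lemma 4.4. If $EY^2 < \infty$ and $E \| m^{(1)}(X, \theta) \|^2 < \infty$, then

$$\max_{1 \le i \le n} |\bar{Y}_i| = o_p(n^{1/2}).$$

Proof. Notice that

$$\max_{1 \le i \le n} |\bar{Y}_i| \le \max_{1 \le i \le n} |Y_i| + \max_{1 \le i \le n} |m(X_i, \theta)| + \max_{1 \le i \le n} |m(X_i, \hat{\theta}_n) - m(X_i, \theta)|. \tag{4.14}$$

By $E|m(X, \theta)|^2 \le EY^2 < \infty$ and Lemma 3 of [2], we obtain

$$\max_{1 \le i \le n} |Y_i| = o_p(n^{1/2}), \quad \max_{1 \le i \le n} |m(X_i, \theta)| = o_p(n^{1/2}). \tag{4.15}$$

Finally, by (4.3) and $E \| m^{(1)}(X, \theta) \|^2 < \infty$, we have

$$\max_{1 \le i \le n} |m(X_i, \hat{\theta}_n) - m(X_i, \theta)| \le \max_{1 \le i \le n} \| m^{(1)}(X_i, \theta^*) \|\, \| \hat{\theta}_n - \theta \| = o_p(1), \tag{4.16}$$

where $\| \theta^* - \theta \| \le \| \hat{\theta}_n - \theta \|$. This together with (4.15) proves Lemma 4.4.

Lemma 4.5. If $\mu$ is the true parameter and $E \| m^{(1)}(X, \theta) \|^2 < \infty$, then

$$\lim_{n \to \infty} P\left( \min_{1 \le i \le n} \bar{Y}_i < \mu < \max_{1 \le i \le n} \bar{Y}_i \right) = 1.$$

Proof. The proof is similar to that in [8], so we omit it.

Proof of Theorem 2.1. By Lemma 4.2 and Lemma 4.4, similar to [2], one can show that

$$\lambda = O_p(n^{-1/2}). \tag{4.17}$$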

Hence, from (2.6), using Taylor's expansion, we have

$$\hat{l}_n(\mu) = 2 \sum_{i=1}^{n} [\lambda (\bar{Y}_i - \mu)] - \sum_{i=1}^{n} [\lambda (\bar{Y}_i - \mu)]^2 + 2 \sum_{i=1}^{n} \eta_i, \tag{4.18}$$

where, by (4.17), Lemma 4.2 and Lemma 4.4, one has

$$\sum_{i=1}^{n} \eta_i = o_p(1).$$

Moreover, expanding (2.5),

$$0 = \frac{1}{n} \sum_{i=1}^{n} \frac{\bar{Y}_i - \mu}{1 + \lambda (\bar{Y}_i - \mu)} = \frac{1}{n} \sum_{i=1}^{n} (\bar{Y}_i - \mu) - \lambda\, \frac{1}{n} \sum_{i=1}^{n} (\bar{Y}_i - \mu)^2 + \lambda^2\, \frac{1}{n} \sum_{i=1}^{n} \frac{(\bar{Y}_i - \mu)^3}{1 + \lambda (\bar{Y}_i - \mu)}, \tag{4.19}$$

with the final term bounded by

$$\lambda^2\, \frac{1}{n} \sum_{i=1}^{n} \frac{|\bar{Y}_i - \mu|^3}{|1 + \lambda (\bar{Y}_i - \mu)|} = O_p(n^{-1})\, o_p(n^{1/2})\, O_p(1) = o_p(n^{-1/2}).$$

This yields

$$\sum_{i=1}^{n} \lambda (\bar{Y}_i - \mu) = \sum_{i=1}^{n} [\lambda (\bar{Y}_i - \mu)]^2 + o_p(1) \tag{4.20}$$

and

$$\lambda = \left[ \sum_{i=1}^{n} (\bar{Y}_i - \mu)^2 \right]^{-1} \sum_{i=1}^{n} (\bar{Y}_i - \mu) + o_p(n^{-1/2}). \tag{4.21}$$

It follows from (4.18), (4.20) and (4.21) that

$$\hat{l}_n(\mu) = \lambda^2 \sum_{i=1}^{n} (\bar{Y}_i - \mu)^2 + o_p(1) = \left[ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (\bar{Y}_i - \mu) \right]^2 \left[ \frac{1}{n} \sum_{i=1}^{n} (\bar{Y}_i - \mu)^2 \right]^{-1} + o_p(1). \tag{4.22}$$

Recalling the definition of $\hat{l}_{n,\mathrm{ad}}(\mu)$ and using Lemmas 4.1–4.3, we obtain $\hat{l}_{n,\mathrm{ad}}(\mu) \xrightarrow{L} \chi^2_1$. The proof of Theorem 2.1 is complete.

References

[1] A. Owen, Empirical likelihood ratio confidence intervals for a single functional, Biometrika, Vol. 75 (1988), pp. 237–249.
[2] A. Owen, Empirical likelihood ratio confidence regions, Ann. Statist., Vol. 18 (1990), pp. 90–120.
[3] J. N. K. Rao, On variance estimation with imputed survey data, J. Amer. Statist. Assoc., Vol. 91 (1996), pp. 499–502.
[4] J. Qin and J. F. Lawless, Empirical likelihood and general estimating equations, Ann. Statist., Vol. 22 (1994), pp. 300–325.
[5] P. E. Cheng, Nonparametric estimation of mean functionals with data missing at random, J. Amer. Statist. Assoc., Vol. 89 (1994), pp. 81–87.
[6] P. Hall and B. La Scala, Methodology and algorithms of empirical likelihood, Int. Statist. Rev., Vol. 58 (1990), pp. 109–127.
[7] Q. H. Wang and J. N. K. Rao, Empirical likelihood for linear regression models under imputation for missing responses, The Canadian Journal of Statistics, Vol. 29 (2001), pp. 597–608.
[8] Q. H. Wang and J. N. K. Rao, Empirical likelihood-based inference under imputation for missing response data, Ann. Statist., Vol. 30 (2002a), pp. 896–924.
[9] Q. H. Wang and J. N. K. Rao, Empirical likelihood-based inference in linear errors-in-covariables models with validation data, Biometrika, Vol. 89 (2002b), pp. 345–358.
[10] S. X. Chen, On the accuracy of empirical likelihood confidence regions for linear regression model, Ann. Inst. Statist. Math., Vol. 45 (1993), pp. 621–637.
[11] S. X. Chen, Empirical likelihood confidence intervals for linear regression coefficients, J. Multivariate Anal., Vol. 49 (1994), pp. 24–40.
[12] T. J. DiCiccio and J. P. Romano, On adjustments based on the signed root of the empirical likelihood ratio statistic, Biometrika, Vol. 76 (1989), pp. 447–456.

Received August, 2004