Nonparametric Hypothesis Testing and Condence Intervals with. Department of Statistics. University ofkentucky SUMMARY

Size: px

Start display at page:

Download "Nonparametric Hypothesis Testing and Condence Intervals with. Department of Statistics. University ofkentucky SUMMARY"

Conrad Booth
5 years ago
Views:

1 Nonparametric Hypothesis Testing and Condence Intervals with Doubly Censored Data Kun Chen and Mai Zhou Department of Statistics University ofkentucky Lexington KY 46-7 U.S.A. SUMMARY The non-parametric maximum likelihood estimator (NMLE) of the distribution function with doubly censored data can be computed using self-consistent algorithm (Turnbull 1974). We extend the self-consistent algorithm to include a constraint on the NMLE. We then show how this can be used to construct condence intervals and test hypotheses based on the NMLE via empirical likelihood ratio. Finally we present some numerical comparison of the performance of the above method with another method that make use of the inuence functions. AMS 1991 Subject Classication rimary 6G1; secondary 6G. Key Words and hrases Modied Self-consistency Empirical likelihood ratio Inuence function. 1. Introduction When data are doubly censored the nonparametric maximum likelihood estimator (NMLE) of the distribution function have no explicit form and have to be computed via some iteration [Turnbull(1976) Chang and Yang(1987) Zhou(199) Zhan and Wellner(1999)]. The estimation of the asymptotic variance of the NMLE is even more involved [Chang(199)]. Self-consistency algorithm is a general way of computing non-parametric distributional estimators when data are not completely observed [Efron (1967) Turnbull(1974) Tsai and Crowly(198)]. In particular a NMLE of distribution function based on an i.i.d. sample must satisfy the selfconsistent equation [Gill(1989)]. It is a special case of the EM algorithm when the parameter is the distribution function itself [Dempseter Laird and Rubin(1977)]. In this paper we extend the self-consistent algorithm to compute the NMLE under a constraint F (T ) for doubly censored data. This constrained estimator is useful in the construction of empirical likelihood ratio. As an application we show how this in turn will allow us to nd condence intervals and test hypotheses for doubly censored data. Another approach of constructing condence intervals based on the NMLE with doubly censored data is to estimate the inuence function/variance of the NMLE. But the inuence function is not easy to estimate and this approach may not be very practical for larger samples. We will compare the two approaches for one example in this paper. 1

2 Doubly censored data can arise when observations are subject to both right and left censoring i.e. the patients are only watched within a window of observational time otherwise we only know it is below or above the window. Double censoring also arise from pairwise comparisons when there are right censoring in both group as the following example explains. Example In paired comparison experiments we have n pairs of observations based on two treatments. To estimate the treatment dierence it is customary to focus on the n pairwise dierences. However when right censoring occurs in either treatments the pairwise dierence can be right censored or left censored as the following table shows. Trt 1 Trt di(1 ) For a pair of observations if the observation from treatment one is right-censored but treatment two is uncensored their dierence will be right-censored; if the observation from treatment one is uncensored but treatment two is right-censored their dierence will be left-censored; If both observations are uncensored their dierence is uncensored. If both observations are right-censored their dierence can take any value we drop those data from further analysis. Therefore the dierences of paired right-censored data can be doubly censored { having both left and right censored observations. Section 6 will treat this in more detail. } The rest of this section is to formally introduce the relevant notation and basic assumptions. Let 1 ; ; n be positive random variables denoting the sample of lifetimes which is independent and identically distributed with a continuous distribution F. The censoring mechanism is such that i is observable if and only if it lies inside the interval [Z i ;Y i ]. The Z i and Y i are positive random variables with continuous distribution functions G L and G R respectively andz i Y i with probability 1.If i is not inside [Z i ;Y i ] the exact value of i cannot be determined. We only know whether i is less than Z i or greater than Y i and we observe Z i or Y i correspondingly. The variable i is said to be left censored if i <Z i and right censored if i >Y i. The available information may be expressed by a pair of random variables T i ; i where T i max(min( i ;Y i );Z i ) and i 8 >< > 1 if Z i i Y i if i >Y i i 1; ; ;n (11) if i <Z i The modied (constrained) self-consistent equation for doubly censored data is derived in Section. We generalize the self-consistent algorithm to deal with several constraints in Section. We then explain how the constrained self-consistent estimate may be used to obtain condence intervals via empirical likelihood ratio in section 4. Inuence function is discussed in Section. We carry out simulations and apply our algorithm to some doubly censored data in Section 6.

3 . Modied Self-Consistent Equation Under a Constraint Suppose for a given T and we areinterested to test the hypothesis H F (T ) vs H 1 F (T ) 6 (1) As we will see later A key step to accomplish this is to be able to compute the NMLE of F (t) under H from the data (1.1). We shall focus on computing NMLE in this section. First let us write down the likelihood function ( L(F )) for doubly censored data. The likelihood function involves all n observations. However We can decompose the likelihood function into two parts for observations before and after time T. L(F ) L 1 +L (w i )+ [1 F (t i )] + [F (t i )] + i 1;in 1 i ;in 1 i ;in 1 i 1;in (v i )+ [1 F (s i )] + [F (s i )] i ;in i ;in (w i )+ [1 F (T )+F(T ) F (t i )] i 1;in 1 i ;in 1 + [F (t i )] + (v i )+ [1 F (s i )] i ;in 1 i 1;in i ;in + [F (s i ) F (T )+F(T )] () i ;in where L 1 denotes the likelihood function for n 1 observations before time T and L denotes the likelihood function for n observations after time T (n 1 + n n). w i is the jump at ith observation before time T. v i is the jump at ith observation after time T. t i is the observed time before time T. s i is the observed time after time T. Therefore w i and n 1 i1 w i corresponding to jumps before time T ; similarly v i and n i1 v i (1 ) for jumps after time T. Notice that because F (T ) wehave F (T ) F (t i )w i w n1 and F (s i ) F (T ) v 1 + v + + v (i1) The rst part of above likelihood L 1 only involves w i the jumps of F before time T as in (.). Similarly the second part of the likelihood only involves the jumps of F after time T as in (.4). Therefore we can maximize the whole likelihood in two independent steps maximize L 1 and then maximize L. If a distribution function say ^F did maximize the likelihood under H then it must satisfy that the derivative of the likelihood function at this ^F equal to. We dene a directional change of w i in the direction h() and parametrized by to facilitate derivative (under H ). Since w i must sum up to v i must sum up to (1 ) we dene them as follows

4 For jumps before time T we dene and C(). w i w i () ^F (t i ) 1+h(t i ) C() For jumps after time T we dene Clearly D() 1. v i v i () ^F (s i ) 1 1+h(s i ) D() with with C() D() n 1 i1 n i1 ^F (t i ) 1+h(t i ) ; () ^F (s i ) 1+h(s i ) (4) Remark.1 We may also use the following denition in the computation of derivatives. wi () ^F (ti )[1 + h(t i )] with C () ^F (ti )[1 + h(t i )] C () i1;t i <T Since W i ^F (ti ) is the maximum the derivative of the likelihood with respect to must be zero for any direction h(t). This leads to the equation (A.*)(see Appendix A) In particular the choice h(t) I [tu] for 1 <u<1 will give us the modied self-consistent equation under the constraint (1). We shall use the convention 1 F (t i ) jt j >t i F (t j ) ; F (t i ) jt j t i F (t j ) (a) The constrained self-consistent equation for ^F (u) when u T is ^F (u) n 1 8 < where all t i T. i 1;in 1 I[t i u]+ i ;in 1 ^F (u) ^F (ti ) + I[t i u] + 1 F (t i ;in i ) 1 The last term in (.) can also be simplied to 1 ^F (u) 1 F (t i ) i ;in 1 ^F (min(u; ti )) i ;in 1 ^F (t i ) (b) Similarly the constrained self-consistent equation for u>t is ^F (u) n 8 < i ;in t j <t i ^F (t j )I[t j u] ^F (t i ) 9 ; () ^F (u) ^F (si ) I[s i u]+ I[s i u] i 1;in i ;in 1 ^F (si ) 1 [ ^F (u) ] 9 s + j <s i ^F (sj )I[s j u] ^F (s i ) i ;in ^F (s i ) ; (6) 4

5 where all s i >T. Again the last term in (.6) can be simplied to i ;in ^F (min(u; si )) ^F (s i ) For the derivation of above self-consistent equations please see Appendix A. We summerize the result in the following theorem. Theorem.1 The NMLE of F () under hypothesis (.1) based on data (1.1) must satisfy the modied self-consistent equations (.) and (.6). } Using the above modied self-consistent equations we can iteratively compute the estimator of distribution F under the constrain equation by combining the estimator before and after time T. Theorem. The estimator obtained by the above modied self-consistent equations is at least a local maximum of the likelihood function under (1). roof See Appendix B.. Modied Self-Consistent Equations Under Many Constraints In this section we extend the modied self-consistent algorithm to handle hypotheses (constraints) at several times H F (T 1 ) 1 ;F(T ) ;;F(T k ) k H 1 F (T j ) 6 j foratleastonej; 1 j k For simplicity we only give the proof for k. Following the similar steps as in the simple hypothesis we can decompose the -likelihood function into three parts before time T 1 between time T 1 and T and after time T. L(F ) L 1 (F ) + L (F )+L (F ) (w i )+ [1 F (t i )] + [F (t i )] i 1;in 1 i ;in 1 i ;in 1 + (z i )+ [1 F (x i )] + [F (x i )] i 1;in i ;in i ;in + (v i )+ [1 F (s i )] + [F (s i )] i 1;in i ;in i ;in where w i z i and v i are dened as jumps on ith observation before time T 1 between time T 1 and T and after time T respectively; t i x i and s i are observations before time T 1 between time T 1 and T and after time T respectively. There are n 1 observations before time T 1 n observations between time T 1 and T andn observations after time T. The total number of observations is n 1 +n +n n. Note that the constraints are n 1 i1 w i 1 n i1 z i 1 and n i1 v i 1.

6 We dene the jumps before time T 1 and after time T similarly as in section. The jumps between time T 1 and T can be dened as and B() 1. z i z i () ^F (xi ) 1 1+h(x i ) with n i1 ^F (xi ) 1+h(x i ) ; L 1 (F ) and L (F ) here are the same as L 1 (F ) and L (F )insection. We shall focus on L (F ) the -likelihood function between time T 1 and T. The self-consistent equation for T 1 u T is ^F (u) n 8 < i ;in i 1;in I(x i u) 1 1 [ ^F (u) 1 ] [ ^F (u) ^F (xi )]I [ x i u] 1 ^F + (x i ) i ;in 1 ^F (x i ) [ + ^F (min(x i ;u)) 1 ] + i ;in ^F (x i ) where T 1 x i T (For the details see Appendix C). i ;in 4. Empirical Likelihood Ratio Tests And Condence Intervals 1 1 [ ^F (u) 1 ] ^F (x i ) 9 ; (1) Let ~ F be the NMLE of F which maximizes likelihood L(F ) dened in (.) over all distributions and ^F denote the NMLE of F under H which maximizes the likelihood only among distributions that satisfy H.We dene the empirical likelihood ratio function as Our likelihood ratio test statistic is R(H ) R(H ) L( ^F ) L( ~ F ) max H L(F ) max H +H 1 L(F ) ( max L(F )) (max L(F )) H +H 1 H h i (L( F ~ )) (L( ^F )) The method described in section will enable us to compute the constrained NMLE ^F which maximizes the likelihood under H F (T ). Using the usual self-consistent algorithm we can compute NMLE ~ F without any constraint. So once we have these estimates we can easily compute the empirical likelihood ratio. We use chi-square theory to carry out hypothesis testing 6

7 and construct condence intervals. For theory about empirical likelihood ratio see Owen (1987) and Murphy and Van der Varrt (199). If the observed R(H ) is greater than 1; (the 1(1)th percentile of with 1 degree freedom) we reject H at signicance level. To construct the condence interval for F (T ) we can test dierent hypotheses with xed T and various 's and form the 1 condence interval for F (T )as n o R(H F (T )) 1; (41) We can also the construct condence interval for percentile F 1 () as follows. Test many hypotheses with xed value and various T values and form the condence interval as n o T R(H F (T )) 1; (4) In particular a condence interval for the median can be obtained with 1.. Inuence Function of NMLE and Its Estimation Inuence function (or inuence curve) is a general technique to obtain the variance of a random process (and more). In the analysis of ~ F () with doubly censored samples there are three inuence functions corresponding to right left and non-censored observations. Chang (199) computed those asymptotic inuence functions for the process p n( ^F n (t) F (t)) he obtained the following where p n( ^Fn (t) F (t)) " q (n) j (t) p 1 n n i I [zi t; i j] Z T Z T +! IC 1 (t; s)dq (n) 1 (s)+ Z T IC (t; s)dq (n) (s)+o(n) p (1) ; E 1 n i I [zi t; i j]!# IC (t; s)dq (n) (s) j ; 1; >From the above we could try to estimate the three asymptotic inuence functions and then estimate the variance of p n( ^F (t) F (t)) by Z T Z T IC^ 1 (t; s)dq(n) (s)+ 1 Z T ^ IC 1 (t; s)dq (n) 1 (s)+ Z T Z T IC^ (t; s)dq(n) (s)+ ^ IC (t; s)dq (n) (s)+ Z T IC^ (t; s)dq(n) (s) ^ IC (t; s)dq (n) (s)! ^ IC j (t; s) However those inuence functions are only dened via Fredholm integral equations that involve unknown distribution. 7

8 We plug-in the (self consistent) estimate of those distribution functions discretize the Fredholm integral equations into matrix equations and solve for IC. ^ For details see Chang (199) and Numerical Recipes in C. In this approach we need to solve a matrix equation that resulted from discretizing corresponding integral equations for inuence functions. If we discretizing at a few points then the estimate would not be very good. If we use a lot of discretizing points computation is slow. Therefore for large sample sizes this approach is very computationally expensive (to nd the inverse of a large matrix). In our experience when censored observations (both right and left) total exceed it becomes slow in our implementation since we discretizing at observed censoring times. Nevertheless this approach is worth exploring and it is interesting to compare to the approach of empirical likelihood described in Section Applications Simulations and Examples 6.1 Applications aired Comparison In section 1 we gave an example indicating that doubly censored data may result from paired comparison experiments. We shall specify a model for this case and illustrate the use of the testing procedures to test the hypothesis for drug eect when we donotwant tomake parametric assumptions. A reasonable model for the paired experiment isasfollows for ith subject (or pair) (i 1; ;;n)we observe Y 1i and Y i where Y 1i d + S i + 1i ; Y i p + S i + i ; where d ( p ) is the main eect for drug (placebo) S i is the subject eect ki is the random error. The dierence of Y 1i and Y i is D i ( d p )+( 1i i ); which is free from S i a fact that lead many test procedures to be based on the D i 's. If we assume 1i and 1i are exchangable then the median of D i is d p.thusatestofh d p can be carried out by testing if the median of D i is zero. Double censoring on the D i requires our test as described in section 4 and. In the case where ki are i.i.d. with a distribution of exp() 1 (mean zero exponential) D i has double exponential distribution with location parameter d p. Since the sample median is 8

9 MLE of location parameter for double exponential distribution we can expect the test to perform well in this case. We carried out some simulations for this procedure summerized in Table 6.1. Table 6.1 The ercentage Of Rejecting H d p At Dierence Size No Censored Light Censored Medium Censored Heavy Censored n n n n n n When the null hypothesis is true the percentages of rejecting H are very close to the nominal level for small and large samples. When there is dierence between drug and placebo the rejecting percentages decrease with the increases of censoring observations. 6. Simulation Hypothesis Testing In our rst simulation we took normally distributed samples of size 1 and size respectively for each run and each entry in Table 6. was based on runs. Table 6. Generating Normally Distributed Samples i 1; ; ; n i Z i Y i light-censored N( 1;) N( 6;) exp(1) + Z i +8 (1% % censored) medium-censored N( 1;) N( 7;) exp(1) + Z i + (% 4% censored) heavy-censored N( 1;) N( 9;) exp(1) + Z i + (4% 6% censored) Table 6. illustrates the probabilities of rejecting H F (T ) at nominal signicance level (or 1). The percentages were computed as the number of -likelihood ratios greater than critical value 1; 84 (or 1;1 71) divided by. From Table 6. we can see the probabilities of rejecting H are pretty close to the nominal level (or 1). We took exponentially distributed samples of size 1 and size respectively for our second simulation. This simulation was also based on samples. 9

10 Table 6. The ercentage Of Rejecting H F (T ) At And 1 T Sample Size Light Censored Medium Censored Heavy Censored 1 n 1 8% / 16% % / 14% 466% / 16% n 46% / 176% 48% / 11% 49% / 114% n 1 4% / 1% % / 14% 7% / 144% n 1% / 18% 68% / 11% 8% / 17% Table 6.4 Generating Exponentially Distributed Samples i 1; ; ; n i Z i Y i light-censored exp() exp(1) exp(1) + Z i +1 (1% % censored) medium-censored exp() exp(8) exp(1) + Z i + (% 4% censored) heavy-censored exp() exp() exp(1) + Z i (4% 6% censored) Again Table 6. shows that the probabilities of rejecting H (or 1). are around the nominal level Table 6. The ercentage Of Rejecting H F (T ) At And 1 T Sample Size Light Censored Medium Censored Heavy Censored 616 n 1 418% / 96% 4% / 98% 494% / 998% n 48% / 968% 47% / 974% 4% / 1% n 1 47% / 864% 46% / 94% 448% / 98% n 4% / 118% 48% / 118% 46% / 168% Figure 6.1 and 6. are Q-Q plots of -likelihood ratios for the above twosimulations verse the percentiles. At the point 84 (or 71) if the -likelihood ratio line is above the (1) dashed line (4 line) the rejecting probability is greater than % (or 1%). Otherwise the rejecting probability is less than % (or 1%). Remark 6.1 The Q-Q plots for exponentially distributed light-censored simulations (Figure 6.1(d) and 6. (d)) are somehow more discrete than the others. We did uncensored simulations with exponential distribution the Q-Q plot is similar to Figure 6.1(d) and 6. (d). Since the constraint isf (T ) ^F and F ~ only depend on the number of observations(n1 ) before and (n ) after time T. If there are the same n 1 and n in two uncensored samples -likelihood ratio will be the same. That is why uncensored and light-censored plots look more discrete. The censoring somehow smoothed the plot and made the chi-square approximation better. 1

11 6. Example Condence intervals { A case study We compare the condence intervals obtained via empirical likelihood ratio as in section 4 to condence intervals obtained by directly estimating the asymptotic variance of ^F (T ) and then form the Wald (1 ) condence interval ^F (T ) Z (1) q ^ Var[ ^F (T )] (61) where Z (1) is the 1(1)th percentile of a standard normal distribution. Better condence intervals may be obtained on a transformed scale. We may use the - transformation to obtain ( ^S() 1 ^F ()) q [ ^S(T ) b ; ^S(T ) 1b ] ; where b exp4 Z (1) Var[ ^ ^S(T )] (6) ^S(T )j ^S(T )j or the it transformation to obtain where and a b e a [ 1+e a ; e b 1+e b ] ; (6) ^F (T ) 1 ^F (T ) Z q (1) ^F (T )(1 ^F Var[ ^ ^F (T )] (T )) ^F (T ) 1 ^F (T ) + Z (1) ^F (T )(1 ^F (T )) q ^ Var[ ^F (T )] It is not easy to obtain the Wald type condence interval for the median (or other quantiles). But we can invert the test of F (T ) with estimated variance. Therefore the condence set for median may be obtained as the set of points T that satisfy the following condition 8 < T Z (1) ^F (T ) q Z (1) Var[ ^ ^F (T )] ; (64) To construct condence intervals we will use the following doubly censored data as our example. Turnbull and Weiss (1978) reported part of a study conducted at Stanford-alo Alto eer Counseling rogram (see Hamburg et al. (197) for details of the study). In this study 191 California high school boys were asked \When did you rst use marijuana?" The answers are either the exact age (uncensored observations) or \I never used it" which are right-censored observations at the boys' current ages or \I have used it but can not recall just when the rst time was" which are left-censored observations. Table 6.6 shows the results of this study. The estimated median age of high school boys who use marijuana is 14 years old. 9 11

12 Suppose we are interested to obtain 9% condence interval for the median age of rst time marijuana use. Table 6.7 shows the 9% condence intervals for the median age with (4.) and (6.4). Table 6.8 shows four dierent kinds of 9% condence intervals for at the median age with (4.1) (6.1) (6.) and (6.). Table 6.7 9% Condence Intervals For Median Age(14) method Condence Interval T R(H F (T )) 84 (4) (11; ) T 196 ^F (t) p Var[ ^ ^F (t)] 196 (6.4) (1; 19) Table 6.8 9% Condence Intervals For F (14) method Condence Interval R(H F (14) ) 84 (41) (79876; 71987) [1 ^S(14) b ; 1 ^S(14) 1b ] (6) (786; ) e [ a e 1+e ; b ] (6) (171994; 9844) a 1+e b ^F (14) 196q Var[ ^ (14)] (61) (8; 111) From Table 6.8 we can see the 9% condence interval obtained by empirical likelihood ratio (4.1) is narrower than any other three intervals with (6.1) (6.) and (6.). The two transformed intervals obtain by (6.) and (6.) are wider than (4.1) but close to each other. We do not know which transformation is better. We believe the empirical likelihood ratio approach is better because we do not need to know what is the best transformation and simulation in the previous section show the chi square approximation is pretty accurate. The Wald condence interval (6.1) includes the numbers less than and greater than 1. These are not reasonable condence limits for a distribution since a distribution must be between and 1. Therefore in this example we conclude that the empirical likelihood ratio approach works better than Wald's approach. 6.4 Software Used The simulation was carried out using Splus.4 for Unix on the H workstations. The Splus functions that computes the constrained NMLE with doubly censored data will be uploaded to Statlib in the near future as an update to the function d9newr that was there since June 199. References Andersen.K. Borgan O. Gill R. and Keiding N. (199) Statistical Models Based on Counting rocesses. Springer New York. 1

13 Chang M. N. and Yang G. L. (1987) Strong Consistency of a Nonparametric Estimator of the Survival Function with Doubly Censored Data. Ann. Statist Chang M. N. (199). Weak Convergence in Doubly Censored Data. Ann. Statist Dempseter A.. Laird N. M. and Rubin D. B. (1977). Maximum Likelihood From Incomplete Data via The EM Algorithm with Discussion). J. Roy. Statist. Soc. Ser. B Efron B. (1967). The Two Sample roblem with Censored Data. roc. Fifth Berkeley Symp. Math. Statist. robab Gill R. (1989) Non-and Semi-parametric Maximum Likelihood Estimator and the Von Mises Method (I) Scand. J. Statist Hamberg B.A. Kraemer H.C. and Jahnke W. (197). A Hierarchy of Drug Use in Adolescence Behavioral and Attitudinal Correlates of Substantial Drug Use. American Journal of sychiatry 1 (197) Kaplan E. and Meier. (198) Non-parametric Estimator From Incomplete Observations JASA 47{ 481. Murphy S. and Van der Varrt (1997). Semiparametric Likelihood Ratio Inference. Ann. Statist Owen A. (1988). Empirical Likelihood Ratio Condence Intervals for a Single Functional. Biometrika ress W. Teukolsky S. Vetterling W. and Flannery B. (199) Numerical Recipes in C The Art of Scientic Computing. Cambridge Univ. ress. Turnbull B. (1974) Nonparametric Estimation of a Survivorship Function with Doubly Censored Data. JASA Turnbull B. (1976) The Empirical Distribution Function with Arbitrarily Grouped Censored and Truncated Data. JRSS B 9-9. Turnbull B. and Weiss (1978). A Likelihood Ratio Statistic for Testing Goodness of Fit with Randomly Censored Data. Biometrics 4 (1978) Wei-Yann Tsai and John Crowley (198) A Large Sample Study of Generalized Maximum Likelihood Estimator From Incomplete Data Via Self-Consistency. Ann. Statist. 1 No Zhan Y. and Wellner J. (1999). A hybrid algorithm for computation of the NMLE from censored data. JASA Zhou M.(199) http//lib.stat.cmu.edu/s/d9newr. Appendix A Derivation of the Self-consistent Equation Before Time T (a) The likelihood function before time T L 1 i1;in 1 (w i )+ i1;in 1 (w i )+ [1 F (t i )] + 4 (1 )+ To facilitate derivative we substitute w i () as in (.) L 1 i1;in 1 + i;in 1 ^F(ti) 1+h(t i ) C() + ^F(tj) 1+h(t j ) t j<t i;jn 1 C() 1 i;in 1 [F (t i )] t j>t i;jn 1 w j + 4 (1 )+ i;in 1 1+h(t j ) t j>t i;jn 1 t j<t i;jn 1 w j C() 1 A

14 C() + i1;in n 1 i;in 1 + C() + i;in C() + i1;in 1 C()+ i1;in 1 t j<t i;jn 1 ^F(ti) 1+h(t i ) + i;in 1 ^F(tj) 1+h(t j ) t j>t i;jn 1 ^F(ti) 1+h(t i ) + ^F(tj) 1+h(t j ) Now we are ready to take derivative with respect L n 1 C () C() + If we set the above can be simplied L j n 1 n 1 i1 i;in 1 ^F(t i )h(t i ) This derivative must be zero since ^F is the NMLE. Thus we get n 1 i1 ^F(t i )h(t i ) n 1 8 < ^F(tj) 1+h(t j ) t j<t i;jn 1 i1;in 1 4 h(t i ) 1+h(t i ) C() (1 )C() + 1 C () t j>t i ^F (t j)h(t j) [1+h(t j)] 1 C()+ t j>t i ^F (t j) 1+h(t j) t j<t i ^F (t j)h(t i) [1+h(t j)] t j<t i ^F (t j) 1+h(t j ) i1;in 1 h(t i ) t j>t i ^F(tj )h(t j ) (1 )+ t j>t i ^F(tj ) i1;in 1 h(t i )+ t + j>t i ^F(t j )h(t j ) 1 F (t i ) If we take h(t i )I[t i u] for u T (A) becomes equation (.). 1 i;in 1 ^F(tj) 1+h(t j ) t j>t i;jn 1 1 n1 ^F(ti i1 )h(t i ) (1 )+ t j>t i ^F(t j ) n1 i1 ^F(ti )h(t i ) 1 F (t i ) + i;in 1 t j<t i ^F(tj )h(t j ) t j<t i ^F(tj ) t j<t i ^F(t j )h(t j ) ^F (t i ) 9 ; (A) Derivation of the Self-consistent Equation After Time T Similarly we can obtain the constrained self-consistent equation for ^F when u>t. 14

15 (b) The likelihood function after time T is L i1;in (v i )+ i1;in (v i )+ Again substituting v i () asin(.4) L i1;in + i;in i;in [1 F (s i )] + i;in ( ^F(si) 1 1+h(s i ) 4 + i1;in 1 D() + + i;in + i;in n 1 D() + + i;in D() + s j>s i;jn v j )+ ^F(sj) 1+h(s j ) s j <s i i1;in ^F(sj) 1+h(s j ) + s j >s i 4 1 D()+ i1;in Taking derivative with respect to it i;in 1 D() ^F(si) 1+h(s i ) + i;in [F (s i )] i;in ( + ^F(sj) 1 1+h(s s j ) D() j>s i 1 D() i;in ^F(sj) 1+h(s j ) s j<s i ^F(si) 1+h(s i ) D()+ n D () D() i;in + i;in If we set and the derivative must be zero thus n i1 ^F(si )h(s i ) 1 n 8 < + i;in i1;in h(s i )+ 1 i;in 1 D() i;in ^F(sj) 1+h(s j ) s j <s i i1;in h(s i ) [1 + h(s i )] s j>s i ^F (s j )h(s j) [1+h(s j )] s j>s i ^F (s j) 1+h(s j) s j>s i () ^F (s j)h(s i) 1 D s j<s i [1+h(s j )] D()+ ^F (s j) 1 s j<s i 1+h(s j) i;in n i1 ^F(sj )h(s j ) + s j <s i ^F(sj ) s j>s i ^F(sj )h(s j ) s j>s i ^F(sj ) + i;in If we take h(s i )I[s i u] for u>t the above equation becomes (.6). s j<s i;jn v j ) ^F(sj) 1+h(s j ) s j <s i ^F(s j )h(s j ) + s j<s i ^F(sj ) 9 ; 1

16 Appendix B roof of Theorem. Taking the second derivative of likelihood function L 1 with respect to and setting and h(t i )I [tiu] we can simplify the derivative as L j n 1 ^F(u)+ n 1 [ ^F(u)] + Substituting ^F(u) as in (.) where L j n 1 a i i1 + + n 1 i;in 1 1 ^F(u) 1 ^F (ti ) + i1;in 1 I [tiu] ^F(u) ^F(ti ) 1 ^F (ti ) ^F(u)+[^F(u) ^F(t i )]I [tiu] ^F(min(u; t i ) ^F(t i ) i1;in 1 I [tiu] + n1 [ ^F(u)] i;in 1 8 > < n 1 > 1 n 1 + i;in 1 [1 ^F(t i )] i;in 1 n 1 ^F (u)+[^f(u) ^F (t i )]I [tiu] " ^F(min(u; ti )) < n 1 1 n 1 n 1 n 1 a i i1 ^F(t i ) i1;in 1 I [t iu] + i1 ^F(min(u; t i )) ^F(t i ) a i " 1 n 1 n 1 [1 ^F (t i )] # # 9 a i ; i1 I + [t iu] i1;in 1 " # ^F(min(u; ti )) + i;in 1 i1;in 1 I [tiu] + ^F(t i ) 16 o! " # 9 ^F(u) ; o I [tiu] ( ^F(min(u; ti ) ^F(t i ) ) n o 1 ^F (u)+[^f(u) ^F (ti )]I [tiu] [1 ^F(t i )] n o 1 ^F (u)+[^f(u) ^F (t i )]I [tiu] 1 [1 ^F(t i )] ^F (u)+[^f(u) ^F (t i )]I [tiu] [1 ^F (ti )]

17 + i;in 1 ^F (min(u; t i ) ^F (t i ) By (.) and Cauchy-Schwartz Inequality it is clear that the second derivative of L 1 is non-positive. The second derivative ofl is also non-positive by following the similar steps. } L Appendix C Derivation of the self-consistent Equations Between Time T 1 and T i1;in (z i )+ i;in [1 F (x i )] + i;in [F (x i )] (z i )+ [1 F (T )+F(T ) F (x i )] i1;in i;in + [F (x i ) F (T 1 )+F(T 1 )] i;in (z i )+ 4 (1 )+ i1;in i;in lug z i () into L into above equation we get L i1;in + " ^F(xi ) 1+h(x i ) 4 1 i;in x j<x i;jn 1 + i1;in i1;in i;in + i;in n i1;in # x j>x i;in z j (1 )+ i;in ^F(xj) h(x j ) ^F(xi) 1+h(x i ) 1 (1 )+ x j>x i;jn i;in + i;in 4 x j<x i;jn Taking derivative with respective n B () x j>x i;jn ^F(xj) 1+h(x j ) + 1 ^F(xi) 1+h(x i ) ^F(xj) 1+h(x j ) x j>x i;jn ^F(xj) 1+h(x j ) i1;in 1 17 ^F(xj) 1+h(x j ) h(x i ) 1+h(x i ) i;in 1 1 x j>x i;jn 1 4 x j<x i;in z j + 1 ^F(xj) 1+h(x j ) 1

18 + + i;in 1 B 1 () ^F (x j)h(x j ) x j>x i [1+h(x j )] ^F (x j) x j>x i 1+h(x j ) ^F (x j)h(x i) x j <x i;jn [1+h(x j + )] 1 x ^F (x j) + j <x i;jn 1+h(x j) 1 B () 1 1 If we set the above equation can be simplied j n n 1 i1 i;in i;in ^F(x i )h(x i ) i1;in h(x i ) 1 n ^F(x 1 i1 i )h(x i )+ x j >x i ^F(x j )h(x j ) (1 )+ x j>x i ^F(xj ) x j<x i ^F(xj )h(x j )+ 1 n 1 i1 ^F(xi )h(x i ) 1 + x j <x i ^F (x j ) Since the derivative is equal to zero we rearrange the terms and obtain n i1 ^F(xi )h(x i ) 1 n 8 < + i;in i1;in h(x i ) 1 n 1 i1 ^F(xi )h(x i )+ x j>x i ^F(xj )h(x j ) + i;in x j<x i ^F(x j )h(x j )+ (1 )+ x j >x i ^F (x j ) 1 n x j <x i ^F(xj ) i1 ^F(x i )h(x i ) Now we take h(x i )I [xiu] for T 1 u T the above equation can be rewritten as (.1). 9 ; 18

19 Table 6.6 Marijuana Use In High School Boys Age Number of Exact Number Who Have Yet Number Who Have Started Observations to Smoke Marijuana Smoking at an Earlier Age >

20 Normally Distributed Light-Censored Simulation Normally Distributed Medium-Censored Simulation -likelihood Ratios Ho F(1). Ho F(8).186 -likelihood Ratios Ho F(1). Ho F(8) Chi-Square ercentiles (a) Chi-Square ercentiles (b) Normally Distributed Heavy-Censored Simulation Exponentially Distributed Light-Censored Simulation -likelihood Ratios Ho F(1). Ho F(8).186 -likelihood Ratios Ho F(.).616 Ho F(.) Chi-Square ercentiles (c) Chi-Square ercentiles (d) Exponentially Distributed Medium-Censored Simulation Exponentially Distributed Heavy-Censored Simulation -likelihood Ratios Ho F(.).616 Ho F(.) likelihood Ratios Ho F(.).616 Ho F(.) Chi-Square ercentiles (e) Chi-Square ercentiles (f) Figure 6.1 Q-Q lot of -likelihood Ratios vs. (1) ercentiles For Sample Size 1

21 Normally Distributed Light-Censored Simulation Normally Distributed Medium-Censored Simulation -likelihood Ratios Ho F(1). Ho F(8).186 -likelihood Ratios Ho F(1). Ho F(8) Chi-Square ercentiles (a) Chi-Square ercentiles (b) Normally Distributed Heavy-Censored Simulation Exponentially Distributed Light-Censored Simulation -likelihood Ratios Ho F(1). Ho F(8).186 -likelihood Ratios Ho F(.).616 Ho F(.) Chi-Square ercentiles (c) Chi-Square ercentiles (d) Exponentially Distributed Medium-Censored Simulation Exponentially Distributed Heavy-Censored Simulation -likelihood Ratios Ho F(.).616 Ho F(.) likelihood Ratios Ho F(.).616 Ho F(.) Chi-Square ercentiles (e) Chi-Square ercentiles (f) Figure 6. Q-Q lot of likelihood Ratios vs. (1) ercentiles For Sample Size 1

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky Empirical likelihood with right censored data were studied by Thomas and Grunkmier (1975), Li (1995),