Censored partial regression

Size: px
Start display at page:

Download "Censored partial regression"

Transcription

1 Biostatistics (2003), 4, 1,pp Printed in Great Britain Censored partial regression JESUS ORBE, EVA FERREIRA, VICENTE NÚÑEZ-ANTÓN Departamento de Econometría y Estadística, Facultad de Ciencias Económicas y Empresariales, Universidad del País Vasco-Euskal Herriko Unibertsitatea, Avda. Lehendakari Agirre 83, E Bilbao, Spain etpnuanv@bs.ehu.es SUMMARY In this work we study the effect of several covariates on a censored response variable with unknown probability distribution. A semiparametric model is proposed to consider situations where the functional form of the effect of one or more covariates is unknown, as is the case in the application presented in this work. We provide its estimation procedure and, in addition, a bootstrap technique to make inference on the parameters. A simulation study has been carried out to show the good performance of the proposed estimation process and to analyse the effect of the censorship. Finally, we present the results when the methodology is applied to AIDS diagnosed patients. Keywords: Bootstrap; Censorship; Duration models; Kaplan Meier; Nonparametric estimation. 1. INTRODUCTION In this paper we propose a new methodology to estimate and make inference in the context of a censored partial regression model. We have a dataset that contains information about the survival time for AIDS-diagnosed patients and some covariates. Our aim is to analyse the effect of these covariates on the patients survival time starting from the diagnosis time of the illness up to death or up to the end of the study; that is, there is censorship in the sample. The covariates are: sex, age at diagnosis, transmission category, disease at AIDS diagnosis and the period of diagnosis; the latter is used to capture the effect of the introduction of the AZT treatment (started in the middle of 1987). In a previous study, Orbe et al. (1996) used a dummy variable to capture this effect, a quite restrictive specification, since it seems more realistic to assume a more gradual effect. This motivates the proposal of a censored partial regression model, introducing all of the covariates in a parametric and linear form, except for the period of diagnosis, which is introduced through a nonparametric term. As is well known, the traditional methods applied in standard problems in Statistics cannot be used for censored data. There are two main classes of regression models that analyse the dependence between the duration and the covariates: the proportional hazards models (Cox, 1972) and the accelerated failure time models (see e.g. Lawless (1982)). The effect of the covariates in the Cox model is multiplicative on the baseline hazard. The advantage of this model, and the main reason for its extensive use, lies in the possibility to estimate the parameters of interest without any assumption on the distribution of the duration variable. However, the assumption of a proportional hazard function for different individuals is very restrictive, and, in some cases, this proportionality is not verified by the data. To whom correspondence should be addressed c Oxford University Press (2003)

2 110 J. ORBE ET AL. On the other hand, in the accelerated failure time models, the survival time (or a monotonic transformation) is modelled linearly in the covariates. Usually, the estimation of these models is carried out assuming a distribution for the duration. However, some authors have developed methods to avoid the need for this assumption. Miller (1976); Koul et al. (1981); Buckley (1979), among others, proposed the first estimation procedures in this direction. In the last decade, Ritov (1990); Tsiatis (1990); Wei et al. (1990); Jones (1997); Lin et al. (1998); Gray (2000) and Jin et al. (2001) used rank tests to estimate the parameters within this class of models. Stute (1993) proposed an estimation procedure based on weighted least squares, using the Kaplan Meier weights. All of these proposals are designed for models where the effect of the covariates is specified in a parametric manner. To allow for more flexible situations, Gray (1992) and Hastie and Tibshirani (1993) extended the Cox model to allow for nonparametric effects of some covariates on the hazard functions. Kooperberg et al. (1995) proposed the HARE (HAzard REgression) models, where linear splines and selected tensor products are used to estimate the logarithm of the conditional hazard function, which has also been studied by Intrator and Kooperberg (1995). Leblanc and Crowley (1999) did a similar work but they modelled the relative hazard function instead. Since we are interested in modelling the direct effect of the covariates on the duration rather than on the hazard function, our work fits in the accelerated failure time models framework, which is quite appealing to medical investigators due to its ease of interpretation. Moreover and to our knowledge, there is no previous work dealing with semiparametric estimators for these models. The rest of the paper is outlined as follows. In Section 2, we present the censored partial regression model and provide the details for the estimation procedure. Section 3 describes a method that uses bootstrap resample techniques to make inference. Section 4 provides some simulation results that support the adequacy of the methodology to different situations. Section 5 presents the results of the application to AIDS-diagnosed patients and, finally, Section 6 presents the conclusions. 2. A CENSORED PARTIAL REGRESSION MODEL The proposal we put forward in this section extends the linear censored regression model, allowing for a possible nonparametric specification for one or more covariates. Stute (1993) presents a new methodology for a regression model with censored data that allows for either fixed or random covariables and requires weak assumptions for the probability distributions involved. Moreover, the estimators can be easily obtained using weighted least squares. In order to explain the method, let us assume that T 1,...,T n are independent observations from some unknown distribution function F. Due to the censored data, we observe Y i = min(t i, C i ), where C 1,...,C n denote the values for the censoring variable C. Let δ i = I (T i C i ) be the failure indicator. In addition, X i represents the k-order vector of covariates for the ith individual. This model requires the following identifiability conditions: (H.1) T and C are independent, and (H.2) P(T i C i T i, X i ) = P(T i C i T i ),fori = 1,...,n. Hypothesis (H.1) is a standard assumption in random censorship models and (H.2) indicates that, given the duration, the covariables do not provide any further information as to whether the observation is censored or not, a weaker assumption than the independence between C and X. The relation between the covariates and the logarithm of the duration is modelled as ln T i = X i β + ɛ i, with E[ɛ i X i ]=0. The logarithm can be replaced, without loss of generality, by any other monotone transformation. The estimator of β can be obtained by weighted least squares, minimizing n i=1 W in [ln Y (i) X [i] β] 2, where ln Y (i) is the ith ordered value of the observed response variable ln Y, X [i] is the concomitant associated with ln Y (i), and W in are the Kaplan Meier weights. In a compact notation, the solution can be written as ˆβ = (X T WX) 1 X T W ln Y, where ln Y = (ln Y (1),...,ln Y (n) ) T, W is a diagonal matrix

3 Censored partial regression 111 formed with the Kaplan Meier weights and X =[X T 1, X T 2,...,X T n ]T is the matrix of covariates. Stute (1993) studied the consistency of this estimator and Stute (1996) its asymptotic normality. We now propose a generalization of the censored linear model to a censored partial linear model. That is, we will introduce a nonparametric component, where we do not specify any functional form for the effect of one or more covariates. In our application, the period of diagnosis (R) will be the variable introduced in a nonparametric form. Hence, the model can now be written as ln T i = X i β + h(r i ) + ɛ i. (1) The partial linear model (1) has been studied when the dependent variable is not censored. Heckman (1986) and Rice (1986) studied the case where h( ) is estimated using splines, and Speckman (1988) estimated h( ) using kernels. In this work we propose a method to fit (1) when the dependent variable is at risk of being censored. In order to fit model (1), we have to consider two main issues. On the one hand, the goodness of fit and, on the other hand, the smoothness of the proposed function to model the effect of the covariate included in the nonparametric component. As for the goodness of fit, this is controlled through the sum of the weighted squared residuals using the Kaplan Meier weights, that account for the existence of censored observations in the sample. This term corresponds to the equation n i=1 W in [ln Y (i) X [i] β h(r [i] )] 2, that can be written in matrix form as (ln Y Xβ Nh) T W (ln Y Xβ Nh), where X, W, and ln Y are as above. The d-order vector h contains the values h j = h(s j ) for the d distinct ordered values S 1 < < S d,ofthe covariate R. The connection between S 1,...,S d and R 1,...,R n is given through the n d incidence matrix N which assigns the respective value of the covariate R to the ith individual; that is, N ij = 1ifR i = S j, and zero otherwise. As for the smoothness, we measure it in the usual way using the integral of the square of second derivatives. Hence, the combination of both terms can be handled by minimizing the penalized weighted least squares expression (ln Y Xβ Nh) T W (ln Y Xβ Nh) + α [h (r)] 2 dr, (2) where the solution for the function h( ) is searched in the space of functions that are twice differentiable over the domain of R and have absolutely continuous first derivative. The degree of smoothness is determined by α, the smoothing parameter. Large values of α produce smoother curves, while smaller values produce more wiggly curves. Given α > 0, the solution that minimizes (2) is a smoothing cubic spline function with knots at S 1,...,S d, which is a mathematical consequence of the choice of [h (r)] 2 dr as the penalty term. As α approaches to zero, the penalty term loses importance and the solution tends to the cubic spline that interpolates the data; that is, such that h(r [i] ) = ln Y (i) X [i] β or the average if we have tieds. However, when α is large enough, the penalty term dominates and, thus, we obtain the weighted least squares solution to a linear model; that is, the solution will provide a linear function for h( ). In practice, the value of α can be selected using a data-driven criterion. The usual one is cross-validation and this will be the criterion we will use in the empirical analysis. Using the properties of cubic splines the integral term can be computed as αh T Kh, where K is a square matrix of order d, that only depends on the location of the knots. Further details on splines and how to calculate the K matrix can be found in Green and Silverman (1994, chapter 2). Thus, the minimization problem can be rewritten as (ln Y Xβ Nh) T W (ln Y Xβ Nh) + αh T Kh. (3)

4 112 J. ORBE ET AL. Taking derivatives in (3) with respect to β and h, and reordering the terms leads us to the following pair of simultaneous matrix equations: X T WXβ = X T W (ln Y Nh) (a) (N T WN + αk )h = N T W (ln Y Xβ) (b). (4) The uniqueness of the solution to (4) is given by the fact that (i) W is diagonal and with non-negative elements and, therefore, it is a symmetric non-negative definite matrix; (ii) the columns of X are linearly independent and (iii) there is no linear combination Xβ equal to a linear form δ 1 + δ 2 R.Inpractice, the estimations of β and h can be obtained iterating between equations 4(a) and 4(b), solving repeatedly for β and h, respectively, until convergence is achieved (i.e. using the backfitting algorithm). Conditions (i) (iii) also guarantee the convergence of the backfitting algorithm and, in most situations, this convergence is quite fast. For example, for our application (see Section 5), this convergence has been achieved after only four iterations. The proof of the uniqueness of the solution to (4) and the convergence of the backfitting algorithm follows directly from Theorem 4.1 in Green and Silverman (1994). The complete estimation process can be described by the following steps: step 1: separate the repeated values of the response variable; step 2: calculate the Kaplan Meier estimator ˆF n (Kaplan and Meier, 1958) for the distribution function F and compute the Kaplan Meier weights W in ; step 3: put the observed response variable ln Y in increasing order and reorder the covariates in agreement with this given order; step 4: build the incidence matrix N. Now the steps for the backfitting algorithm are: step 5: obtain the initial estimated value of h (ĥ 0 )byapplying ordinary least squares between ln Y and N; step 6: substitute h by ĥ 0 in 4(a) and obtain ˆβ 0 by weighted least squares using the Kaplan Meier weights; step 7: substitute β by ˆβ 0 in 4(b) and obtain the new estimation of h (ĥ 1 ) applying a natural cubic spline smoother to the difference (ln Y Xβ);step 8: go to step 6 and iterate until convergence is achieved. In order to analyse the statistical properties of the resulting estimators we believe it is important to point out some relevant considerations related to this issue. On the one hand, consistency and asymptotic properties for the partial linear model estimator without censored observations have been studied by Heckman (1986), and the resulting spline estimator is consistent for very general hypotheses where penalized weighted least squares are applied. Basically, it is required that the smoothing parameter α go to zero together with α 1/4 n tending to infinity. The idea behind this has to do with the local character of the nonparametric estimator. The smoothing parameter has to be small enough to fit the data but not too small such that there is no sufficient amount of data involved in the estimation procedure. On the other hand, in a pure parametric context, under (H.1) and (H.2) the estimator proposed by Stute is consistent and has an asymptotic normal distribution (Stute, 1993, 1996). For the partial linear model, the system of equations in (4) indicates that, if the parameter estimators are consistent, the nonparametric part appears as the solution of a penalized weighted least squares problem. Thus, one should expect a global consistent solution when considering the whole set of conditions. However, the formal proof of the asymptotic properties is not a trivial issue and needs further research, since the partial linear model has censored observations. Alternatively, to check the accuracy of the proposed estimator, in Section 4 we will carry out a simulation study, where the effect of the sample size and the censorship will be studied. To make inference on the estimators, we will propose a bootstrap technique, adapted to the particular dataset that will be analysed. 3. INFERENCE USING BOOTSTRAP There are two main bootstrap resample methods for censored data in homogeneous models: Reid (1981) and Efron (1981), which have been compared in Akritas (1986). Since, in our case, we have covariates, we propose a new procedure to generate the bootstrap samples for the case of random

5 Censored partial regression 113 censorship and a heterogeneous model, that is more related with Efron s proposal. This procedure does not assume any particular relation between the censoring variable and the covariates, being therefore very flexible. We summarize it in the following steps: step 1: fit model (1) as described in Section 2 and compute its residuals; step 2: resample the centered residuals to obtain the bootstrap sample ɛ1,...,ɛ n ; step 3: generate ln Ti = X i ˆβ + (Nĥ(R)) i + ɛi ; for i = 1,...,n; step 4: generate a vector of Bernoulli variables δ, where P(δi = 1 ln Ti = ln ti, X i = x i ) = 1 Ĝ n (ln ti ), for i = 1,...,n, and obtain the bootstrap indicator of censoring. Here, Ĝ n is the Kaplan Meier estimator of the distribution function for the censoring variable; step 5: generate the censoring variable. If ln T = ln t and δ = 1, C is taken from Ĝ n restricted to [ln t, ), while if ln T = ln t and δ = 0, C is taken from Ĝ n restricted to [0, ln t ); step 6: Estimate model (1), for the bootstrap sample, using the same estimation procedure as in step 1. That is: min ni=1 β,h Win [ln Y (i) X [i]β h(r [i] )] 2 + α [ h (r) ] 2 dr; step 7: go back to step 2 and repeat the process M times (i.e. M bootstrap samples are obtained). Notice that, in step 2, the residuals bootstrap sample has been obtained using the Kaplan Meier estimator for the c.d.f. of the residuals. In step 5, C has distribution Ĝ n. Thus, no obvious restrictions have been imposed on the joint distribution of the covariates and C and, in particular, independence is not required. In practice, the number M depends on the objective of the study. To estimate the distribution of the estimators or confidence intervals, a large value as M = 1000 should be used. To obtain standard deviations, far lower values are sufficient. For more details about bootstrap procedures see, for example, Davison and Hinkley (1997) or Efron and Tibshirani (1993). 4. SIMULATION STUDIES In this section we proceed to empirically study the properties of the proposed model under different sample sizes and censoring levels. Furthermore, we will be particularly interested in the effect of the censoring mechanism on the estimation of the parametric and nonparametric components of our model. We will specifically study the performance of the estimator under three different structures. In the first two cases, the main hypotheses of the model hold, and in the third one, they do not apply. The latter can represent some particular situations that may arise in practice, as in the case of the application considered in Section 5. However, this really represents the worst scenario for the purpose of illustration. Case 1. In cases 1 and 2, we have generated a duration variable with log-normal probability distribution, following the model ln T = 2 + 1X 1 + 3X 2 + e sinx 3 + ɛ, with ɛ N(0,σ = 0.5), (5) where X 1 U[0, 3] and X 2 U[0, 1] are the explanatory variables. For the nonparametric component, we use the function e sinx 3,afunction with two peaks and one valley, where X 3 has been obtained randomly generating equally likely integer values, between 0 and 10. The censoring variable is considered to be uniform, independent from the duration and from the explanatory variables, a usual assumption considered in other simulation studies (see e.g. Jones (1997) and Stute (1999)). Case 2. We now allow for possible dependence between the censoring variable and the covariates, but so that (H.2) still holds. The duration variable is generated as in case 1. Model (5) enables us to generate the censoring and the indicator variables using an algorithm based on steps 4 and 5 of the bootstrap procedure (Section 3), that puts no restriction on the joint distribution of the covariates and the censoring variable. We consider three levels of censoring, 10%, 20% and 33%, two different sample sizes, n = 100 and n = 200 and, for each combination, we generate 1000 samples. We now estimate the censored partial linear model, where the parameters estimated are the coefficients of X 1 and X 2, and the nonparametric

6 114 J. ORBE ET AL. Table 1. β 1 and β 2 coefficients estimation: case 1 Censoring level β 1 β 2 % Mean Variance Mean Variance MSE (total) n = n = Table 2. β 1 and β 2 coefficients estimation: case 2 Censoring level β 1 β 2 % Mean Variance Mean Variance MSE (total) n = n = part will account for the term f (X 3 ) = 2 + e sin X 3.Table 1 summarizes the results for case 1, providing the estimated means and variances for the parameter estimators in all cases. The results for case 2 are presented in Table 2. The effect of the censoring, as expected, tends to increase the variance of the estimators, losing precision as the censoring level increases. In addition, and also as expected, the precision is improved as the sample size increases. For the nonparametric part, we use the measurement error ME = (1/11) 11 [ i=1 f (x3i ) f (x 3i ) ] 2 at the fixed knots x3i ={0, 1,...,10}, where the function is actually estimated. The resulting values are presented using box plots (one for each censoring level) for n = 100 (Figure 1), and for n = 200 (Figure 2). The left panels in each figure present the results for case 1 and the right panels present the results for case 2. In each panel, we have the results for the different censoring levels considered: in the left box we have a 10% censoring level, in the middle box, 20%, and in the right box, 33%. Moreover, since the nonparametric function can estimate the function at any point between 0 and 10, we illustrate the estimation for the whole interval (see Figure 3) by plotting the true function together with the confidence intervals computed for case 2, for samples sizes n = 100 and n = 200, and a censoring level of 20%. As expected, the confidence interval becomes narrower when the sample size increases. We have also simulated cases for n = 500 and 1000, and the results behave as expected (figures are not shown). Similar results appear when we estimated the model in case 1. Case 3. Finally, we want to check the effect of removing some assumptions in our methodology. For an illustration, we have simulated data from a linear model where the duration variable follows a log-normal probability distribution: that is, ln T = θ 0 + θ 1 X + θ 2 Z + ɛ, (6) where X U[0, 1] and Z U[0, 1] are the explanatory variables and ɛ N(0,σ = 0.5). The true values for the parameters are θ 0 = 2, θ 1 = θ 2 = 0. The censoring variable is calculated as C =

7 Censored partial regression 115 ME Case 1 Case 2 Censorship 10%, 20% and 33% Fig. 1. Measurement Error (ME) for the nonparametric component (n = 100). ME Case 1 Case 2 Censorship 10%, 20% and 33% Fig. 2. Measurement Error (ME) for the nonparametric component (n = 200). 1 X, and this relation will imply that the censorship will be concentrated at the observations where X is close to one. In this case, the independence between the censoring and the duration variable does not hold and, therefore, the parametric estimators proposed under this assumption might lead to inconsistent estimators. There are different estimators that account for a weaker hypothesis where the assumption is the independence between the distributions of the censoring and the duration, but when conditioned to the covariates (Ritov, 1990). Unfortunately, the consistency of the estimators in this type of models is only proved in a framework where the observations of the dependent variable lie in a interval truncated from above. This fact can lead to inconsistent estimators when they are related with the whole distribution (e.g. duration mean). In this particular case, since the censoring is concentrated at the end of the study, we expect that the local character of our nonparametric estimation should avoid global errors in the parameter estimators. In order to provide an intuitive argument for this, note that, in the backfitting procedure, if the

8 116 J. ORBE ET AL X 3 95% C.I. (n=200 ) 95% C.I. (n=100 ) Fig. 3. Nonparametric component (case 2 with 20% of censorship). Table 3. Estimations in the partial linear model θ 2 Sample size Mean SD nonparametric term is removed, the parameter estimators are consistent. For the nonparametric estimator at each point, we just use the closer observations and this avoids the use of the information close to the end of study for this covariate. That is, the parametric part uses the whole set of information and the nonparametric part the local, which will lead to unreliable results only at the end of the study and for this specific covariate. However, it should lead to reliable global estimators for the rest of the covariates. To test for this, four different sample sizes are considered, n = 100, 500, 1000, 2000 and, for each sample size, we generate 1000 samples. The censorship level is kept fixed at 15% for all cases. We now estimate the partial regression model ln T = f (X) + θ 2 Z + ɛ. The results for the parametric component of the model are presented in Table 3, and Figure 4 shows the 95% empirical confidence bands for the estimation of the nonparametric part of the model, f (X), for n = From Table 3, we can see that the estimation for the parameter θ 2 in the model has a bias and a variance that decrease as the sample size increases, which is a good sign for the estimation process. As we can see from Figure 4, the resulting confidence bands for the nonparametric estimate covers the true value f (X) = 2. As expected, the nonparametric estimator should not be interpreted at the end of the study. We also have computed the pure parametric model (6) using Stute s method. Table 4 presents the results where we can observe a clear bias in the estimators. f(x3 ) 5. APPLICATION: STUDY OF THE SURVIVAL TIME FOR AIDS PATIENTS The dataset contains information about the survival time for AIDS-diagnosed patients who lived in the Basque Autonomous Community and the Autonomous Community of Navarra in Spain. We have a

9 Censored partial regression % C.I. (n=1000 ) X Fig. 4. Nonparametric component (case 3 with 15% of censorship). Table 4. Estimations in the fully parametric model θ 0 θ 1 θ 2 Sample size Mean SD Mean SD Mean SD sample of 461 patients diagnosed with AIDS from 1984 until December 31, The duration variable under study measures the survival time for the patient from the illness diagnosis time up to death or up to the end of the study (censored observations). The evolution of the HIV virus has three stages. The first one is known as the pre-antibody stage and it is the shortest one with a duration of only several months and goes from the infection date to the development of antibodies or seroconversion point. The second stage, the incubation stage, is the largest one, starting with the seroconversion (the patient is classified as seropositive) and goes until the diagnosis of AIDS. Finally, the third stage gives the survival time from the AIDS diagnosis time, when the individual develops some of the illnesses related to AIDS, up to death or up to the end of the study. We study the duration in the third stage. As of December 31, 1992, there were 447 patients that had died and 14 patients that had survived, the censored observations. The covariates are: sex (Sex), age at diagnosis (Age), transmission category, disease at AIDS diagnosis and the period of diagnosis (Period). The transmission category is coded using five dummy variables for T-Sex, T-Drug, T-Blood, T-Moth-child and T-Others. The disease at AIDS diagnosis is coded using three dummy variables Disease1, diagnosis through an opportunistic infection; Disease2, diagnosis produced by a Kaposi s sarcoma or some lymphoma; and Disease3, diagnosis through an HIV encephalopathy or a HIV wasting syndrome. For the effect of the AZT treatment (starting in the middle of 1987), we introduce the variable R which indicates the period of diagnosis (in quarters). Thus, this variable takes values from one (diagnosis in the second quarter of 1984) to 27 (diagnosis in the last quarter of 1990). Since the variable period of diagnosis has a clear relation with the censoring variable (i.e. C = t 0 R, where t 0 = 35, 27 quarters under study and eight under follow-up), we are in a situation similar to case 3 f (X)

10 118 J. ORBE ET AL. Table 5. Estimates of β and 95% confidence intervals (Censored Partial Regression Model) Variable Coef SD LL UL Constant Sex Disease Disease T-Sex T-Drug T-Blood T-Moth-child Age presented in Section 4 and this will affect to parametric specifications of this variable. This fact, together with the issue of considering a possible gradual effect for the AZT motivates to propose the estimation of a partial regression model in Section 2, where all covariates are introduced parametrically, except the period of diagnosis, which is introduced through a nonparametric term. As indicated in the previous section, the partial estimator should only be affected at the end of the range of the variable R. However, since one of the main interests in the application was the study of the effect of AZT, introduced in 1987, on the survival of the patients (i.e. r = in the partial linear model), corresponding to the mid-part of the study, the interpretation of the results can be done in a reliable manner. As for the inference on the estimators, since we know the relation between the censoring variable and one of the explanatory variables (i.e. period of diagnosis) we have changed steps 4 and 5 in the bootstrap procedure described in Section 3, to reproduce this relation in the resamples. The new steps 4 and 5 will calculate the censoring indicator using δi = I (ti c i ), where c i = t 0 r i, t 0 = 35 quarters and r i is the period of diagnosis. With this procedure, we calculate standard deviations as ˆσ ˆβ k = (M 1) 1 M i=1 [ i βˆ k M 1 M j βˆ k j=1 ] 2 where ˆ β k i indicates the estimated βk in the ith bootstrap sample. The confidence intervals are computed using the percentile method (Efron, 1982), denoting by LL and UL the lower and upper limit respectively. Table 5 reports the results for the parametric component, and Figure 5 those for the nonparametric component. With regard to the covariates introduced in the parametric component, only the age of the patient has a significant, negative, effect on his/her survival time. As for the estimation of the nonparametric component, the period of diagnosis does not seem to have a significant effect at the beginning of the study. As time goes by, a significant and positive effect appears with a clear acceleration several quarters before the beginning of the administration of AZT (started in the middle of 1987, i.e. r = 13). This makes sense because patients whose diagnosis time was several quarters before starting the administration of AZT also received this treatment. Therefore, we can say that the introduction of AZT has a positive effect on the survival, increasing the survival time of patients. In order to better illustrate this, we can look at Figure 5 and consider the rest of the covariates to remain constant. We can observe that the difference, in terms of mean of ln T, for a patient diagnosed of AIDS at the beginning of the illness (i.e. r = 1, last quarter of 1984) and a patient, for example, diagnosed with AIDS just after starting the introduction of 1/2,

11 Censored partial regression % C.I R Fig. 5. Estimation of the nonparametric component (application). This figure shows the function h(r), that represents the nonparametric estimation of the effect of the period covariate on the mean log-duration, together with the 95% confidence bands. From the figure we can observe a positive and gradual effect of the AZT treatment (that started in the middle of 1987, i.e. R = 13). The increase of the mean log-duration of patients is clear and it can be directly quantified by comparing the values in ordinates. This effect can not be easily captured using parametric specifications. AZT (i.e. r = 13) is larger than 1. That is, the difference in mean survival time between those two periods increases approximately in almost four quarters. 6. DISCUSSION As a summary, this paper proposes a censored partial regression model to analyse a response variable and the effect of some covariates under censored observations. The proposal is based on a distributionfree model, where the proportionality of the hazard functions is not required, and it allows us to model situations where the effect of one or more covariates is not parametrically specified. In addition, we model directly the effect of the covariates on the duration, which leads to an easier interpretation of the results. The simulation study indicates that this procedure produces good estimates for the parametric component and for the nonparametric one, even in situations where the hypotheses of the model do not hold. A new bootstrap technique is derived to make inference on the estimators. Finally, the application of the methodology to the dataset allows us to interpret the gradual effect of the AZT treatment, that cannot be easily capture using parametric specifications. This analysis shows the evolution of the survival time for the illness during the period under study. In the first quarters, the effect of the period of diagnosis on the survival time is small and, as time goes by, this effect increases and presents a strong acceleration. This can be due to the fact that AIDS was quite unknown at the beginning and, then, as it became widely known, the disease was diagnosed earlier. h (R) ACKNOWLEDGEMENTS The authors would like to thank Winfried Stute for valuable comments and discussion. This work was partially supported by Universidad del País Vasco/Euskal Herriko Unibertsitatea (UPV/EHU), Dirección General de Enseñanza Superior e Investigación Científica del Ministerio Español de Educación y Cultura

12 120 J. ORBE ET AL. and Gobierno Vasco under research grants UPV HA129/99, UPV /2001, PB , PI and PI REFERENCES AKRITAS, M. G.(1986). Bootstrapping the Kaplan Meier estimator. Journal of the American Statistical Association 81, BUCKLEY, J.J.(1979). Linear regression with censored data. Biometrika 66, COX, D. R.(1972). Regression models and life-tables. Journal of the Royal Statistical Society, Series B 34, DAVISON, A. C. AND HINKLEY, D. V.(1997). Bootstrap Methods and Their Application. Cambridge: Cambridge University Press. EFRON, B.(1981). Censored data and bootstrap. Journal of the American Statistical Association 76, EFRON, B.(1982). The Jackknife, the Bootstrap and Other Resampling Plans, CBMS-NSF 38. Philadelphia: SIAM. EFRON, B.AND TIBSHIRANI, R.J.(1993). An Introduction to the Bootstrap. New York: Chapman and Hall. GRAY, R. J.(1992). Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. Journal of the American Statistical Association 87, GRAY, R. J.(2000). Estimation of the regression parameters and the hazard function in transformed linear survival models. Biometrics 56, GREEN, P. J. AND SILVERMAN, B. W.(1994). Nonparametric Regression and Generalized Linear Models. London: Chapman and Hall. HASTIE, T. AND TIBSHIRANI, R.(1993). Varying-coefficient models. Journal of The Royal Statistical Society, Series B 55, HECKMAN, N.E.(1986). Spline smoothing in a partly linear model. Journal of The Royal Statistical Society, Series B 48, INTRATOR, O. AND KOOPERBERG, C.(1995). Trees and splines in survival analysis. Technical Report. University of Washington. JIN, D.Y.,YING, Z.AND WEI, L.J.(2001). A simple resampling method by perturbing the minimand. Biometrika 88, JONES, M. P.(1997). A class of semiparametric regressions for the accelerated failure time model. Biometrika 84, KAPLAN, E. L. AND MEIER, P.(1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53, KOOPERBERG, C., STONE, C. J. AND TRUONG, Y. K.(1995). Hazard regression. Journal of the American Statistical Association 90, KOUL, H., SUSARLA, V. AND VAN-RYZIN, J.(1981). Regression analysis with randomly right-censored data. The Annals of Statistics 9, LAWLESS, J.F.(1982). Statistical Models and Methods for Lifetime Data. New York: Wiley. LEBLANC, M.AND CROWLEY, J.(1999). Adaptive regression splines in the Cox model. Biometrics 55, LIN, D. Y., WEI, L. J. AND YING, Z.(1998). Accelerated failure time models for counting processes. Biometrika 85, MILLER, R.G.(1976). Least squares regression with censored data. Biometrika 63,

13 Censored partial regression 121 ORBE, J., FERNÁNDEZ, A. AND NÚÑEZ-ANTÓN, V. (1996). Análisis de la supervivencia en enfermos de SIDA residentes en la Comunidad Autónoma del País Vasco y Navarra. Osasunkaria 12, 1 8. REID, N.(1981). Estimating the median survival time. Biometrika 68, RICE, J.(1986). Convergence rates for partially splined models. Statistics and Probability Letters 4, RITOV, Y.(1990). Estimation in a linear regression model with censored data. The Annals of Statistics 18, SPECKMAN, P.(1988). Kernel smoothing in partial linear models. Journal of the Royal Statistical Society, Series B 50, STUTE, W. (1993). Consistent estimation under random censorship when covariables are present. Journal of Multivariate Analysis 45, STUTE, W.(1996). Distributional convergence under random censorship when covariables are present. Scandinavian Journal of Statistics 23, STUTE, W.(1999). Nonlinear censored regression. Statistica Sinica 9, TSIATIS, A. A.(1990). Estimating regression parameters using linear rank tests for censored data. The Annals of Statistics 18, WEI, L. J., YING, Z. AND LIN, D. Y.(1990). Linear regression analysis of censored survival data based on rank tests. Biometrika 77, [Received May 8, 2001; first revision August 8, 2001; second revision January 2, 2002; accepted for publication February 1, 2002]

AFT Models and Empirical Likelihood

AFT Models and Empirical Likelihood AFT Models and Empirical Likelihood Mai Zhou Department of Statistics, University of Kentucky Collaborators: Gang Li (UCLA); A. Bathke; M. Kim (Kentucky) Accelerated Failure Time (AFT) models: Y = log(t

More information

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University

More information

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 599 604 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.599 Print ISSN 2287-7843 / Online ISSN 2383-4757 Estimation of Conditional

More information

Lecture 3. Truncation, length-bias and prevalence sampling

Lecture 3. Truncation, length-bias and prevalence sampling Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in last two decades. Truncation in

More information

Tests of independence for censored bivariate failure time data

Tests of independence for censored bivariate failure time data Tests of independence for censored bivariate failure time data Abstract Bivariate failure time data is widely used in survival analysis, for example, in twins study. This article presents a class of χ

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data Malaysian Journal of Mathematical Sciences 11(3): 33 315 (217) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Approximation of Survival Function by Taylor

More information

DOCUMENTOS DE TRABAJO BILTOKI. Benchmarking of patents: An application of GAM methodology. Petr Mariel y Susan Orbe

DOCUMENTOS DE TRABAJO BILTOKI. Benchmarking of patents: An application of GAM methodology. Petr Mariel y Susan Orbe DOCUMENTOS DE TRABAJO BILTOKI D.T. 2007.02 Benchmarking of patents: An application of methodology. Petr Mariel y Susan Orbe Facultad de Ciencias Económicas. Avda. Lehendakari Aguirre, 83 48015 BILBAO.

More information

Quantile Regression for Residual Life and Empirical Likelihood

Quantile Regression for Residual Life and Empirical Likelihood Quantile Regression for Residual Life and Empirical Likelihood Mai Zhou email: mai@ms.uky.edu Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA Jong-Hyeon Jeong email: jeong@nsabp.pitt.edu

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods Chapter 4 Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods 4.1 Introduction It is now explicable that ridge regression estimator (here we take ordinary ridge estimator (ORE)

More information

Generalized Additive Models

Generalized Additive Models By Trevor Hastie and R. Tibshirani Regression models play an important role in many applied settings, by enabling predictive analysis, revealing classification rules, and providing data-analytic tools

More information

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics

More information

PREWHITENING-BASED ESTIMATION IN PARTIAL LINEAR REGRESSION MODELS: A COMPARATIVE STUDY

PREWHITENING-BASED ESTIMATION IN PARTIAL LINEAR REGRESSION MODELS: A COMPARATIVE STUDY REVSTAT Statistical Journal Volume 7, Number 1, April 2009, 37 54 PREWHITENING-BASED ESTIMATION IN PARTIAL LINEAR REGRESSION MODELS: A COMPARATIVE STUDY Authors: Germán Aneiros-Pérez Departamento de Matemáticas,

More information

Least Absolute Deviations Estimation for the Accelerated Failure Time Model. University of Iowa. *

Least Absolute Deviations Estimation for the Accelerated Failure Time Model. University of Iowa. * Least Absolute Deviations Estimation for the Accelerated Failure Time Model Jian Huang 1,2, Shuangge Ma 3, and Huiliang Xie 1 1 Department of Statistics and Actuarial Science, and 2 Program in Public Health

More information

UNIVERSITÄT POTSDAM Institut für Mathematik

UNIVERSITÄT POTSDAM Institut für Mathematik UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam

More information

MAS3301 / MAS8311 Biostatistics Part II: Survival

MAS3301 / MAS8311 Biostatistics Part II: Survival MAS3301 / MAS8311 Biostatistics Part II: Survival M. Farrow School of Mathematics and Statistics Newcastle University Semester 2, 2009-10 1 13 The Cox proportional hazards model 13.1 Introduction In the

More information

Regularization in Cox Frailty Models

Regularization in Cox Frailty Models Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

Issues on quantile autoregression

Issues on quantile autoregression Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides

More information

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/25 Right censored

More information

Modelling Survival Data using Generalized Additive Models with Flexible Link

Modelling Survival Data using Generalized Additive Models with Flexible Link Modelling Survival Data using Generalized Additive Models with Flexible Link Ana L. Papoila 1 and Cristina S. Rocha 2 1 Faculdade de Ciências Médicas, Dep. de Bioestatística e Informática, Universidade

More information

Empirical Likelihood in Survival Analysis

Empirical Likelihood in Survival Analysis Empirical Likelihood in Survival Analysis Gang Li 1, Runze Li 2, and Mai Zhou 3 1 Department of Biostatistics, University of California, Los Angeles, CA 90095 vli@ucla.edu 2 Department of Statistics, The

More information

Reduced-rank hazard regression

Reduced-rank hazard regression Chapter 2 Reduced-rank hazard regression Abstract The Cox proportional hazards model is the most common method to analyze survival data. However, the proportional hazards assumption might not hold. The

More information

Double Bootstrap Confidence Interval Estimates with Censored and Truncated Data

Double Bootstrap Confidence Interval Estimates with Censored and Truncated Data Journal of Modern Applied Statistical Methods Volume 13 Issue 2 Article 22 11-2014 Double Bootstrap Confidence Interval Estimates with Censored and Truncated Data Jayanthi Arasan University Putra Malaysia,

More information

UNIVERSITY OF CALIFORNIA, SAN DIEGO

UNIVERSITY OF CALIFORNIA, SAN DIEGO UNIVERSITY OF CALIFORNIA, SAN DIEGO Estimation of the primary hazard ratio in the presence of a secondary covariate with non-proportional hazards An undergraduate honors thesis submitted to the Department

More information

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

Kneib, Fahrmeir: Supplement to Structured additive regression for categorical space-time data: A mixed model approach Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach" Sonderforschungsbereich 386, Paper 43 (25) Online unter: http://epub.ub.uni-muenchen.de/

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

TESTS FOR LOCATION WITH K SAMPLES UNDER THE KOZIOL-GREEN MODEL OF RANDOM CENSORSHIP Key Words: Ke Wu Department of Mathematics University of Mississip

TESTS FOR LOCATION WITH K SAMPLES UNDER THE KOZIOL-GREEN MODEL OF RANDOM CENSORSHIP Key Words: Ke Wu Department of Mathematics University of Mississip TESTS FOR LOCATION WITH K SAMPLES UNDER THE KOIOL-GREEN MODEL OF RANDOM CENSORSHIP Key Words: Ke Wu Department of Mathematics University of Mississippi University, MS38677 K-sample location test, Koziol-Green

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Professors Lin and Ying are to be congratulated for an interesting paper on a challenging topic and for introducing survival analysis techniques to th

Professors Lin and Ying are to be congratulated for an interesting paper on a challenging topic and for introducing survival analysis techniques to th DISCUSSION OF THE PAPER BY LIN AND YING Xihong Lin and Raymond J. Carroll Λ July 21, 2000 Λ Xihong Lin (xlin@sph.umich.edu) is Associate Professor, Department ofbiostatistics, University of Michigan, Ann

More information

Censored quantile regression with varying coefficients

Censored quantile regression with varying coefficients Title Censored quantile regression with varying coefficients Author(s) Yin, G; Zeng, D; Li, H Citation Statistica Sinica, 2013, p. 1-24 Issued Date 2013 URL http://hdl.handle.net/10722/189454 Rights Statistica

More information

Exam C Solutions Spring 2005

Exam C Solutions Spring 2005 Exam C Solutions Spring 005 Question # The CDF is F( x) = 4 ( + x) Observation (x) F(x) compare to: Maximum difference 0. 0.58 0, 0. 0.58 0.7 0.880 0., 0.4 0.680 0.9 0.93 0.4, 0.6 0.53. 0.949 0.6, 0.8

More information

EMPIRICAL LIKELIHOOD ANALYSIS FOR THE HETEROSCEDASTIC ACCELERATED FAILURE TIME MODEL

EMPIRICAL LIKELIHOOD ANALYSIS FOR THE HETEROSCEDASTIC ACCELERATED FAILURE TIME MODEL Statistica Sinica 22 (2012), 295-316 doi:http://dx.doi.org/10.5705/ss.2010.190 EMPIRICAL LIKELIHOOD ANALYSIS FOR THE HETEROSCEDASTIC ACCELERATED FAILURE TIME MODEL Mai Zhou 1, Mi-Ok Kim 2, and Arne C.

More information

Analytical Bootstrap Methods for Censored Data

Analytical Bootstrap Methods for Censored Data JOURNAL OF APPLIED MATHEMATICS AND DECISION SCIENCES, 6(2, 129 141 Copyright c 2002, Lawrence Erlbaum Associates, Inc. Analytical Bootstrap Methods for Censored Data ALAN D. HUTSON Division of Biostatistics,

More information

STAT Section 2.1: Basic Inference. Basic Definitions

STAT Section 2.1: Basic Inference. Basic Definitions STAT 518 --- Section 2.1: Basic Inference Basic Definitions Population: The collection of all the individuals of interest. This collection may be or even. Sample: A collection of elements of the population.

More information

A Bivariate Weibull Regression Model

A Bivariate Weibull Regression Model c Heldermann Verlag Economic Quality Control ISSN 0940-5151 Vol 20 (2005), No. 1, 1 A Bivariate Weibull Regression Model David D. Hanagal Abstract: In this paper, we propose a new bivariate Weibull regression

More information

Smooth nonparametric estimation of a quantile function under right censoring using beta kernels

Smooth nonparametric estimation of a quantile function under right censoring using beta kernels Smooth nonparametric estimation of a quantile function under right censoring using beta kernels Chanseok Park 1 Department of Mathematical Sciences, Clemson University, Clemson, SC 29634 Short Title: Smooth

More information

Likelihood Construction, Inference for Parametric Survival Distributions

Likelihood Construction, Inference for Parametric Survival Distributions Week 1 Likelihood Construction, Inference for Parametric Survival Distributions In this section we obtain the likelihood function for noninformatively rightcensored survival data and indicate how to make

More information

ST745: Survival Analysis: Nonparametric methods

ST745: Survival Analysis: Nonparametric methods ST745: Survival Analysis: Nonparametric methods Eric B. Laber Department of Statistics, North Carolina State University February 5, 2015 The KM estimator is used ubiquitously in medical studies to estimate

More information

Cox s proportional hazards model and Cox s partial likelihood

Cox s proportional hazards model and Cox s partial likelihood Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.

More information

Multistate Modeling and Applications

Multistate Modeling and Applications Multistate Modeling and Applications Yang Yang Department of Statistics University of Michigan, Ann Arbor IBM Research Graduate Student Workshop: Statistics for a Smarter Planet Yang Yang (UM, Ann Arbor)

More information

Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models

Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models NIH Talk, September 03 Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models Eric Slud, Math Dept, Univ of Maryland Ongoing joint project with Ilia

More information

Published online: 10 Apr 2012.

Published online: 10 Apr 2012. This article was downloaded by: Columbia University] On: 23 March 215, At: 12:7 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 172954 Registered office: Mortimer

More information

Survival Distributions, Hazard Functions, Cumulative Hazards

Survival Distributions, Hazard Functions, Cumulative Hazards BIO 244: Unit 1 Survival Distributions, Hazard Functions, Cumulative Hazards 1.1 Definitions: The goals of this unit are to introduce notation, discuss ways of probabilistically describing the distribution

More information

Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation

Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation Patrick J. Heagerty PhD Department of Biostatistics University of Washington 166 ISCB 2010 Session Four Outline Examples

More information

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown. Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)

More information

Nonparametric rank based estimation of bivariate densities given censored data conditional on marginal probabilities

Nonparametric rank based estimation of bivariate densities given censored data conditional on marginal probabilities Hutson Journal of Statistical Distributions and Applications (26 3:9 DOI.86/s4488-6-47-y RESEARCH Open Access Nonparametric rank based estimation of bivariate densities given censored data conditional

More information

Modelling geoadditive survival data

Modelling geoadditive survival data Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model

More information

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky Empirical likelihood with right censored data were studied by Thomas and Grunkmier (1975), Li (1995),

More information

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment

More information

Part III Measures of Classification Accuracy for the Prediction of Survival Times

Part III Measures of Classification Accuracy for the Prediction of Survival Times Part III Measures of Classification Accuracy for the Prediction of Survival Times Patrick J Heagerty PhD Department of Biostatistics University of Washington 102 ISCB 2010 Session Three Outline Examples

More information

Logistic regression model for survival time analysis using time-varying coefficients

Logistic regression model for survival time analysis using time-varying coefficients Logistic regression model for survival time analysis using time-varying coefficients Accepted in American Journal of Mathematical and Management Sciences, 2016 Kenichi SATOH ksatoh@hiroshima-u.ac.jp Research

More information

STAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis

STAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis STAT 6350 Analysis of Lifetime Data Failure-time Regression Analysis Explanatory Variables for Failure Times Usually explanatory variables explain/predict why some units fail quickly and some units survive

More information

On consistency of Kendall s tau under censoring

On consistency of Kendall s tau under censoring Biometria (28), 95, 4,pp. 997 11 C 28 Biometria Trust Printed in Great Britain doi: 1.193/biomet/asn37 Advance Access publication 17 September 28 On consistency of Kendall s tau under censoring BY DAVID

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests Biometrika (2014),,, pp. 1 13 C 2014 Biometrika Trust Printed in Great Britain Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests BY M. ZHOU Department of Statistics, University

More information

A general mixed model approach for spatio-temporal regression data

A general mixed model approach for spatio-temporal regression data A general mixed model approach for spatio-temporal regression data Thomas Kneib, Ludwig Fahrmeir & Stefan Lang Department of Statistics, Ludwig-Maximilians-University Munich 1. Spatio-temporal regression

More information

log T = β T Z + ɛ Zi Z(u; β) } dn i (ue βzi ) = 0,

log T = β T Z + ɛ Zi Z(u; β) } dn i (ue βzi ) = 0, Accelerated failure time model: log T = β T Z + ɛ β estimation: solve where S n ( β) = n i=1 { Zi Z(u; β) } dn i (ue βzi ) = 0, Z(u; β) = j Z j Y j (ue βz j) j Y j (ue βz j) How do we show the asymptotics

More information

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2 Rejoinder Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2 1 School of Statistics, University of Minnesota 2 LPMC and Department of Statistics, Nankai University, China We thank the editor Professor David

More information

LEAST ABSOLUTE DEVIATIONS ESTIMATION FOR THE ACCELERATED FAILURE TIME MODEL

LEAST ABSOLUTE DEVIATIONS ESTIMATION FOR THE ACCELERATED FAILURE TIME MODEL Statistica Sinica 17(2007), 1533-1548 LEAST ABSOLUTE DEVIATIONS ESTIMATION FOR THE ACCELERATED FAILURE TIME MODEL Jian Huang 1, Shuangge Ma 2 and Huiliang Xie 1 1 University of Iowa and 2 Yale University

More information

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial

More information

Estimation of cumulative distribution function with spline functions

Estimation of cumulative distribution function with spline functions INTERNATIONAL JOURNAL OF ECONOMICS AND STATISTICS Volume 5, 017 Estimation of cumulative distribution function with functions Akhlitdin Nizamitdinov, Aladdin Shamilov Abstract The estimation of the cumulative

More information

Model Selection, Estimation, and Bootstrap Smoothing. Bradley Efron Stanford University

Model Selection, Estimation, and Bootstrap Smoothing. Bradley Efron Stanford University Model Selection, Estimation, and Bootstrap Smoothing Bradley Efron Stanford University Estimation After Model Selection Usually: (a) look at data (b) choose model (linear, quad, cubic...?) (c) fit estimates

More information

Product-limit estimators of the survival function with left or right censored data

Product-limit estimators of the survival function with left or right censored data Product-limit estimators of the survival function with left or right censored data 1 CREST-ENSAI Campus de Ker-Lann Rue Blaise Pascal - BP 37203 35172 Bruz cedex, France (e-mail: patilea@ensai.fr) 2 Institut

More information

Nonparametric Estimation of Regression Functions In the Presence of Irrelevant Regressors

Nonparametric Estimation of Regression Functions In the Presence of Irrelevant Regressors Nonparametric Estimation of Regression Functions In the Presence of Irrelevant Regressors Peter Hall, Qi Li, Jeff Racine 1 Introduction Nonparametric techniques robust to functional form specification.

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

* * * * * * * * * * * * * * * ** * **

* * * * * * * * * * * * * * * ** * ** Generalized Additive Models Trevor Hastie y and Robert Tibshirani z Department of Statistics and Division of Biostatistics Stanford University 12th May, 1995 Regression models play an important role in

More information

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model Other Survival Models (1) Non-PH models We briefly discussed the non-proportional hazards (non-ph) model λ(t Z) = λ 0 (t) exp{β(t) Z}, where β(t) can be estimated by: piecewise constants (recall how);

More information

The Log-generalized inverse Weibull Regression Model

The Log-generalized inverse Weibull Regression Model The Log-generalized inverse Weibull Regression Model Felipe R. S. de Gusmão Universidade Federal Rural de Pernambuco Cintia M. L. Ferreira Universidade Federal Rural de Pernambuco Sílvio F. A. X. Júnior

More information

Semi-parametric estimation of non-stationary Pickands functions

Semi-parametric estimation of non-stationary Pickands functions Semi-parametric estimation of non-stationary Pickands functions Linda Mhalla 1 Joint work with: Valérie Chavez-Demoulin 2 and Philippe Naveau 3 1 Geneva School of Economics and Management, University of

More information

Constrained estimation for binary and survival data

Constrained estimation for binary and survival data Constrained estimation for binary and survival data Jeremy M. G. Taylor Yong Seok Park John D. Kalbfleisch Biostatistics, University of Michigan May, 2010 () Constrained estimation May, 2010 1 / 43 Outline

More information

On Estimation of Partially Linear Transformation. Models

On Estimation of Partially Linear Transformation. Models On Estimation of Partially Linear Transformation Models Wenbin Lu and Hao Helen Zhang Authors Footnote: Wenbin Lu is Associate Professor (E-mail: wlu4@stat.ncsu.edu) and Hao Helen Zhang is Associate Professor

More information

11 Survival Analysis and Empirical Likelihood

11 Survival Analysis and Empirical Likelihood 11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with

More information

β j = coefficient of x j in the model; β = ( β1, β2,

β j = coefficient of x j in the model; β = ( β1, β2, Regression Modeling of Survival Time Data Why regression models? Groups similar except for the treatment under study use the nonparametric methods discussed earlier. Groups differ in variables (covariates)

More information

Pricing and Risk Analysis of a Long-Term Care Insurance Contract in a non-markov Multi-State Model

Pricing and Risk Analysis of a Long-Term Care Insurance Contract in a non-markov Multi-State Model Pricing and Risk Analysis of a Long-Term Care Insurance Contract in a non-markov Multi-State Model Quentin Guibert Univ Lyon, Université Claude Bernard Lyon 1, ISFA, Laboratoire SAF EA2429, F-69366, Lyon,

More information

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science 1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School

More information

Lecture 22 Survival Analysis: An Introduction

Lecture 22 Survival Analysis: An Introduction University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 22 Survival Analysis: An Introduction There is considerable interest among economists in models of durations, which

More information

Analysis of Type-II Progressively Hybrid Censored Data

Analysis of Type-II Progressively Hybrid Censored Data Analysis of Type-II Progressively Hybrid Censored Data Debasis Kundu & Avijit Joarder Abstract The mixture of Type-I and Type-II censoring schemes, called the hybrid censoring scheme is quite common in

More information

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Quantile methods Class Notes Manuel Arellano December 1, 2009 1 Unconditional quantiles Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Q τ (Y ) q τ F 1 (τ) =inf{r : F

More information

Accelerated Failure Time Models: A Review

Accelerated Failure Time Models: A Review International Journal of Performability Engineering, Vol. 10, No. 01, 2014, pp.23-29. RAMS Consultants Printed in India Accelerated Failure Time Models: A Review JEAN-FRANÇOIS DUPUY * IRMAR/INSA of Rennes,

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 248 Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials Kelly

More information

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS020) p.3863 Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Jinfang Wang and

More information

Finite Population Correction Methods

Finite Population Correction Methods Finite Population Correction Methods Moses Obiri May 5, 2017 Contents 1 Introduction 1 2 Normal-based Confidence Interval 2 3 Bootstrap Confidence Interval 3 4 Finite Population Bootstrap Sampling 5 4.1

More information

Survival Analysis I (CHL5209H)

Survival Analysis I (CHL5209H) Survival Analysis Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca January 7, 2015 31-1 Literature Clayton D & Hills M (1993): Statistical Models in Epidemiology. Not really

More information

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap Patrick Breheny December 6 Patrick Breheny BST 764: Applied Statistical Modeling 1/21 The empirical distribution function Suppose X F, where F (x) = Pr(X x) is a distribution function, and we wish to estimate

More information

A Modern Look at Classical Multivariate Techniques

A Modern Look at Classical Multivariate Techniques A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico

More information

Bootstrapping Australian inbound tourism

Bootstrapping Australian inbound tourism 19th International Congress on Modelling and Simulation, Perth, Australia, 12 16 December 2011 http://mssanz.org.au/modsim2011 Bootstrapping Australian inbound tourism Y.H. Cheunga and G. Yapa a School

More information

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements [Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements Aasthaa Bansal PhD Pharmaceutical Outcomes Research & Policy Program University of Washington 69 Biomarkers

More information

Frailty Models and Copulas: Similarities and Differences

Frailty Models and Copulas: Similarities and Differences Frailty Models and Copulas: Similarities and Differences KLARA GOETHALS, PAUL JANSSEN & LUC DUCHATEAU Department of Physiology and Biometrics, Ghent University, Belgium; Center for Statistics, Hasselt

More information

Statistical Inference

Statistical Inference Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park

More information

Goodness-of-fit tests for randomly censored Weibull distributions with estimated parameters

Goodness-of-fit tests for randomly censored Weibull distributions with estimated parameters Communications for Statistical Applications and Methods 2017, Vol. 24, No. 5, 519 531 https://doi.org/10.5351/csam.2017.24.5.519 Print ISSN 2287-7843 / Online ISSN 2383-4757 Goodness-of-fit tests for randomly

More information

Longitudinal + Reliability = Joint Modeling

Longitudinal + Reliability = Joint Modeling Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,

More information

Generated Covariates in Nonparametric Estimation: A Short Review.

Generated Covariates in Nonparametric Estimation: A Short Review. Generated Covariates in Nonparametric Estimation: A Short Review. Enno Mammen, Christoph Rothe, and Melanie Schienle Abstract In many applications, covariates are not observed but have to be estimated

More information

SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION. University of Minnesota

SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION. University of Minnesota Submitted to the Annals of Statistics arxiv: math.pr/0000000 SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION By Wei Liu and Yuhong Yang University of Minnesota In

More information

SEMIPARAMETRIC REGRESSION WITH TIME-DEPENDENT COEFFICIENTS FOR FAILURE TIME DATA ANALYSIS

SEMIPARAMETRIC REGRESSION WITH TIME-DEPENDENT COEFFICIENTS FOR FAILURE TIME DATA ANALYSIS Statistica Sinica 2 (21), 853-869 SEMIPARAMETRIC REGRESSION WITH TIME-DEPENDENT COEFFICIENTS FOR FAILURE TIME DATA ANALYSIS Zhangsheng Yu and Xihong Lin Indiana University and Harvard School of Public

More information

A comparison study of the nonparametric tests based on the empirical distributions

A comparison study of the nonparametric tests based on the empirical distributions 통계연구 (2015), 제 20 권제 3 호, 1-12 A comparison study of the nonparametric tests based on the empirical distributions Hyo-Il Park 1) Abstract In this study, we propose a nonparametric test based on the empirical

More information

Package Rsurrogate. October 20, 2016

Package Rsurrogate. October 20, 2016 Type Package Package Rsurrogate October 20, 2016 Title Robust Estimation of the Proportion of Treatment Effect Explained by Surrogate Marker Information Version 2.0 Date 2016-10-19 Author Layla Parast

More information

Generalized Additive Models

Generalized Additive Models Generalized Additive Models The Model The GLM is: g( µ) = ß 0 + ß 1 x 1 + ß 2 x 2 +... + ß k x k The generalization to the GAM is: g(µ) = ß 0 + f 1 (x 1 ) + f 2 (x 2 ) +... + f k (x k ) where the functions

More information