Quantile Regression for Doubly Censored Data with Applications to Cystic Fibrosis Studies. Technical Report July 1, 2010


Quantile Regression for Doubly Censored Data with Applications to Cystic Fibrosis Studies

by Shuang Ji 1, Limin Peng 1, Yu Cheng 2, and HuiChuan Lai 3

Technical Report 1-3
July 1, 2010

1 Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road, N.E., Atlanta, Georgia 30322, U.S.A.
2 Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15260, U.S.A.
3 Departments of Nutritional Sciences and Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, U.S.A.

Telephone: (44) FAX: (44) lpeng@emory.edu

Abstract

Quantile regression is known for its flexibility in accommodating varying covariate effects and has attracted growing interest in survival analysis. Motivated by the work of Peng and Huang (2008) on quantile regression with randomly right censored data, we develop a quantile regression method tailored to a double censoring setting that is often encountered in registry studies, utilizing the embedded martingale structure. The proposed estimation and inference procedures are computationally simple and stable. We establish the uniform consistency and weak convergence of the resulting estimators. We also provide a sensible solution to the identifiability issues for regression quantiles at both tails, a feature unique to doubly censored data. The finite-sample performance of our approach is assessed by a series of simulation studies. An application to registry data on cystic fibrosis illustrates the practical utility of our method.

Key Words: Cystic Fibrosis; Double censoring; Empirical process; Martingale; Pseudomonas aeruginosa; Regression quantile; Varying coefficient.

1 Introduction

Double censoring occurs when the measurement of interest is subject to both left censoring and right censoring. It is often encountered in biomedical research. In

the Cystic Fibrosis Foundation Patient Registry (CFFPR) study, the onset time of the first Pseudomonas aeruginosa (PA) infection in CF patients is of interest. This event time is subject to double censoring, since patients may enter the study with PA infection having already occurred, and the follow-up may be terminated before any PA infection is observed. In this study, the left censoring times, which correspond to ages at the first visit, are always recorded for each patient. However, this may not be true for the right censoring times because of random dropouts. Such doubly censored data, which can also arise in many other situations, are the focus of this paper.

Doubly censored data are more challenging to handle than right censored data. Turnbull (1974) studied the one-sample problem using self-consistent estimators. Gehan (1965) studied the two-sample problem by an extension of the Wilcoxon test. Ren (2008) proposed a weighted empirical likelihood-based semiparametric MLE as a unified approach for the two-sample problem with various censoring schemes, including double censoring. In this paper we are interested in the regression setting. Among existing work on regression models for doubly censored data, Zhang and Li (1996) proposed the elegant Buckley-James-Ritov-type M-estimator. However, computational issues may arise because the estimating equation is neither monotone nor continuous. Ren and Gu (1997) and Ren (2003) proposed a parallel regression M-estimator. This approach requires the marginal independence of the survival time and the censoring time, which may be restrictive in practice. Assuming that both the left censoring time and the right censoring time are always observed, Cai and Cheng (2004) adopted semiparametric transformation models, which can be viewed as a generalization of the proportional hazards model. Yan et al. (2009) adapted temporal process regression (Fine, Yan and Kosorok, 2004) to doubly censored data. These methods may have limited application to the CFFPR data.
We plan to relax the assumptions mentioned above and propose a methodology tailored to the CFFPR data. We adopt a quantile regression framework to tackle the double censoring problem. Quantile regression (Koenker and Bassett 1978) has served as a popular extension of the classical linear regression model for uncensored data. It is well known for its flexibility in allowing for varying covariate effects and for its robustness as a non-parametric method. These features are especially useful when there is a changing pattern of covariate effects or when heteroscedasticity exists in the data (Peng and Huang, 2008; Portnoy, 2003). Quantile regression has therefore attracted increasing attention in survival analysis. For censored data, the quantile regression model can be viewed as a generalization of the traditional AFT model.

Most existing work on censored quantile regression has focused on right censored data. Among the earliest breakthroughs, Powell (1984, 1986) extended the least absolute deviation (LAD) method from traditional quantile regression to censored quantile regression, assuming the censoring variables are fixed or always observable. Subsequent efforts have been made to accommodate random censoring. Ying et al. (1995) modified LAD for median regression models and adopted a semiparametric procedure. Honore, Khan and Powell (2002) also adapted the LAD idea to random censoring. Approaches of this type share the assumptions of covariate-independent censoring and unconditional independence between the survival time and the censoring variables.

Methods have also been developed under the weaker assumption of conditional independence between the survival time and the censoring time given the covariates. Yang (1999) proposed estimators based on weighted empirical survival and hazard functions, which, however, may not be suitable for more general heteroscedastic situations. Without imposing restrictions on the error term, Portnoy (2003) proposed a recursively reweighted estimator as a generalization of the Kaplan-Meier estimator. Because of its recursive nature, there are complications in both the computation and the asymptotic results. The asymptotic properties of a gridded version of the estimation procedure are studied in Neocleous, Vanden Branden and Portnoy (2006) and Portnoy and Lin (2010). Peng and Huang (2008) proposed an alternative estimating equation motivated by the martingale structure of right censored data. Their approach is well justified in theory and convenient to implement.

Motivated by Peng and Huang (2008), we propose a quantile regression model for doubly censored data, assuming conditionally independent censoring with known left censoring time and unknown right censoring time. By utilizing the martingale structure associated with doubly censored data, we propose a monotone estimating equation for the regression parameters and adopt a grid-based estimation procedure. It can be shown that our estimation procedure is equivalent to locating the minima of a sequence of $L_1$-type convex functions with unique solutions. This procedure can be reliably implemented with existing functions in R and S-PLUS.
In addition, we provide remedies for the potential identifiability issue, which is common and important for censored data, as discussed in Peng and Huang (2008). For doubly censored data the issue is more challenging, because both the lower and the upper tails of the event time distribution may suffer from non-identifiability. The difficulty here cannot be bypassed solely by restricting the range of $\tau$. To solve this issue, we propose a conditional version of the quantile regression model that is appropriate when the lower tail is not identifiable. We also provide rigorous justifications for all our proposals and establish asymptotic properties for the estimated regression quantiles by using empirical process and stochastic integral techniques.

The rest of the article is organized as follows. In section 2 we present the unconditional model, the estimation procedure, asymptotic results and the inference procedure. In section 3 we present the conditional model and its associated properties. In section 4 we briefly describe an extension of our models. In section 5 we report the simulation study results, which show good finite-sample performance. In

section 6 we illustrate our proposed method through the CFFPR data analysis. We conclude the article with discussions in section 7.

2 The Unconditional Quantile Regression Model

2.1 Data and Model

Let $T$ be the event time, $L$ be the left censoring time, $U$ be the right censoring time, and $Z = (1, \tilde{Z}^\top)^\top$ be a $(p+1) \times 1$ covariate vector. Define $X = \max\{L, \min(T, U)\}$, the at-risk indicator $R(t) = I(L \le t \le X)$, and the counting process $N(t) = I(X \le t, \delta = 1)$ (Fleming and Harrington 1991), where $\delta$ is the censoring indicator defined as

$$\delta = \begin{cases} 1, & \text{if } L \le T \le U; \\ 2, & \text{if } T < L; \\ 3, & \text{if } T > U. \end{cases}$$

We assume that $(L, U) \perp T \mid Z$. In addition, we denote the conditional CDF of $T$ by $F_T(t \mid Z) = \Pr(T \le t \mid Z)$, and the conditional cumulative hazard function of $T$ by $\Lambda_T(t \mid Z) = -\log\{1 - F_T(t \mid Z)\}$. Let $T_i$, $L_i$, $U_i$, $X_i$, $\delta_i$, $R_i(t)$ and $N_i(t)$ be the sample analogs. The observed data consist of $n$ iid replicates of $(X, \delta, Z, L)$, denoted by $(X_i, \delta_i, Z_i, L_i)$, $i = 1, \ldots, n$.

Define $Q_T(\tau \mid Z) = \inf\{t : F_T(t \mid Z) \ge \tau\}$, the conditional $\tau$th quantile of $T$ given $Z$. The quantile regression model takes the form

$$Q_T(\tau \mid Z) = \exp\{Z^\top \beta(\tau)\}, \quad \tau \in (0, 1), \qquad (1)$$

where $\beta(\tau)$ is a vector of unknown coefficients representing covariate effects on $Q_T(\tau \mid Z)$. By formulating the coefficients as functions of $\tau$, we allow the covariate effects to vary across quantiles of $T$. This model reduces to the classic AFT model when all covariate effects are assumed to be constant over $\tau$ and the intercept is the $\tau$th quantile of a random variate. Model (1) has been studied by Peng and Huang (2008) for data with random right censoring.

2.2 Estimation Procedure

We propose an estimating equation by utilizing properties of the martingale associated with the observed data. Define $M(t) = N(t) - \int_0^t R(u)\, d\Lambda_T(u \mid Z)$ and let $M_i(t)$ be the sample analog. For right censored data (Fleming and Harrington 1991) it has been established that, under the conditional independent censoring assumption, $M_i(t)$ is a martingale.
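The observed-data quantities defined in Section 2.1 can be illustrated with a few lines of Python. This is only a minimal sketch: the function names `observe`, `at_risk` and `counting` are hypothetical, and the inequalities follow the definitions $X = \max\{L, \min(T, U)\}$, $R(t) = I(L \le t \le X)$ and $N(t) = I(X \le t, \delta = 1)$ as read from the text.

```python
import numpy as np

def observe(T, L, U):
    """Return (X, delta) from latent event times T and censoring times L <= U."""
    T, L, U = (np.asarray(a, dtype=float) for a in (T, L, U))
    X = np.maximum(L, np.minimum(T, U))
    # delta = 1: L <= T <= U (exact), 2: T < L (left censored), 3: T > U (right censored)
    delta = np.where(T < L, 2, np.where(T > U, 3, 1))
    return X, delta

def at_risk(t, L, X):
    """R_i(t) = I(L_i <= t <= X_i)."""
    L, X = np.asarray(L, dtype=float), np.asarray(X, dtype=float)
    return ((L <= t) & (t <= X)).astype(int)

def counting(t, X, delta):
    """N_i(t) = I(X_i <= t, delta_i = 1)."""
    X, delta = np.asarray(X, dtype=float), np.asarray(delta)
    return ((X <= t) & (delta == 1)).astype(int)

# Three illustrative subjects: exactly observed, left censored, right censored.
T = [2.0, 0.5, 5.0]
L = [1.0, 1.0, 1.0]
U = [3.0, 3.0, 3.0]
X, delta = observe(T, L, U)
```

Note that a left censored subject has $X = L$, so under this definition it is at risk only at the single time point $t = L$ and never contributes a jump to $N(t)$.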
For doubly censored data, however, the at-risk indicator $R(\cdot)$ is different. Nonetheless, by adopting a similar strategy as for right censored

data we can still show that $M_i(t)$ is a martingale with respect to the filtration $\sigma\{N_i(u), R_i(u+) : i = 1, \ldots, n;\ u \le t\}$, assuming $(L_i, U_i) \perp T_i \mid Z_i$. The detailed proof is provided in the Appendix. It naturally follows that

$$E\{M_i(t) \mid Z_i\} = 0, \quad t \ge 0. \qquad (2)$$

Note that, by a variable transformation inside the integral and writing $\tau = F_T(t \mid Z_i)$, we have

$$M_i(t) = N_i(t) - \int_0^t I\{L_i \le u \le X_i\}\, \lambda_T(u \mid Z_i)\, du = N_i(t) - \int_0^{\tau} I\{L_i \le Q_T(v \mid Z_i) \le X_i\}\, dH(v),$$

where $H(v) = -\log(1 - v)$ for $0 \le v < 1$. When evaluated at $t = Q_T(\tau \mid Z_i) = \exp\{Z_i^\top \beta_0(\tau)\}$, where $\beta_0(\cdot)$ is the true parameter vector, $M_i(t)$ becomes

$$M_i[\exp\{Z_i^\top \beta_0(\tau)\}] = N_i[\exp\{Z_i^\top \beta_0(\tau)\}] - \int_0^{\tau} I[L_i \le \exp\{Z_i^\top \beta_0(v)\} \le X_i]\, dH(v).$$

The above equality and (2) naturally lead to our proposed estimating equation

$$n^{1/2} S_n(\beta, \tau) = 0, \qquad (3)$$

where

$$S_n(\beta, \tau) = n^{-1} \sum_{i=1}^{n} Z_i \Big( N_i[\exp\{Z_i^\top \beta(\tau)\}] - \int_0^{\tau} I[L_i \le \exp\{Z_i^\top \beta(v)\} \le X_i]\, dH(v) \Big).$$

It is implied by (2) that $E\{S_n(\beta_0, \tau)\} = 0$.

Motivated by the stochastic integral representation of $S_n(\beta, \tau)$, we adopt a grid-based estimation procedure. Define $\hat\beta(\tau)$ as a right-continuous step function that jumps only on a grid $G_{L_n} = \{0 = \tau_0 < \tau_1 < \cdots < \tau_{L_n} = \tau_U < 1\}$. Here we confine the range of $\tau$ to $[0, \tau_U]$ to accommodate potential non-identifiability in the upper tail due to right censoring. Based on (3), we propose obtaining $\hat\beta(\tau_j)$, $j = 1, \ldots, L_n$, by sequentially solving for $\beta(\tau_j)$ in the equation

$$n^{-1/2} \sum_{i=1}^{n} Z_i \Big( N_i[\exp\{Z_i^\top \beta(\tau_j)\}] - \sum_{k=0}^{j-1} I[L_i \le \exp\{Z_i^\top \hat\beta(\tau_k)\} \le X_i]\, \{H(\tau_{k+1}) - H(\tau_k)\} \Big) = 0. \qquad (4)$$

By the definition of $Q_T(\cdot \mid Z)$ and model (1), $\exp\{Z^\top \beta_0(0)\} = 0$. Therefore we always set $\exp\{Z^\top \hat\beta(0)\} = 0$. The estimators $\hat\beta(\tau_j)$ $(j = 1, \ldots, L_n)$ are defined as

generalized solutions (Fygenson and Ritov 1994), since the discontinuity of the estimating function (4) may preclude exact solutions. Note that (4) is a monotone estimating equation, and $2n^{1/2}$ times the left-hand side of (4) is the gradient of the following function:

$$l_j(h) = \sum_{i=1}^{n} \big| I(\delta_i = 1) \log X_i - h^\top I(\delta_i = 1) Z_i \big| + \Big| R - h^\top \sum_{l=1}^{n} \{-I(\delta_l = 1) Z_l\} \Big| + \Big| R - h^\top \sum_{r=1}^{n} \Big( 2 Z_r \sum_{k=0}^{j-1} I[L_r \le \exp\{Z_r^\top \hat\beta(\tau_k)\} \le X_r]\, \{H(\tau_{k+1}) - H(\tau_k)\} \Big) \Big|, \qquad (5)$$

where $R$ is a very large number. Therefore $\hat\beta(\tau_j)$ can be obtained as the minimizer of the $L_1$-type function $l_j(h)$. The minimization can be performed using the Barrodale-Roberts algorithm (Barrodale and Roberts 1974), which is implemented in standard statistical software including S-PLUS and R.

2.3 Asymptotic Results

The uniform consistency and weak convergence of our proposed estimators can be established by utilizing the stochastic integral representation of the estimation procedure, along the lines of Peng and Huang (2008). Under regularity conditions C1-C4 (with details stated in the Appendix), the following theorems hold.

Theorem 1. Assuming conditions C1-C4 hold and $\lim_{n \to \infty} \|S_{L_n}\| = 0$, then the uniform consistency holds: $\sup_{\tau \in [\nu, \tau_U]} \|\hat\beta(\tau) - \beta_0(\tau)\| \to_p 0$, where $0 < \nu < \tau_U$.

Theorem 2. Assuming conditions C1-C4 hold and $\lim_{n \to \infty} n^{1/2} \|S_{L_n}\| = 0$, then $n^{1/2}\{\hat\beta(\tau) - \beta_0(\tau)\}$ converges weakly to a Gaussian process for $\tau \in [\nu, \tau_U]$, where $0 < \nu < \tau_U$.

We need to point out that the regularity condition C4 (see Appendix B) corresponds to the identifiability of $\beta_0(\tau)$, $\tau \in (0, \tau_U]$. In the simple one-sample case, this condition is equivalent to $f_T[\exp\{\beta_0(\tau)\}] \Pr[L < \exp\{\beta_0(\tau)\} \le U] > 0$ for $\tau \in [\nu, \tau_U]$. Assuming $f_T(\cdot)$ is bounded away from 0, the above condition reduces to $\Pr\{L < Q_T(\tau) \le U\} > 0$ for $\tau \in [\nu, \tau_U]$, which implies $\tau_U < F_T(U^+)$ and $\nu > F_T(L^-)$, with $U^+$ and $L^-$ representing the upper bound of the support of $U$ and the lower bound of the support of $L$, respectively. Since $\nu > 0$ is arbitrary, we need

$L^- \le T^-$, where $T^-$ is the lower bound of the support of $T$. For the regression setting, however, we only have implicit conditions on $L$ and $\tau_U$ to guarantee identifiability.

The proof of Theorem 1 can be easily adapted from Peng and Huang (2008), since the two essential elements of their proof are preserved in our case: the estimating function has expectation 0 when evaluated at the true parameter, and the boundary condition $\exp\{Z^\top \hat\beta(0)\} = 0$ holds. Therefore we omit the proof in this paper. The proof of Theorem 2 is sketched in the Appendix.

2.4 Inferences

The estimation of the covariance matrix of $\hat\beta(\tau)$ is complicated by the unknown density function of $X$. Hence we adopt the resampling approach proposed by Peng and Huang (2008), which generalizes the minimand perturbing technique (Jin et al. 2001). Specifically, the objective function (5) is perturbed by $\xi_1, \ldots, \xi_n$, a set of i.i.d. variates from a known nonnegative distribution with mean 1 and variance 1, for example, Exp(1). The resulting objective function is

$$l_j^*(h) = \sum_{i=1}^{n} \big| \xi_i I(\delta_i = 1) \log X_i - h^\top \xi_i I(\delta_i = 1) Z_i \big| + \Big| R - h^\top \sum_{l=1}^{n} \{-\xi_l I(\delta_l = 1) Z_l\} \Big| + \Big| R - h^\top \sum_{r=1}^{n} \Big( 2 \xi_r Z_r \sum_{k=0}^{j-1} I[L_r \le \exp\{Z_r^\top \hat\beta(\tau_k)\} \le X_r]\, \{H(\tau_{k+1}) - H(\tau_k)\} \Big) \Big|, \quad j = 1, \ldots, L_n. \qquad (6)$$

The minimizer $\beta^*(\tau_j)$ of the perturbed function can be sequentially located with the boundary $\exp\{Z^\top \beta^*(0)\}$ set to 0. For a fixed $\tau_0$ we can approximate the variance of $\hat\beta(\tau_0)$ by repeatedly generating the variate sets $\{\xi_{k1}, \ldots, \xi_{kn}\}_{k=1}^{B}$ and obtaining the corresponding $\{\beta_k^*(\tau_0)\}_{k=1}^{B}$. A confidence interval for $\beta(\tau_0)$ can then be constructed using a normal approximation.

We can also carry out hypothesis testing and second-stage inferences similar to those proposed in Peng and Huang (2008). In particular, we are interested in testing (a) whether the effect of a specific covariate $Z_q$ $(2 \le q \le p+1)$ is significant over a range $\tau \in [l, u]$, and (b) whether the effect of $Z_q$ is constant over $\tau \in [l, u]$.
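To make the sequential estimation of Section 2.2 and the perturbation resampling above concrete, the following Python sketch treats only the one-sample (intercept-only) special case, where equation (4) can be solved by a direct search over the uncensored observations instead of the $L_1$ minimization of (5). It is a simplified illustration, not the implementation used in this report: the function names are hypothetical, the generalized solution is taken to be the smallest uncensored time at which the weighted count reaches the accumulated target, the at-risk indicators of each perturbed run are evaluated along that run's own path, and the simulated data and tuning values are illustrative only.

```python
import numpy as np

def H(v):
    """H(v) = -log(1 - v)."""
    return -np.log1p(-np.asarray(v, dtype=float))

def one_sample_quantiles(X, delta, L, taus, w=None):
    """Sequentially solve the intercept-only version of (4) on the grid `taus`
    (taus[0] must be 0). Returns q_j = exp{beta_hat(tau_j)}, j = 1..len(taus)-1.
    `w` holds optional perturbation weights xi_i (all ones if omitted)."""
    X, delta, L = np.asarray(X, float), np.asarray(delta), np.asarray(L, float)
    w = np.ones_like(X) if w is None else np.asarray(w, float)
    is_event = delta == 1
    order = np.argsort(X[is_event])
    ev = X[is_event][order]                  # candidate roots: uncensored times
    cum_w = np.cumsum(w[is_event][order])    # weighted N(q) at each candidate
    q = np.full(len(taus) - 1, np.nan)
    q_prev, target = 0.0, 0.0                # boundary condition exp{beta_hat(0)} = 0
    for k in range(len(taus) - 1):
        in_risk = (L <= q_prev) & (q_prev <= X)
        target += w[in_risk].sum() * (H(taus[k + 1]) - H(taus[k]))
        idx = np.searchsorted(cum_w, target, side="left")
        if idx >= len(ev):                   # no root: quantile not identifiable
            break
        q_prev = q[k] = ev[idx]
    return q

def perturbation_sd(X, delta, L, taus, B=200, seed=0):
    """Resampling SD of beta_hat(tau_j) = log q_j using Exp(1) weights."""
    rng = np.random.default_rng(seed)
    reps = np.array([
        np.log(one_sample_quantiles(X, delta, L, taus, w=rng.exponential(1.0, len(X))))
        for _ in range(B)
    ])
    return reps.std(axis=0, ddof=1)

# Illustrative doubly censored sample (all settings here are assumptions).
rng = np.random.default_rng(1)
n = 2000
T = rng.exponential(1.0, n)
L = rng.uniform(0.0, 0.5, n) * rng.binomial(1, 0.8, n)   # point mass at 0
U = L + rng.uniform(0.5, 4.0, n)
X = np.maximum(L, np.minimum(T, U))
delta = np.where(T < L, 2, np.where(T > U, 3, 1))

taus = np.linspace(0.0, 0.7, 71)
beta_hat = np.log(one_sample_quantiles(X, delta, L, taus))
sd = perturbation_sd(X, delta, L, taus, B=100)
j = 49                                        # taus[j + 1] = 0.5, i.e. the median
wald_ci = (beta_hat[j] - 1.96 * sd[j], beta_hat[j] + 1.96 * sd[j])
```

In the regression setting the search over candidate roots is replaced by minimizing (5) or (6), but the sequential structure (accumulate the right-hand side over the grid, then solve at $\tau_j$) is the same.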
Consider the general hypothesis $H_0: g\{\beta(\tau)\} = r_0(\tau)$, $\tau \in [l, u]$, where the operator function $g(\cdot)$ is often linear and $r_0(\tau)$ is a hypothesized value. Test (a) is a special case of testing $H_0$ with $g(x) = x_q$ and $r_0(\tau) = 0$ for $\tau \in [l, u]$. An integral test statistic may be adopted, namely $\Gamma = n^{1/2} \int_l^u [g\{\hat\beta(v)\} - r_0(v)]\, \Theta(v)\, dv$. It can be shown that $\Gamma$ is a consistent

test under certain conditions, where $\Theta(\cdot)$ is a nonnegative weight function. The distribution of $\Gamma$ under $H_0$ can be approximated using the conditional distribution of the resampling-based test statistic $\Gamma^* = n^{1/2} \int_l^u [g\{\beta^*(v)\} - g\{\hat\beta(v)\}]\, \Theta(v)\, dv$.

The constancy test (b) essentially examines $\tilde{H}_0: g\{\beta(\tau)\} = \eta_0$, $\tau \in [l, u]$, where $g(\cdot)$ is the same as in (a) and $\eta_0$ is an unspecified constant. We may set $\eta_0$ to be the average effect $\rho_0 = \int_l^u g\{\beta_0(v)\}\, dv / (u - l)$, and $\hat\rho = \int_l^u g\{\hat\beta(v)\}\, dv / (u - l)$ is a consistent estimator of $\rho_0$. Inferences on $\rho_0$ can be developed using the resampling-based realizations $\rho^*$. To test $\tilde{H}_0$ we adopt the test statistic $\tilde\Gamma = n^{1/2} \int_l^u [g\{\hat\beta(v)\} - \hat\rho]\, \tilde\Theta(v)\, dv$, where $\tilde\Theta(\cdot)$ is a nonconstant weight function. The distribution of $\tilde\Gamma$ under $\tilde{H}_0$ can be approximated by the conditional distribution of $\tilde\Gamma^* = n^{1/2} \int_l^u \big( [g\{\beta^*(v)\} - g\{\hat\beta(v)\}] - [\rho^* - \hat\rho] \big)\, \tilde\Theta(v)\, dv$ given the observed data. We may reject $H_0$ ($\tilde{H}_0$) at level $\alpha$ if the value of $\Gamma$ ($\tilde\Gamma$) is extreme compared with its null distribution, i.e., greater than the $(1 - \alpha/2)$th quantile or less than the $(\alpha/2)$th quantile of $\Gamma^*$ ($\tilde\Gamma^*$).

3 The Conditional Quantile Regression Model

In the previous section the identifiability issue in the upper tail was dealt with, while the lower tail of $T$ was assumed to be identifiable. However, the latter identifiability may not always be guaranteed. For the simple one-sample case, we have established the condition for the identifiability of the lower tail, namely that the lower bound of the support of $L$ is no greater than the lower bound of the support of $T$. In practice this condition can be violated, which introduces non-identifiability. For the regression setting we also have an implicit condition, which may not always hold either. When the lower tail is not identifiable, we cannot simply restrict the range of $\tau$ of interest to $[\tau_L, \tau_U]$ with $\tau_L > 0$. The reason is that, as with left-truncated data (Tsai et al. 1987), identification of $\beta(\tau)$ is precluded in this case.
Motivated by the conditional estimator of the survival function proposed by Tsai et al. (1987), we propose a conditional version of the quantile regression model that formulates the covariate effects on conditional quantiles of $T$ given $T > t_0$. In essence we impose an artificial left truncation time $t_0$, a relatively early time point, to circumvent the difficulty in identifying the lower tail. The choice of $t_0$ depends on two aspects: (a) the mathematical condition for identifiability, given in the Appendix; and (b) the scientific interest and logic. Let $T = \tilde{T} + t_0$. The conditional model takes the form

$$Q_T(\tau \mid Z, T > t_0) = \exp\{Z^\top \alpha(\tau)\} + t_0, \quad \tau \in (0, 1). \qquad (7)$$

Here $Q_T(\tau \mid Z, T > t_0)$ is defined as $\inf\{t : F_T(t \mid Z, T > t_0) \ge \tau\}$, the conditional $\tau$th quantile of $T$ given $T > t_0$ and $Z$. Accordingly the coefficient vector $\alpha(\tau)$

corresponds to covariate effects on the conditional quantiles of $T$ given $T > t_0$ rather than on the marginal quantiles.

Model (7) requires an estimation procedure different from that of the unconditional model. The estimating equation (3) cannot be borrowed without modification. First of all, we need to redefine the counting process and the at-risk indicator by setting $\check{N}(t) = I(t_0 < X \le t, \delta = 1)$ and $\check{R}(t) = I(t_0 \vee L \le t \le X)$ in order to reflect the condition $T > t_0$. What follows naturally is the modified process

$$\check{M}(t) = \check{N}(t) - \int_{t_0}^{t} \check{R}(u)\, d\Lambda_T(u \mid Z, T > t_0) = \check{N}(t) - \int_{t_0}^{t} I(L \le u \le X)\, d\Lambda_T(u \mid Z, T > t_0).$$

Note that, because of the complications that arise from the combination of left censoring and artificial left truncation, it is not straightforward to verify that $\check{M}(t)$ is a martingale. However, we can still show that $E\{\check{M}(t) \mid Z\} = 0$ by utilizing the fact that $\check{N}(t) = N(t) - N(t_0)$ for $t > t_0$ together with the established properties of $N(t)$. Specifically, we have shown that $E\{N(t) \mid Z\} = E\{\int_0^t I(L < u \le X)\, \lambda_T(u \mid Z)\, du \mid Z\}$, where the integral can be further expressed as $\Lambda_T(t \wedge X \mid Z) - \Lambda_T(t \wedge L \mid Z)$. Noting that $\Lambda_T(\cdot \mid Z) = -\log\{1 - F_T(\cdot \mid Z)\}$, we arrive at $\Lambda_T(t \wedge X \mid Z) - \Lambda_T(t_0 \wedge X \mid Z) = I(X > t_0)\, \Lambda_T(t \wedge X \mid T > t_0, Z)$. From these facts it follows that $E\{\check{N}(t) \mid Z\} = E\{\int_{t_0}^{t} I(L \le u \le X)\, d\Lambda_T(u \mid Z, T > t_0) \mid Z\}$ and hence $E\{\check{M}(t) \mid Z\} = 0$. The detailed proof is provided in the Appendix.

Setting $t = Q_T(\tau \mid Z, T > t_0)$, by a variable transformation inside the integral we can show that $E\{\check{S}_n(\alpha_0, \tau)\} = 0$, where $\alpha_0$ denotes the true value of the parameter vector $\alpha$ and

$$\check{S}_n(\alpha, \tau) = n^{-1} \sum_{i=1}^{n} Z_i I(X_i > t_0) \Big\{ N_i[\exp\{Z_i^\top \alpha(\tau)\} + t_0] - \int_0^{\tau} I[L_i \le \exp\{Z_i^\top \alpha(v)\} + t_0 \le X_i]\, dH(v) \Big\}.$$

Therefore we propose the estimating equation

$$n^{1/2} \check{S}_n(\alpha, \tau) = 0. \qquad (8)$$

Note that we still have the boundary condition $\exp\{Z^\top \hat\alpha(0)\} = 0$, which is critical in solving for $\alpha(\tau)$. As in the unconditional case, $\check{S}_n(\alpha, \tau)$ has a stochastic integral representation. Therefore we can again utilize the grid-based root-finding strategy and arrive at a monotone estimating equation whose equivalent objective function is

$$l_j(h) = \sum_{i=1}^{n} \big| I(\delta_i = 1, X_i > t_0) \log(X_i - t_0) - I(\delta_i = 1, X_i > t_0)\, h^\top Z_i \big| + \Big| R - h^\top \sum_{l=1}^{n} \{-I(\delta_l = 1, X_l > t_0) Z_l\} \Big| + \Big| R - h^\top \sum_{r=1}^{n} \Big( 2 Z_r \sum_{k=0}^{j-1} I[L_r \le \exp\{Z_r^\top \hat\alpha(\tau_k)\} + t_0 \le X_r]\, \{H(\tau_{k+1}) - H(\tau_k)\} \Big) \Big|, \qquad (9)$$

where $j = 1, \ldots, L_n$, and $R$ is a very large number. The aforementioned Barrodale-Roberts algorithm still applies.

Under regularity conditions C1'-C4' parallel to C1-C4 (stated in the Appendix), we have the following asymptotic results.

Theorem 3. Assuming conditions C1'-C4' hold and $\lim_{n \to \infty} \|S_{L_n}\| = 0$, then $\sup_{\tau \in [\nu, \tau_U]} \|\hat\alpha(\tau) - \alpha_0(\tau)\| \to_p 0$, where $0 < \nu < \tau_U$.

Theorem 4. Assuming conditions C1'-C4' hold and $\lim_{n \to \infty} n^{1/2} \|S_{L_n}\| = 0$, then $n^{1/2}\{\hat\alpha(\tau) - \alpha_0(\tau)\}$ converges weakly to a Gaussian process for $\tau \in [\nu, \tau_U]$, where $0 < \nu < \tau_U$.

To justify Theorems 3-4, we define $\tilde{Z}_i = Z_i I(X_i > t_0)$, $\tilde{X}_i = X_i - t_0$, and $\tilde{L}_i = L_i - t_0$. The counting process associated with $\tilde{X}_i$ is $\tilde{N}_i(t) = I(\tilde{X}_i \le t, \delta_i = 1) = I(X_i \le t + t_0, \delta_i = 1)$, and we have

$$\check{S}_n(\alpha, \tau) = n^{-1} \sum_{i=1}^{n} \tilde{Z}_i \Big\{ \tilde{N}_i(\exp[Z_i^\top \alpha(\tau)]) - \int_0^{\tau} I(\tilde{L}_i < \exp[Z_i^\top \alpha(v)] \le \tilde{X}_i)\, dH(v) \Big\}. \qquad (10)$$

Notice that in (10) $\check{S}_n(\alpha, \tau)$ is rewritten as an estimating function of the unconditional form. Therefore we can utilize the asymptotic results established for the unconditional case and conclude that the uniform consistency (Theorem 3) and the weak convergence (Theorem 4) of $\hat\alpha$ still hold. It is also worth mentioning that conditions C1'-C4' are essentially C1-C4 applied to the new data $\tilde{Z}_i$, $\tilde{X}_i$, $\tilde{L}_i$.

A resampling-based inference procedure similar to that for the unconditional case can be adopted here. The same perturbation can be applied to (10). We can also carry out hypothesis testing and second-stage inferences as in Section 2.
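The data transformation used to justify Theorems 3-4 can be checked numerically. The Python sketch below is an illustration under stated assumptions: it treats the one-sample case ($Z_i = 1$), reads the transformed quantities as $X_i - t_0$ and $L_i - t_0$ with weight $I(X_i > t_0)$, uses the strict left inequality that appears in (10), and the helper name `shift_to_t0` and the three data points are hypothetical. It confirms that the counting and at-risk indicators computed from the shifted data at a value $e = \exp\{\alpha\}$ agree with those computed from the original data at $e + t_0$.

```python
import numpy as np

def shift_to_t0(X, delta, L, t0):
    """Shift the time origin to t0 and zero-weight subjects with X <= t0."""
    X, delta, L = np.asarray(X, float), np.asarray(delta), np.asarray(L, float)
    z_tilde = (X > t0).astype(int)          # Z_i * I(X_i > t0) with Z_i = 1
    return X - t0, delta, L - t0, z_tilde

# Illustrative data: an event before t0, an event after t0, and a right censored subject.
X = np.array([0.5, 2.0, 3.0])
delta = np.array([1, 1, 3])
L = np.array([0.2, 1.5, 0.0])
t0 = 1.0
X_s, delta_s, L_s, z_s = shift_to_t0(X, delta, L, t0)

def shifted_terms(e):
    """Counting and at-risk terms of (10) on the shifted data at e = exp{alpha}."""
    count = z_s * ((X_s <= e) & (delta_s == 1))
    risk = z_s * ((L_s < e) & (e <= X_s))
    return count, risk

def original_terms(e):
    """The same terms written on the original time scale at e + t0."""
    keep = (X > t0).astype(int)
    count = keep * ((X <= e + t0) & (delta == 1))
    risk = keep * ((L < e + t0) & (e + t0 <= X))
    return count, risk
```

Because the shifted data have the same structure as the data of Section 2, any routine that solves the unconditional problem can be reused for the conditional model after this transformation.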

4 Extension to Doubly Censored Data with Left Truncation

In the previous section we proposed a conditional model to accommodate potential non-identifiability in the lower tail of $T$. Borrowing the idea from the popular technique for right censored data with left truncation, we essentially imposed an artificial left truncation on $T$. A straightforward extension is that, with slight modification, the proposed model can be applied to more complicated data that are subject to double censoring and left truncation simultaneously. Specifically, suppose the data described in the previous section are also left truncated at $a$, a known time point. Let $\tilde{t}_0 = t_0 \vee a$. Then our model is

$$Q_T(\tau \mid Z, T > \tilde{t}_0) = \exp\{Z^\top \alpha(\tau)\} + \tilde{t}_0, \quad \tau \in (0, 1).$$

Since the above model is exactly the same as (7) except that $t_0$ is replaced with $\tilde{t}_0$, the estimating equation and root-finding procedure can be applied without difficulty.

5 Simulation Studies

We study the finite-sample performance of our proposed methods through Monte Carlo simulations. Two settings are illustrated: one with homoscedastic errors and one with heteroscedastic errors. In each setting we examine both the unconditional case, where the left tail of the event time is fully identifiable, and the conditional case, where the left tail of the event time is identifiable only after a pre-specified time. To guarantee identifiability in the unconditional case, we impose a positive probability mass $P_0$ at 0 for the left censoring time $L$.

In the first setting, event times are generated from an AFT model with iid errors: $\log T = b_1 Z_1 + b_2 Z_2 + \epsilon$, where $\epsilon$ follows the extreme value distribution. It can be shown that under this specific setup the true regression quantiles are the same for the unconditional case and the conditional case given $Z = (1, Z_1, Z_2)$, i.e., $\beta_0(\tau) = \alpha_0(\tau) = \{Q_\epsilon(\tau), b_1, b_2\}$. The covariates $Z_1$ and $Z_2$ are generated from $Unif(0, 1)$ and $Bernoulli(.5)$, respectively.
The right censoring time $U$ is $Unif(.1 \cdot I(Z_2 = 1), c_u)$, and the left censoring time $L$ is $Unif(0, c_l) \cdot W$, where $W$ follows $Bernoulli(1 - P_0)$. If $U < L$ in a pair of realizations, we regenerate $U$ until $U \ge L$. In the unconditional case we set $P_0 = .2$, $b_1 = 0$, $b_2 = -.5$, $c_l = .5$, $c_u = 3.8$, resulting in 2% right censoring and 2% left censoring. In the conditional case we set $P_0 = 0$, $b_1 = 0$, $b_2 = -1$,

$c_l = .3$, $c_u = 4.5$, resulting in 15% right censoring and 2% left censoring. Under each configuration we generate 1 data sets of sample sizes n = 2 and n = 4. We set B = 2 in the resampling procedures, with the disturbances $\{\zeta_i\}_{i=1}^{B}$ generated from Exp(1). An equally spaced grid on $\tau$ with size .1 is adopted when estimating $\beta(\tau)$. We also carry out tests of both the overall significance and the constant effect hypotheses for each covariate. In the latter test we adopt the weight function $\Theta(v) = I\{v \le (l + u)/2\}$. We set $l = .1$ and $u = .7$.

Table 1 presents the estimation results for the AFT setting under the unconditional model and the conditional model with the sample size n = 2. Included are the biases (absolute values), empirical standard deviations (EmpSD), average resampling-based SD estimates (AvgSD), and the coverage rates of the 95% Wald confidence intervals for $\hat\beta(\tau)$ and $\hat\alpha(\tau)$, $\tau = .1, .3, .5, .7$. We see that the biases are negligible, the resampling-based SD estimates agree well with the empirical SD, and the coverage rates are close to the nominal level.

[Table 1: Simulation results under AFT models (n=2). Unconditional case: $b_1$ = 0, $b_2$ = -.5, 2% rc, 2% lc. Conditional case: $b_1$ = 0, $b_2$ = -1, 15% rc, 2% lc. Columns for each case: Bias, AvgSD, EmpSD, Cov95, with rows indexed by $\tau$ and by the components of $\hat\beta$ and $\hat\alpha$. The numeric entries did not survive in this transcription.]

Table 2 presents the hypothesis testing results. The empirical rejection rates (ERR) for both tests at level .5 are reported, together with the estimated average effects (AvgEst), the empirical SD of the average effects, and the average resampling-based SD estimates of the average effects. We see that the type I errors are close to the nominal level .5. The estimated average covariate effects of $Z_1$ and $Z_2$ are close to the true values (constants). The resampling-based SD estimates agree well with the empirical SD.
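The data-generating mechanism of the first simulation setting can be sketched in Python as follows. This is a hedged illustration rather than the code behind the reported tables: the function name `simulate_aft` is hypothetical, the error is assumed to be the standard (minimum) extreme value variate $\epsilon = \log E$ with $E \sim$ Exp(1), so that $Q_\epsilon(\tau) = \log\{-\log(1 - \tau)\}$, and the argument values in the example call are placeholders chosen for illustration because several numeric settings are not legible in this copy of the report.

```python
import numpy as np

def simulate_aft(n, b1, b2, c_l, c_u, p0, rng):
    """Generate (X, delta, Z, L) from log T = b1*Z1 + b2*Z2 + eps with double censoring."""
    z1 = rng.uniform(0.0, 1.0, n)
    z2 = rng.binomial(1, 0.5, n)
    eps = np.log(rng.exponential(1.0, n))    # assumed extreme value error
    T = np.exp(b1 * z1 + b2 * z2 + eps)
    W = rng.binomial(1, 1.0 - p0, n)         # W = 0 puts mass p0 at L = 0
    L = rng.uniform(0.0, c_l, n) * W
    lo = 0.1 * (z2 == 1)                     # lower limit of U depends on Z2
    U = rng.uniform(lo, c_u)
    redo = U < L
    while redo.any():                        # regenerate U until U >= L
        U[redo] = rng.uniform(lo[redo], c_u)
        redo = U < L
    X = np.maximum(L, np.minimum(T, U))
    delta = np.where(T < L, 2, np.where(T > U, 3, 1))
    Z = np.column_stack([np.ones(n), z1, z2])
    return X, delta, Z, L

def true_intercept(tau):
    """Q_eps(tau) for the assumed extreme value error."""
    return np.log(-np.log1p(-np.asarray(tau, dtype=float)))

rng = np.random.default_rng(2010)
X, delta, Z, L = simulate_aft(200, 0.0, -0.5, 0.5, 3.8, 0.2, rng)
right_rate = np.mean(delta == 3)             # empirical right censoring rate
left_rate = np.mean(delta == 2)              # empirical left censoring rate
```

The regeneration loop terminates as long as `c_u` exceeds `c_l`, since every left censoring time then lies below the upper limit of the support of $U$.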

[Table 2: Simulation results on hypothesis testing and second-stage inference under AFT models (n=2). Unconditional case: $b_1$ = 0, $b_2$ = -.5, 2% right censoring, 2% left censoring, testing $H_0: \beta(\tau) = 0$ and $\tilde{H}_0: \beta(\tau) = \eta_0$ for $l \le \tau \le u$. Conditional case: $b_1$ = 0, $b_2$ = -1, 15% right censoring, 2% left censoring, testing $H_0: \alpha(\tau) = 0$ and $\tilde{H}_0: \alpha(\tau) = \eta_0$ for $l \le \tau \le u$. Columns: ERR, AvgEst, AvgSD, EmpSD, ERR. The numeric entries did not survive in this transcription.]

In the second setting, we consider a log-linear model with heteroscedastic errors. Event times are generated from the model $\log T = b_1 Z_1 + b_2 Z_2 \xi + \epsilon$, where $\epsilon$ follows $N(0, 1)$, $Z_1$ follows $Unif(0, 1)$, $Z_2$ follows $Bernoulli(.5)$, and $\xi$ follows Exp(1). These variates are all independent. The left censoring time and the right censoring time are generated in the same way as in the first setting. In the unconditional case we set $P_0 = .2$, $b_1 = 0$, $b_2 = -1.5$, $c_l = .7$, $c_u = 4.8$, resulting in 25% right censoring and 2% left censoring. In the conditional case we set $P_0 = 0$, $b_1 = 0$, $b_2 = -4.5$, $c_l = .5$, $c_u = 4.$, resulting in 15% right censoring and 25% left censoring. Tables 3-4 report the estimation results and hypothesis testing results in the same fashion as Tables 1-2. We can see that our proposed method also performs well in the heteroscedastic setting.

The simulation results for n = 4 for both settings are presented in the supplementary materials. Comparing the two sample sizes, we see that the larger one (4) has better overall performance.

We also performed simulations to demonstrate the advantage of our approach over the naive approach that simply discards all left-censored subjects. The same unconditional models and configurations as summarized in Table 1 (AFT model) and Table 3 (heteroscedastic model) are taken for illustration. Figure 1 displays the parameter estimates from the proposed approach and the naive approach together

[Table 3: Simulation results under log-linear models with heteroscedastic errors (n=2). Unconditional case: $b_1$ = 0, $b_2$ = -1.5, 25% rc, 2% lc. Conditional case: $b_1$ = 0, $b_2$ = -4.5, 15% rc, 25% lc. Columns for each case: Bias, AvgSD, EmpSD, Cov95, with rows indexed by $\tau$ and by the components of $\hat\beta$ and $\hat\alpha$. The numeric entries did not survive in this transcription.]

[Table 4: Simulation results on hypothesis testing and second-stage inference under log-linear models with heteroscedastic errors (n=2). Unconditional case: $b_1$ = 0, $b_2$ = -1.5, 25% right censoring, 2% left censoring, testing $H_0: \beta(\tau) = 0$ and $\tilde{H}_0: \beta(\tau) = \eta_0$ for $l \le \tau \le u$. Conditional case: $b_1$ = 0, $b_2$ = -4.5, 15% right censoring, 25% left censoring, testing $H_0: \alpha(\tau) = 0$ and $\tilde{H}_0: \alpha(\tau) = \eta_0$ for $l \le \tau \le u$. Columns: ERR, AvgEst, AvgSD, EmpSD, ERR. The numeric entries did not survive in this transcription.]

with the true parameter values. This figure shows that the proposed estimators are virtually unbiased, while the naive estimators deviate greatly from the true values for the non-zero parameters.

[Figure 1: Comparing Estimates from Proposed Approach and Naive Approach. Two rows of panels, "Estimates for AFT Model" and "Estimates for Heteroscedastic Model"; each row plots the intercept and the two slope coefficients against tau, with curves for the true parameters, the proposed estimator and the naive estimator.]

6 CFFPR Data Example

In this section we apply our methods to the CFFPR data discussed in Section 1. Cystic Fibrosis (CF) is one of the most common life-shortening genetic disorders, affecting the lungs and digestive systems of about 30,000 children and adults in the United States and 70,000 worldwide (Cystic Fibrosis Foundation 21). Pseudomonas aeruginosa (PA), the predominant bacterial pathogen infecting 8% of CF patients under age 18, accelerates lung function decline and serves as an important predictor of mortality in CF (Retsch-Bogart et al. 2008). In this paper we used the CFFPR data collected during to investigate the association between onset ages of the first detected PA infection and several risk factors in CF patients diagnosed by age 1. Similar data were analyzed by Lai et al. (2004) under the Cox model and by Yan et al. (2009) based on temporal process regression.

In our analysis, the event time of interest $T$ corresponds to the patient's age at the first PA infection, which is subject to both left and right censoring. Among 12,818 CF patients diagnosed between 1986 and 2, 3,343 (26.1%) patients had PA infection at study entry (i.e. left censored by the patient's age at the first recorded visit, $L$) and 2,213 (17.3%) patients had no PA infection documented by December 25 (i.e. right censored by the age at the last follow-up before the cut-off date, $U$). To avoid the complication of delayed entry (i.e. left truncation), we restricted the study population to subjects who were diagnosed before age 1 and alive at age 1. The first restriction was imposed because the first 1 years were known to have the greatest potential to benefit from early diagnosis (Campbell and White 2005). The second restriction was imposed to avoid left truncation due to mortality prior to CF diagnosis. Since the mortality rate before age 1 was very low, about 1.5%, we expect that excluding patients who died before age 1 would result in only a small deviation from the general CF population. The restricted sample contains 11,179 patients, for which the left and right censoring rates are 23.7% and 16.2%, respectively.

We applied the proposed quantile regression method to this doubly censored restricted CFFPR sample. The same set of covariates as in Yan et al. (2009) was considered, which included gender (1 for females and 0 for males), diagnosis mode (denoted by factor) and diagnosis year (denoted by dx). Diagnosis mode was defined according to the common clinical practices that identify CF and has four categories: diagnosis at birth due to meconium ileus (MI), diagnosis shortly after birth by neonatal/prenatal screening (SCR), diagnosis at variable ages because of family history (FH), and diagnosis at variable ages from various symptoms (SYMP) other than MI (Lai et al. 2004, Yan et al. 2009).
Diagnosis year was classified into three cohorts that reflect medical progress in CF diagnosis and treatment: 1986-1989 ("dx86"), 1990-1993 ("dx90"), and 1994-2000 ("dx94"). Without loss of generality, we chose male patients who were diagnosed between 1986 and 1989 by symptoms other than MI as our reference level.

Figure 2 displays the coefficient estimates β̂(τ) in solid lines along with their 95% pointwise Wald confidence intervals in dotted lines for τ ∈ [l, u] with l = 0.1 and u = 0.65. The two endpoints 0.1 and 0.65 were chosen so that we would have enough data points to get reliable estimates at each quantile in between. Unsurprisingly, the estimated intercept is an increasing step function, which suggests that 10% of male patients with CF diagnosed between 1986 and 1989 by SYMP acquired their first PA infection by the age of 2 years and 65% of them had their first PA infection by the age of 9 years. The estimated coefficients for MI are close to 0, the estimated coefficients for FH and SCR are always positive, while the coefficients for dx90 and dx94 are always negative. This may suggest that, compared with those diagnosed with symptoms, patients diagnosed because of MI had similar onset ages of PA infection, while patients diagnosed due to family history or screening developed their first PA infection at later ages. Regarding the diagnosis cohort effect, patients who were diagnosed in 1990-1993 and 1994-2000 exhibited earlier onset of PA infection than patients diagnosed in 1986-1989. The coefficients for gender, though demonstrating some marginal significance around τ = 0.65, do not show an apparent trend of departing from 0.

A formal significance test was performed based on the average covariate effects across quantiles ranging from 0.1 to 0.65. The aforementioned protective effect of FH diagnosis and accelerating effects of newer diagnosis cohorts are significant on average, with p-values .2, .1 and <.1 (data not shown), respectively. The effect of SCR diagnosis is marginally significant (p-value=.7). We paid particular attention to testing the gender effect due to its cross-over pattern as shown in Figure 2. That is, the estimated coefficient is first positive and then flips sign around τ = 0.35. Hence we assessed the aggregated gender effect in two τ-intervals: τ ∈ [0.1, 0.35] and τ ∈ [0.35, 0.65]. Our test is significant in neither interval, with both p-values around .2 (data not shown).

For the three significant covariates, i.e., FH, dx90 and dx94, we also conducted a constancy test using the weight function Θ(t) = I[t < (l + u)/2], in view of the rather monotone pattern of each coefficient estimate. Results show that only the effect of dx94 varies with τ, with p-values <.1 (data not shown). This result is consistent with our observation from Figure 2: though always negative, the coefficient for dx94 gradually decreases in magnitude with τ.
The scientific implication may be that patients diagnosed between 1994 and 2000 were more likely to have cultures taken more frequently, resulting in earlier detection of PA infection, as compared to those diagnosed between 1986 and 1989. However, the more frequent culture had less effect on later onset ages, hence such a difference was less dramatic for CF patients with late onset of PA infection.

[Figure 2: Estimated Regression Quantiles. Panels: intercept, female, DX(MI), DX(FH), DX(SCR), dx90, dx94; each panel plots the regression quantile against tau.]

7 Discussion

In this paper we propose a quantile regression method for doubly censored data. Quantile regression has served as a popular alternative to the classical Cox and AFT models in survival analysis but, to the best of our knowledge, has not been adapted to doubly censored data. By formulating the covariate effects on different quantiles of the outcome instead of the mean and adopting non-parametric approaches, our model benefits from flexibility and robustness. The proposed estimation procedure takes advantage of the martingale structure, which facilitates both asymptotic studies and implementation in standard statistical software. We make the rather common assumption of (L, U) ⊥ T | Z, where both L and U are allowed to depend on Z. In addition, U is assumed to be unobservable, as is often the case in survival analysis. On the other hand, we do assume that L is known. Relaxing this assumption requires overcoming additional difficulties that may preclude use of the martingale structure; nonetheless, this could be one of our future directions.

Identifiability is a subtle but important issue that is often encountered with censored data. Due to the loss of event times in both the lower and upper tails, doubly censored data may suffer from two-way non-identifiability. In our proposal we tackle this issue with a two-fold strategy. By restricting our attention to a range of τ away from 1, that is, (0, τ_U] with τ_U < 1, we bypass the identification problem in the upper tail; and by adopting a conditional version of the regression model, in which we condition on T > t_0, we overcome non-identifiability in the lower tail. In this paper we assume fixed τ_U and t_0, but in practice they can be chosen adaptively or based on scientific reasons.

Although our methodology is tailored to the CFFPR data, which we use as an illustrative example, this work has potentially broader applications. Double censoring may also arise in many other scenarios, for example, when measurements can only be accurately taken on a certain interval [L, U], or when the interest lies in the elapsed time between the occurrences of two events, both of which are subject to right censoring. Our work can be easily applied to these situations. In addition, the proposed conditional model shares the essence of the solution to data with left truncation and right censoring, and therefore we can also adapt our methods to this type of data without difficulty.
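As a reading aid for the measurement-interval scenario mentioned above, the following Python sketch shows how one doubly censored observation could be formed from a latent time T and limits L < U. The function names and the string status labels are hypothetical and are not taken from the report; the rule X = max(L, min(T, U)) is an assumption chosen to be consistent with the at-risk indicator I(L < u ≤ X) used in the appendix.

```python
def observe(t, l, u):
    """Return (x, status) for one subject under double censoring.

    Assumed convention (illustrative only):
      "left"  when T <= L  (only T <= L is known; X is set to L),
      "right" when T >  U  (only T >  U is known; X is set to U),
      "exact" when L < T <= U (T itself is observed).
    """
    if not l < u:
        raise ValueError("expected L < U")
    if t <= l:
        return l, "left"
    if t > u:
        return u, "right"
    return t, "exact"


def censoring_rates(records):
    """Left- and right-censoring fractions for a list of (T, L, U) triples."""
    statuses = [observe(t, l, u)[1] for t, l, u in records]
    n = len(statuses)
    return statuses.count("left") / n, statuses.count("right") / n
```

Under this convention a left-censored subject has X = L, so the indicator I(L < u ≤ X) is zero for every u and the subject never enters the risk set.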

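A

A.1 Justification of the Estimating Equations

A.1.1 The Unconditional Case

To justify the estimating equation (4) for the unconditional case, we need to prove (3) first. Recall that $dM_i(t) = dN_i(t) - R_i(t)\lambda_i(t)\,dt$ and that $\mathcal{F}_t$ is the filtration $\sigma\{N_i(u), R_i(u+), Z_i : i = 1, \ldots, n;\ u \le t\}$. We have

\begin{align*}
E[dN_i(t) \mid \mathcal{F}_{t-}] &= \Pr[dN_i(t) = 1 \mid \mathcal{F}_{t-}] \\
&= \Pr[t \le T_i < t + dt,\ R_i(t) = 1 \mid \mathcal{F}_{t-}] \\
&= R_i(t)\Pr[t \le T_i < t + dt \mid R_i(t), Z_i] \\
&= R_i(t)\Pr[t \le T_i < t + dt \mid T_i \ge t,\ L_i < t,\ U_i \ge t,\ Z_i] \\
&= R_i(t)\lambda_i(t \mid Z_i)\,dt,
\end{align*}

assuming $(L_i, U_i) \perp T_i \mid Z_i$. Hence $M_i(t)$ is a martingale and $E\{M_i(t) \mid Z_i\} = 0$ for $t \ge 0$. On the other hand, with $t = F_T^{-1}(\tau \mid Z_i)$, we have

\begin{align*}
\int_0^t R_i(u)\lambda_i(u \mid Z_i)\,du &= \int_0^t I(L_i < u \le X_i)\lambda_i(u \mid Z_i)\,du \\
&= \int_0^\tau I\{L_i < F_T^{-1}(v \mid Z_i) \le X_i\}\,dH(v),
\end{align*}

where $H(v) = -\log(1 - v)$ for $0 \le v < 1$. Then it follows immediately that $E[S_n\{\beta_0(\tau)\}] = 0$ for $\tau \in (0, 1)$.

A.1.2 The Conditional Case

We want to show that $E\{\check{M}(t) \mid Z\} = 0$, where

\[
\check{M}(t) = \check{N}(t) - \int_{t_0}^{t} I(L < u \le X)\,d\Lambda_T(u \mid Z, T > t_0),
\]

and $\check{N}(t) = N(t) - N(t_0)$ for $t > t_0$ and $0$ otherwise. From the arguments in the unconditional case we know that

\begin{align*}
E\{N(t) \mid Z\} &= \int_0^t I(L < u \le X)\lambda(u \mid Z)\,du \\
&= \int_0^t I(u \le X)\lambda(u \mid Z)\,du - \int_0^t I(u \le L)\lambda(u \mid Z)\,du \\
&= \Lambda_T(t \wedge X \mid Z) - \Lambda_T(t \wedge L \mid Z).
\end{align*}

Hence, for $t > t_0$, we have

\[
E\{N(t) - N(t_0) \mid Z\} = \{\Lambda_T(t \wedge X \mid Z) - \Lambda_T(t_0 \wedge X \mid Z)\} - \{\Lambda_T(t \wedge L \mid Z) - \Lambda_T(t_0 \wedge L \mid Z)\}. \tag{11}
\]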
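The mean-zero property behind these displays, namely that N(t) and its compensator ∫ I(L < u ≤ X) λ(u | Z) du over (0, t] have the same expectation, can be checked by simulation. The sketch below is illustrative only: it assumes a standard exponential T (so λ(u) = 1 and Λ_T(t) = t), no covariates, and arbitrary uniform distributions for L and U that are independent of T. None of these settings, nor the function name, come from the report.

```python
import random


def simulate_martingale_gap(n=200_000, t=1.0, seed=0):
    """Monte Carlo means of N(t) and of its compensator at time t.

    Assumptions (illustrative): T ~ Exp(1), so the hazard is 1 and the
    compensator reduces to min(t, X) - min(t, L); L ~ Uniform(0, 0.5);
    U = L + Uniform(0.5, 2.5); (L, U) independent of T.
    """
    rng = random.Random(seed)
    total_n = 0.0
    total_comp = 0.0
    for _ in range(n):
        T = rng.expovariate(1.0)
        L = rng.uniform(0.0, 0.5)
        U = L + rng.uniform(0.5, 2.5)
        X = max(L, min(T, U))          # observed time under double censoring
        exact = L < T <= U             # event time observed exactly
        total_n += 1.0 if (exact and X <= t) else 0.0
        # integral over (0, t] of I(L < u <= X) * 1 du, using X >= L
        total_comp += min(t, X) - min(t, L)
    return total_n / n, total_comp / n
```

With the default settings the two returned means should agree up to Monte Carlo error, which is what the identity E{M(t) | Z} = 0 predicts in this covariate-free special case.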

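On the other hand, we have

\begin{align*}
\Lambda_T(t \wedge X \mid Z) - \Lambda_T(t_0 \wedge X \mid Z)
&= I(X > t_0)\{\Lambda_T(t \wedge X \mid Z) - \Lambda_T(t_0 \mid Z)\} \\
&= I(X > t_0)\bigl[-\log\{1 - \Pr(T \le t \wedge X \mid Z)\} + \log\{1 - \Pr(T \le t_0 \mid Z)\}\bigr] \\
&= I(X > t_0)\Bigl[-\log\Bigl\{1 - \frac{\Pr(T \le t \wedge X \mid Z) - \Pr(T \le t_0 \mid Z)}{1 - \Pr(T \le t_0 \mid Z)}\Bigr\}\Bigr] \\
&= I(X > t_0)\bigl[-\log\{1 - \Pr(T \le t \wedge X \mid T > t_0, Z)\}\bigr] \\
&= I(X > t_0)\,\Lambda_T(t \wedge X \mid T > t_0, Z). \tag{12}
\end{align*}

Similarly we can show that

\[
\Lambda_T(t \wedge L \mid Z) - \Lambda_T(t_0 \wedge L \mid Z) = I(L > t_0)\,\Lambda_T(t \wedge L \mid T > t_0, Z). \tag{13}
\]

Plugging (12) and (13) into (11), we have

\begin{align*}
E\{N(t) - N(t_0) \mid Z\}
&= I(X > t_0)\,\Lambda_T(t \wedge X \mid T > t_0, Z) - I(L > t_0)\,\Lambda_T(t \wedge L \mid T > t_0, Z) \\
&= \int_{t_0}^{t} \{I(X > t_0) I(u \le X) - I(L > t_0) I(u \le L)\}\,\lambda_T(u \mid T > t_0, Z)\,du \\
&= \int_{t_0}^{t} I\{L < u \le X\}\,\lambda_T(u \mid T > t_0, Z)\,du \\
&= \int_0^\tau I\{L < F_T^{-1}(v \mid T > t_0, Z) \le X\}\,dH(v). \tag{14}
\end{align*}

From (11) and (14) we immediately have $E\{\check{M}(t) \mid Z\} = 0$.

B

B.1 Regularity Conditions and Proofs

B.1.1 Regularity Conditions for Theorems 1 and 2

Let $F(\cdot \mid Z)$ and $\bar{F}(\cdot \mid Z)$ be the conditional CDF and survival function of $X$ (the observed time) given $Z$, respectively; let $\tilde{F}(t \mid Z) = \Pr(t_0 < X \le t, \delta = 1 \mid Z)$; and let $F_{X,L}(t, t' \mid Z) = \Pr(X \ge t, L \ge t' \mid Z)$. Moreover, let $f(\cdot \mid Z)$, $\bar{f}(\cdot \mid Z)$, $\tilde{f}(\cdot \mid Z)$ and $f_{X,L}(\cdot, \cdot \mid Z)$ be the first order derivatives of $F(\cdot \mid Z)$, $\bar{F}(\cdot \mid Z)$, $\tilde{F}(\cdot \mid Z)$ and $F_{X,L}(\cdot, \cdot \mid Z)$, respectively.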

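Define:

\begin{align*}
\mu(b) &= E[Z N\{\exp(Z^{T} b)\}], \\
B(b) &= E[Z^{\otimes 2}\,\tilde{f}\{\exp(Z^{T} b) \mid Z\}\exp(Z^{T} b)], \\
v_n(b) &= n^{-1}\sum_{i=1}^{n} Z_i N_i\{\exp(Z_i^{T} b)\} - \mu(b), \\
\tilde{\mu}(b) &= E[Z I\{L < \exp(Z^{T} b) \le X\}], \\
J(b) &= E[Z^{\otimes 2}(\bar{f}\{\exp(Z^{T} b) \mid Z\} - f_{X,L}\{\exp(Z^{T} b), \exp(Z^{T} b) \mid Z\})\exp(Z^{T} b)], \\
\tilde{v}_n(b) &= n^{-1}\sum_{i=1}^{n} Z_i I\{L_i < \exp(Z_i^{T} b) \le X_i\} - \tilde{\mu}(b).
\end{align*}

The regularity conditions are stated as follows:

C1: The covariate space $\mathcal{Z}$ is compact, i.e., $\sup_i \|Z_i\| < \infty$.

C2: (a) Each component of $E(Z N[\exp\{Z^{T}\beta_0(\tau)\}])$ is a Lipschitz function of $\tau$, and (b) $\tilde{f}(t \mid z)$ and $f(t \mid z)$ are bounded above uniformly in $t$ and $z$.

C3: (a) $\tilde{f}\{\exp(Z^{T} b) \mid Z\} > 0$ for all $b \in \mathcal{B}(d_0)$, (b) $E(Z^{\otimes 2}) > 0$, (c) each component of $J(b)B(b)^{-1}$ is uniformly bounded in $b \in \mathcal{B}(d_0)$, where $\mathcal{B}(d_0)$ is a neighborhood containing $\{\beta_0(\tau), \tau \in (0, \tau_U)\}$, defined as $\mathcal{B}(d_0) = \{b \in R^{p} : \inf_{\tau \in (0, \tau_U]} \|\mu(b) - \mu\{\beta_0(\tau)\}\| \le d_0\}$.

C4: $\inf_{\tau \in [\nu, \tau_U]} \mathrm{eigmin}\,B\{\beta_0(\tau)\} > 0$ for any $\nu \in (0, \tau_U)$, where $\mathrm{eigmin}(\cdot)$ denotes the minimal eigenvalue of a matrix.

B.1.2 Proof of Theorem 2

Following the proof of Lemma B.1 in Peng and Huang (2008) we can show that

\[
\sup_{\tau \in (0, \tau_U]} \Bigl\| n^{-1/2}\sum_{i=1}^{n} Z_i\bigl\{N_i(\exp\{Z_i^{T}\hat{\beta}(\tau)\}) - N_i(\exp\{Z_i^{T}\beta_0(\tau)\})\bigr\} - n^{1/2}\bigl(\mu\{\hat{\beta}(\tau)\} - \mu\{\beta_0(\tau)\}\bigr) \Bigr\| \to_p 0 \tag{15}
\]

and

\[
\sup_{\tau \in (0, \tau_U]} \Bigl\| n^{-1/2}\sum_{i=1}^{n} Z_i\bigl\{I(L_i < \exp\{Z_i^{T}\hat{\beta}(\tau)\} \le X_i) - I(L_i < \exp\{Z_i^{T}\beta_0(\tau)\} \le X_i)\bigr\} - n^{1/2}\bigl(\tilde{\mu}\{\hat{\beta}(\tau)\} - \tilde{\mu}\{\beta_0(\tau)\}\bigr) \Bigr\| \to_p 0. \tag{16}
\]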

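(15) and (16), together with the uniform convergence of $\mu\{\hat{\beta}(\tau)\}$ for $\tau \in (0, \tau_U]$, imply a stochastic differential equation as mentioned in Peng and Huang (2008):

\begin{align*}
-n^{1/2} S_n(\beta_0, \tau) ={}& n^{1/2}[\mu\{\hat{\beta}(\tau)\} - \mu\{\beta_0(\tau)\}] \\
& - \int_0^\tau [J\{\beta_0(u)\}B\{\beta_0(u)\}^{-1} + o_{(0, \tau_U]}(1)]\, n^{1/2}[\mu\{\hat{\beta}(u)\} - \mu\{\beta_0(u)\}]\,dH(u) + o_{(0, \tau_U]}(1).
\end{align*}

Using the product integration theory (Gill and Johansen 1990; Andersen et al. 1998), we have

\[
n^{1/2}[\mu\{\hat{\beta}(\tau)\} - \mu\{\beta_0(\tau)\}] = \phi\{-n^{1/2} S_n(\beta_0, \tau)\} + o_{(0, \tau_U]}(1), \tag{17}
\]

where $\phi$ is a linear operator. By the Donsker theorem, $-n^{1/2} S_n(\beta_0, \tau)$ converges weakly to a tight Gaussian process $G(\tau)$ for $\tau \in (0, \tau_U]$. Hence $n^{1/2}[\mu\{\hat{\beta}(\tau)\} - \mu\{\beta_0(\tau)\}]$ converges weakly to $\phi\{G(\tau)\}$, which is also a Gaussian process. Using Taylor expansions, we immediately have that $n^{1/2}\{\hat{\beta}(\tau) - \beta_0(\tau)\}$ converges weakly to the Gaussian process $B\{\beta_0(\tau)\}^{-1}\phi\{G(\tau)\}$ for $\tau \in [\nu, \tau_U]$, where the lower limit $\nu$ ensures that $B\{\beta_0(\tau)\}^{-1}$ is uniformly bounded.

B.1.3 Regularity Conditions for Theorems 3 and 4

Define:

\begin{align*}
\mu(a) &= E[Z \tilde{N}\{\exp(Z^{T} a) + t_0\}], \\
B(a) &= E[Z^{\otimes 2}\,\tilde{f}\{\exp(Z^{T} a) + t_0 \mid Z\}\exp(Z^{T} a)], \\
v_n(a) &= n^{-1}\sum_{i=1}^{n} Z_i \tilde{N}_i\{\exp(Z_i^{T} a) + t_0\} - \mu(a), \\
\tilde{\mu}(a) &= E[Z I\{L < \exp(Z^{T} a) + t_0 \le X\}], \\
J(a) &= E[Z^{\otimes 2}(\bar{f}\{\exp(Z^{T} a) + t_0 \mid Z\} - f_{X,L}\{\exp(Z^{T} a) + t_0, \exp(Z^{T} a) + t_0 \mid Z\})\exp(Z^{T} a)], \\
\tilde{v}_n(a) &= n^{-1}\sum_{i=1}^{n} Z_i I\{L_i < \exp(Z_i^{T} a) + t_0 \le X_i\} - \tilde{\mu}(a).
\end{align*}

The regularity conditions are stated as follows:

C1′: The covariate space $\mathcal{Z}$ is compact, i.e., $\sup_i \|Z_i\| < \infty$.

C2′: (a) Each component of $E(Z \tilde{N}[\exp\{Z^{T}\alpha_0(\tau)\} + t_0])$ is a Lipschitz function of $\tau$, and (b) $\tilde{f}(t \mid z)$ and $f(t \mid z)$ are bounded above uniformly in $t$ and $z$.

C3′: (a) $\tilde{f}\{\exp(Z^{T} a) + t_0 \mid Z\} > 0$ for all $a \in \mathcal{B}(d_0)$, (b) $E(Z^{\otimes 2}) > 0$, (c) each component of $J(a)B(a)^{-1}$ is uniformly bounded in $a \in \mathcal{B}(d_0)$, where $\mathcal{B}(d_0)$ is a neighborhood containing $\{\alpha_0(\tau), \tau \in (0, \tau_U)\}$, defined as $\mathcal{B}(d_0) = \{a \in R^{p} : \inf_{\tau \in (0, \tau_U]} \|\mu(a) - \mu\{\alpha_0(\tau)\}\| \le d_0\}$.

C4′: $\inf_{\tau \in [\nu, \tau_U]} \mathrm{eigmin}\,B\{\alpha_0(\tau)\} > 0$ for any $\nu \in (0, \tau_U)$, where $\mathrm{eigmin}(\cdot)$ denotes the minimal eigenvalue of a matrix.

C

C.1 Simulation Results for n = 4

Table 5: Simulation results under AFT models (n=4)
Unconditional Case: b_1 = 0, b_2 = -.5, 2% rc, 2% lc. Conditional Case: b_1 = 0, b_2 = -1, 15% rc, 2% lc.
Columns (for each case): τ, Bias, AvgSD, EmpSD, Cov95. Rows: components of β̂ (unconditional) and α̂ (conditional) at each τ, starting from τ = .1. (The numeric entries did not survive transcription.)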
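The column labels in Table 5 are not defined in this appendix. Under their conventional reading (assumed here, not confirmed by the report), Bias is the mean estimate minus the true value, AvgSD is the average estimated standard error, EmpSD is the empirical standard deviation of the estimates, and Cov95 is the coverage of 95% Wald intervals. A minimal Python sketch of such a summary over simulation replicates, with a hypothetical function name and inputs, could look like this:

```python
import statistics


def summarize_replicates(estimates, std_errors, truth, z=1.96):
    """Summarize simulation replicates for one coefficient at one tau.

    estimates  : point estimates, one per replicate
    std_errors : estimated standard errors, one per replicate
    truth      : true coefficient value used to generate the data
    z          : normal critical value for the Wald interval (1.96 for 95%)
    """
    bias = statistics.fmean(estimates) - truth
    avg_sd = statistics.fmean(std_errors)
    emp_sd = statistics.stdev(estimates)      # sample SD across replicates
    covered = [abs(e - truth) <= z * s for e, s in zip(estimates, std_errors)]
    cov95 = sum(covered) / len(covered)
    return {"Bias": bias, "AvgSD": avg_sd, "EmpSD": emp_sd, "Cov95": cov95}
```

When standard errors are well estimated, AvgSD and EmpSD should be close and Cov95 should be near the nominal level; a large gap between the two is a common sign that the variance estimator is off.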

Table 6: Simulation results on hypothesis testing and second-stage inference under AFT models (n=4)
Unconditional Case: b_1 = 0, b_2 = -.5, 2% right censoring, 2% left censoring. Hypotheses: H_0: β(τ) = 0, l ≤ τ ≤ u, and H_0: β(τ) = η, l ≤ τ ≤ u.
Conditional Case: b_1 = 0, b_2 = -1, 15% right censoring, 2% left censoring. Hypotheses: H_0: α(τ) = 0, l ≤ τ ≤ u, and H_0: α(τ) = η, l ≤ τ ≤ u.
Columns: ERR, AvgEst, AvgSD, EmpSD under the first hypothesis; ERR under the second. Rows: components of β̂ and α̂. (The numeric entries did not survive transcription.)

Table 7: Simulation results under log-linear models with heteroscedastic errors (n=4)
Unconditional Case: b_1 = 0, b_2 = -1.5, 25% rc, 2% lc. Conditional Case: b_1 = 0, b_2 = -4.5, 15% rc, 25% lc.
Columns (for each case): τ, Bias, AvgSD, EmpSD, Cov95. Rows: components of β̂ (unconditional) and α̂ (conditional) at each τ, starting from τ = .1. (The numeric entries did not survive transcription.)

Table 8: Simulation results on hypothesis testing and second-stage inference under log-linear models with heteroscedastic errors (n=4)
Unconditional Case: b_1 = 0, b_2 = -1.5, 25% right censoring, 2% left censoring. Hypotheses: H_0: β(τ) = 0, l ≤ τ ≤ u, and H_0: β(τ) = η, l ≤ τ ≤ u.
Conditional Case: b_1 = 0, b_2 = -4.5, 15% right censoring, 25% left censoring. Hypotheses: H_0: α(τ) = 0, l ≤ τ ≤ u, and H_0: α(τ) = η, l ≤ τ ≤ u.
Columns: ERR, AvgEst, AvgSD, EmpSD under the first hypothesis; ERR under the second. Rows: components of β̂ and α̂. (The numeric entries did not survive transcription.)


Comparing Distribution Functions via Empirical Likelihood Georgia State University ScholarWorks @ Georgia State University Mathematics and Statistics Faculty Publications Department of Mathematics and Statistics 25 Comparing Distribution Functions via Empirical

More information

GROUPED SURVIVAL DATA. Florida State University and Medical College of Wisconsin

GROUPED SURVIVAL DATA. Florida State University and Medical College of Wisconsin FITTING COX'S PROPORTIONAL HAZARDS MODEL USING GROUPED SURVIVAL DATA Ian W. McKeague and Mei-Jie Zhang Florida State University and Medical College of Wisconsin Cox's proportional hazard model is often

More information

ST745: Survival Analysis: Nonparametric methods

ST745: Survival Analysis: Nonparametric methods ST745: Survival Analysis: Nonparametric methods Eric B. Laber Department of Statistics, North Carolina State University February 5, 2015 The KM estimator is used ubiquitously in medical studies to estimate

More information

The Accelerated Failure Time Model Under Biased. Sampling

The Accelerated Failure Time Model Under Biased. Sampling The Accelerated Failure Time Model Under Biased Sampling Micha Mandel and Ya akov Ritov Department of Statistics, The Hebrew University of Jerusalem, Israel July 13, 2009 Abstract Chen (2009, Biometrics)

More information

EMPIRICAL LIKELIHOOD ANALYSIS FOR THE HETEROSCEDASTIC ACCELERATED FAILURE TIME MODEL

EMPIRICAL LIKELIHOOD ANALYSIS FOR THE HETEROSCEDASTIC ACCELERATED FAILURE TIME MODEL Statistica Sinica 22 (2012), 295-316 doi:http://dx.doi.org/10.5705/ss.2010.190 EMPIRICAL LIKELIHOOD ANALYSIS FOR THE HETEROSCEDASTIC ACCELERATED FAILURE TIME MODEL Mai Zhou 1, Mi-Ok Kim 2, and Arne C.

More information

Power-Transformed Linear Quantile Regression With Censored Data

Power-Transformed Linear Quantile Regression With Censored Data Power-Transformed Linear Quantile Regression With Censored Data Guosheng YIN, Donglin ZENG, and Hui LI We propose a class of power-transformed linear quantile regression models for survival data subject

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

Regression Calibration in Semiparametric Accelerated Failure Time Models

Regression Calibration in Semiparametric Accelerated Failure Time Models Biometrics 66, 405 414 June 2010 DOI: 10.1111/j.1541-0420.2009.01295.x Regression Calibration in Semiparametric Accelerated Failure Time Models Menggang Yu 1, and Bin Nan 2 1 Department of Medicine, Division

More information

Survival Distributions, Hazard Functions, Cumulative Hazards

Survival Distributions, Hazard Functions, Cumulative Hazards BIO 244: Unit 1 Survival Distributions, Hazard Functions, Cumulative Hazards 1.1 Definitions: The goals of this unit are to introduce notation, discuss ways of probabilistically describing the distribution

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

Constrained estimation for binary and survival data

Constrained estimation for binary and survival data Constrained estimation for binary and survival data Jeremy M. G. Taylor Yong Seok Park John D. Kalbfleisch Biostatistics, University of Michigan May, 2010 () Constrained estimation May, 2010 1 / 43 Outline

More information

Analysing geoadditive regression data: a mixed model approach

Analysing geoadditive regression data: a mixed model approach Analysing geoadditive regression data: a mixed model approach Institut für Statistik, Ludwig-Maximilians-Universität München Joint work with Ludwig Fahrmeir & Stefan Lang 25.11.2005 Spatio-temporal regression

More information

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract

More information

and Comparison with NPMLE

and Comparison with NPMLE NONPARAMETRIC BAYES ESTIMATOR OF SURVIVAL FUNCTIONS FOR DOUBLY/INTERVAL CENSORED DATA and Comparison with NPMLE Mai Zhou Department of Statistics, University of Kentucky, Lexington, KY 40506 USA http://ms.uky.edu/

More information

Asymptotic Distributions for the Nelson-Aalen and Kaplan-Meier estimators and for test statistics.

Asymptotic Distributions for the Nelson-Aalen and Kaplan-Meier estimators and for test statistics. Asymptotic Distributions for the Nelson-Aalen and Kaplan-Meier estimators and for test statistics. Dragi Anevski Mathematical Sciences und University November 25, 21 1 Asymptotic distributions for statistical

More information

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment

More information

Chapter 2 Inference on Mean Residual Life-Overview

Chapter 2 Inference on Mean Residual Life-Overview Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate

More information

Product-limit estimators of the survival function with left or right censored data

Product-limit estimators of the survival function with left or right censored data Product-limit estimators of the survival function with left or right censored data 1 CREST-ENSAI Campus de Ker-Lann Rue Blaise Pascal - BP 37203 35172 Bruz cedex, France (e-mail: patilea@ensai.fr) 2 Institut

More information

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data Outline Frailty modelling of Multivariate Survival Data Thomas Scheike ts@biostat.ku.dk Department of Biostatistics University of Copenhagen Marginal versus Frailty models. Two-stage frailty models: copula

More information

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 599 604 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.599 Print ISSN 2287-7843 / Online ISSN 2383-4757 Estimation of Conditional

More information

Multiscale Adaptive Inference on Conditional Moment Inequalities

Multiscale Adaptive Inference on Conditional Moment Inequalities Multiscale Adaptive Inference on Conditional Moment Inequalities Timothy B. Armstrong 1 Hock Peng Chan 2 1 Yale University 2 National University of Singapore June 2013 Conditional moment inequality models

More information

Some New Methods for Latent Variable Models and Survival Analysis. Latent-Model Robustness in Structural Measurement Error Models.

Some New Methods for Latent Variable Models and Survival Analysis. Latent-Model Robustness in Structural Measurement Error Models. Some New Methods for Latent Variable Models and Survival Analysis Marie Davidian Department of Statistics North Carolina State University 1. Introduction Outline 3. Empirically checking latent-model robustness

More information

POWER AND SAMPLE SIZE DETERMINATIONS IN DYNAMIC RISK PREDICTION. by Zhaowen Sun M.S., University of Pittsburgh, 2012

POWER AND SAMPLE SIZE DETERMINATIONS IN DYNAMIC RISK PREDICTION. by Zhaowen Sun M.S., University of Pittsburgh, 2012 POWER AND SAMPLE SIZE DETERMINATIONS IN DYNAMIC RISK PREDICTION by Zhaowen Sun M.S., University of Pittsburgh, 2012 B.S.N., Wuhan University, China, 2010 Submitted to the Graduate Faculty of the Graduate

More information

Marginal Screening and Post-Selection Inference

Marginal Screening and Post-Selection Inference Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Stat 710: Mathematical Statistics Lecture 31

Stat 710: Mathematical Statistics Lecture 31 Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:

More information

Estimation of the Bivariate and Marginal Distributions with Censored Data

Estimation of the Bivariate and Marginal Distributions with Censored Data Estimation of the Bivariate and Marginal Distributions with Censored Data Michael Akritas and Ingrid Van Keilegom Penn State University and Eindhoven University of Technology May 22, 2 Abstract Two new

More information

Robust Backtesting Tests for Value-at-Risk Models

Robust Backtesting Tests for Value-at-Risk Models Robust Backtesting Tests for Value-at-Risk Models Jose Olmo City University London (joint work with Juan Carlos Escanciano, Indiana University) Far East and South Asia Meeting of the Econometric Society

More information

A GENERALIZED ADDITIVE REGRESSION MODEL FOR SURVIVAL TIMES 1. By Thomas H. Scheike University of Copenhagen

A GENERALIZED ADDITIVE REGRESSION MODEL FOR SURVIVAL TIMES 1. By Thomas H. Scheike University of Copenhagen The Annals of Statistics 21, Vol. 29, No. 5, 1344 136 A GENERALIZED ADDITIVE REGRESSION MODEL FOR SURVIVAL TIMES 1 By Thomas H. Scheike University of Copenhagen We present a non-parametric survival model

More information

Linear rank statistics

Linear rank statistics Linear rank statistics Comparison of two groups. Consider the failure time T ij of j-th subject in the i-th group for i = 1 or ; the first group is often called control, and the second treatment. Let n

More information

Exercises. (a) Prove that m(t) =

Exercises. (a) Prove that m(t) = Exercises 1. Lack of memory. Verify that the exponential distribution has the lack of memory property, that is, if T is exponentially distributed with parameter λ > then so is T t given that T > t for

More information

Log-linearity for Cox s regression model. Thesis for the Degree Master of Science

Log-linearity for Cox s regression model. Thesis for the Degree Master of Science Log-linearity for Cox s regression model Thesis for the Degree Master of Science Zaki Amini Master s Thesis, Spring 2015 i Abstract Cox s regression model is one of the most applied methods in medical

More information

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data 1 Part III. Hypothesis Testing III.1. Log-rank Test for Right-censored Failure Time Data Consider a survival study consisting of n independent subjects from p different populations with survival functions

More information

A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks

A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks Y. Xu, D. Scharfstein, P. Mueller, M. Daniels Johns Hopkins, Johns Hopkins, UT-Austin, UF JSM 2018, Vancouver 1 What are semi-competing

More information

Chapter 4. Parametric Approach. 4.1 Introduction

Chapter 4. Parametric Approach. 4.1 Introduction Chapter 4 Parametric Approach 4.1 Introduction The missing data problem is already a classical problem that has not been yet solved satisfactorily. This problem includes those situations where the dependent

More information

STAT 331. Martingale Central Limit Theorem and Related Results

STAT 331. Martingale Central Limit Theorem and Related Results STAT 331 Martingale Central Limit Theorem and Related Results In this unit we discuss a version of the martingale central limit theorem, which states that under certain conditions, a sum of orthogonal

More information

Likelihood ratio confidence bands in nonparametric regression with censored data

Likelihood ratio confidence bands in nonparametric regression with censored data Likelihood ratio confidence bands in nonparametric regression with censored data Gang Li University of California at Los Angeles Department of Biostatistics Ingrid Van Keilegom Eindhoven University of

More information

Additive Isotonic Regression

Additive Isotonic Regression Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

Survival Analysis. Lu Tian and Richard Olshen Stanford University

Survival Analysis. Lu Tian and Richard Olshen Stanford University 1 Survival Analysis Lu Tian and Richard Olshen Stanford University 2 Survival Time/ Failure Time/Event Time We will introduce various statistical methods for analyzing survival outcomes What is the survival

More information