University of California, Berkeley

Similar documents
Chapter 2 Inference on Mean Residual Life-Overview

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

Linear life expectancy regression with censored data

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Comparing Distribution Functions via Empirical Likelihood

Likelihood ratio confidence bands in nonparametric regression with censored data

11 Survival Analysis and Empirical Likelihood

Attributable Risk Function in the Proportional Hazards Model

Adjusted Empirical Likelihood for Long-memory Time Series Models

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Quantile Regression for Residual Life and Empirical Likelihood

Modification and Improvement of Empirical Likelihood for Missing Response Problem

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests

log T = β T Z + ɛ Zi Z(u; β) } dn i (ue βzi ) = 0,

TMA 4275 Lifetime Analysis June 2004 Solution

Efficiency of Profile/Partial Likelihood in the Cox Model

Empirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints

A note on profile likelihood for exponential tilt mixture models

Exercises. (a) Prove that m(t) =

FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH

Empirical Likelihood in Survival Analysis

Survival Analysis for Case-Cohort Studies

Empirical likelihood for average derivatives of hazard regression functions

STAT331. Cox s Proportional Hazards Model

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES

Product-limit estimators of the survival function with left or right censored data

Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis

EMPIRICAL ENVELOPE MLE AND LR TESTS. Mai Zhou University of Kentucky

Power and Sample Size Calculations with the Additive Hazards Model

STAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis

Lecture 2: Martingale theory for univariate survival analysis

AFT Models and Empirical Likelihood

Jackknife Empirical Likelihood for the Variance in the Linear Regression Model

ICSA Applied Statistics Symposium 1. Balanced adjusted empirical likelihood

Survival Analysis Math 434 Fall 2011

Full likelihood inferences in the Cox model: an empirical likelihood approach

Goodness-Of-Fit for Cox s Regression Model. Extensions of Cox s Regression Model. Survival Analysis Fall 2004, Copenhagen

Effects of data dimension on empirical likelihood

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Cox s proportional hazards model and Cox s partial likelihood

Analysis of transformation models with censored data

Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models

Asymptotic Distributions for the Nelson-Aalen and Kaplan-Meier estimators and for test statistics.

STAT 331. Accelerated Failure Time Models. Previously, we have focused on multiplicative intensity models, where

University of California, Berkeley

Lecture 22 Survival Analysis: An Introduction

Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials

and Comparison with NPMLE

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

COMPUTE CENSORED EMPIRICAL LIKELIHOOD RATIO BY SEQUENTIAL QUADRATIC PROGRAMMING Kun Chen and Mai Zhou University of Kentucky

BARTLETT IDENTITIES AND LARGE DEVIATIONS IN LIKELIHOOD THEORY 1. By Per Aslak Mykland University of Chicago

TESTINGGOODNESSOFFITINTHECOX AALEN MODEL

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models

COMPUTATION OF THE EMPIRICAL LIKELIHOOD RATIO FROM CENSORED DATA. Kun Chen and Mai Zhou 1 Bayer Pharmaceuticals and University of Kentucky

A note on L convergence of Neumann series approximation in missing data problems

Semiparametric Regression

GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS

On robust and efficient estimation of the center of. Symmetry.

Interval Estimation for AR(1) and GARCH(1,1) Models

MAS3301 / MAS8311 Biostatistics Part II: Survival

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints and Its Application to Empirical Likelihood

Tied survival times; estimation of survival probabilities

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data

Smooth nonparametric estimation of a quantile function under right censoring using beta kernels

Frailty Models and Copulas: Similarities and Differences

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models

ST745: Survival Analysis: Cox-PH!

On the Breslow estimator

A class of mean residual life regression models with censored survival data

LIKELIHOOD RATIO INFERENCE FOR MISSING DATA MODELS

Nonparametric rank based estimation of bivariate densities given censored data conditional on marginal probabilities

STAT Sample Problem: General Asymptotic Results

Chapter 17. Failure-Time Regression Analysis. William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University

MAXIMUM LIKELIHOOD METHOD FOR LINEAR TRANSFORMATION MODELS WITH COHORT SAMPLING DATA

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

Constrained estimation for binary and survival data

Extending the two-sample empirical likelihood method

Treatment Comparison in Biomedical Studies Using Survival Function

Bayesian Nonparametric Inference Methods for Mean Residual Life Functions

The International Journal of Biostatistics

GROUPED SURVIVAL DATA. Florida State University and Medical College of Wisconsin

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling

1. Introduction In many biomedical studies, the random survival time of interest is never observed and is only known to lie before an inspection time

Proportional hazards regression

Chapter 4 Fall Notations: t 1 < t 2 < < t D, D unique death times. d j = # deaths at t j = n. Y j = # at risk /alive at t j = n

Accelerated Failure Time Models: A Review

Goodness-of-fit test for the Cox Proportional Hazard Model

Multistate Modeling and Applications

Issues on quantile autoregression

Tests of independence for censored bivariate failure time data

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models

ST745: Survival Analysis: Nonparametric methods

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

A GENERALIZED ADDITIVE REGRESSION MODEL FOR SURVIVAL TIMES 1. By Thomas H. Scheike University of Copenhagen

Transcription:

University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan Zhao Division of Biostatistics, School of Public Health, University of California, Berkeley, yqchen@stat.berkeley.edu Department of Mathematics and Statistics, Georgia State University, Atlanta, Georgia, dz27@gmail.com This working paper is hosted by The Berkeley Electronic Press (bepress) and may not be commercially reproduced without the permission of the copyright holder. http://biostats.bepress.com/ucbbiostat/paper153 Copyright c 24 by the authors.

A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen and Yichuan Zhao Abstract Mean residual life function, or life expectancy, is an important function to characterize distribution of residual life. The proportional mean residual life model by Oakes and Dasu (199) is a regression tool to study the association between life expectancy and its associated covariates. Although semiparametric inference procedures have been proposed in the literature, the accuracy of such procedures may be low when the censoring proportion is relatively large. In this paper, the semiparametric inference procedures are studied with an empirical likelihood ratio method. An empirical likelihood confidence region is constructed for the regression parameters. The proposed method is further compared with the normal approximation based method through a simulation study.

1 Introduction A mean residual life function, m(t), t, is the expected remaining life given survival to t. Suppose T is a failure time, then m(t) =E(T t T>t). It is an important function in economics, actuarial sciences, reliability and survival analysis to characterize life expectancy. Oakes and Dasu (199) proposed a class of semiparametric models called the proportional mean residual life model, as an alternative to the widely used Cox proportional hazards model, to study the association between m(t) and its associated covariates. The Oakes-Dasu model directly models the distribution of residual life and carries appealing interpretation in life expectancy. Specifically, an Oakes-Dasu proportional mean residual life model usually assumes that m{t Z(t)} = m (t) exp{β T Z(t)}, (1) where m( ) are mean residual life functions, Z( ) are p vector covariates and β are associated parameters. In a semiparametric version of this model, m ( ) is usually unspecified. When a model satisfies both the proportional hazards and the proportional mean residual life assumptions, its underlying distributions belong to the Hall-Wellner class of distributions with linear mean residual life functions (Oakes & Dasu, 199). When there is no censoring, estimation procedures were developed in Maguluri and Zhang (1994). When censoring presents, Chen and Cheng (24) recently developed quasi-partial score estimating equations for the regression parameters. Nevertheless, these large-sample normal approximation based estimation methods tend to have poor performance when the sample size is relatively small or the censoring proportion is relatively large. In this short note, we instead consider empirical likelihood method to estimate the parameters in model (1), as it is a powerful nonparametric method. In general, the empirical likelihood method has unique features, such as range respecting, transformation-preserving, asymmetric confidence interval, Bartlett correctability, and better coverage probability for small sample (Owen, 21). In analysis of censored survival times, for example, empirical likelihood was used to derived pointwise confidence intervals for survival function with right censored data as early as in 1975 (Thomas & Grunkemeier, 1975). 2 Hosted by The Berkeley Electronic Press

In this short note we use the simple estimating equations in Chen and Cheng (1994) to construct an empirical likelihood-ratio based confidence region. The proposed confidence region and main asymptotic result are in Section 2. Results from a moderate simulation are in Section 3 to compare our method with normal approximation based method. Some alternative estimation method is discussed in Section 4. 2 Main results In addition to the failure time T, let C be the potential censoring time. Conditional on the p vector covariate Z, T and C are assumed to be independent. Suppose that the observed data set consists of n independent copies of (X i, i,z i ), i =1,...,n, where X i = min(t i,c i ) and i = I(T i C i ). Here, I( ) is indicator function. Denote Y i (t) =I(X i t) and N i (t) =I(X i t) i. Let <τ= inf{t : pr(x >t)=} <. As derived in (24), the following estimating equations can be used to estimate the parameter β in model (1): U(β) = {Z i Z(t)}{ m (t; β)dn i (t) Y i (t) exp( β T Z i )dt} =, (2) where Z(t) = n Y i(t)z i / n Y i(t) and m (t; β) = [ exp { t i dn i(t) i Y i(t) ) }] 1 { u exp i dn } i(s) i Y i(u) exp( β T Z i ) t i Y i(s) i Y du. i(u) Denote β and β the estimated and true parameter of β, respectively. Then as shown in Chen and Cheng (24), under the regularity conditions, the random vector n 1/2 ( β β ) D N(,A 1 VA 1 ), where A and V can be consistently estimated by their empirical estimators, Â = 1 n V = 1 n {Z i (t) Z(t)} 2 Y i (t) exp( β T Z i )dt, and {Z i (t) Z(t)} 2 Y i (t) m (t; β){exp( β T Z i )dt + d m (t; β)}, 3 http://biostats.bepress.com/ucbbiostat/paper153

respectively. Thus an asymptotic 1(1 α)% confidence region for β based on the above normal approximation is given by R 1 = {β : n( β β ) T Â V 1 Â( β β ) χ 2 p(α)}, (3) where χ 2 p(α) is the upper α-quantile of the chi-square distribution with degrees of freedom p. Apparently the accuracy of R 1 mainly depends on the large-sample normal approximation and also the proportion of censoring. proportions, its accuracy may be compromised. For relatively small sample size or large censoring Now consider the empirical likelihood approach, instead. For i = 1, 2,...,n, we define W i = {Z i Z(t)}{ m (t; β )dn i (t) Y i (t) exp( β T Z i )dt}, and summarize the following results in Chen and Cheng (24) as a lemma. Lemma 1. Under regularity conditions in Chen and Cheng (24), (i) n 1/2 n W i N(,V), and (ii) n 1 n W i W i T V in probability. D Thus the associated empirical likelihood is { n L(β ) = sup p i : p i =1, } p i W i =,p i,,...,n. Let p =(p 1,...,p n ) T be a vector of probabilities such that n p i = 1, where p i, i =1, 2,...,n. Since n p i attains its maximum at p i =1/n, the empirical likelihood ratio at the true value β is then { n R(β ) = sup np i : } p i =1, p i W i =,p i,,...,n. By using Lagrange multipliers, we have where λ satisfies the equation 2 log R(β )=2 1 n log { }, (4) W i =. (5) 4 Hosted by The Berkeley Electronic Press

Suppose that {Z i (t)} are uniformly bounded by a constant. Now define V i = {Z i µ Z (t)} m (t)dm i (t), where µ Z (t) is the limit of Z(t) asn. Then E V i 2 <. According to the proof of Lemma 3 in Owen (199), we have max 1 i n V i = o p (n 1/2 ). Note W i = { Zi Z(t) } m (t)dm i (t)+o p (1). By the martingale representations of V i and W i, we can prove that V i W i = o p (1). Then we have max i = o p (n 1/2 ), and 1 i n (6) 1 W i 3 = o p (n 1/2 ). n (7) Let λ = ρθ, where ρ and θ = 1. Recall Γ n =1/n n W iwi T = V + o p (1), where V is the limit of 1/n n W iwi T. Let σ p > be the smallest eigenvalue of V. Then, θγ n θ σ p +o p (1). According to Lemma 1, 1/n n W i = O p (n 1/2 ). By (6), the equations in (5) and the argument used in Owen (199), we know that λ = O p (n 1/2 ). (8) Consider a Taylor expansion to the right-hand side of (4), 2 log R(β )=2 {λ T W i 12 } (λt W i ) 2 + r n, (9) where r n = O p (1) n λt W i 3. Hence, by (7), r n = O p (1) λ 3 n W i 3 = o p (1). Furthermore, since 1 n it follows that W i = 1 n = 1 n λ = ( ) W W i 1 λ T W i + (λt i ) 2 ( ) 1 W i W i Wi T λ + 1 n n ( W i W T i W i (λ T W i ) 2 =, ) 1 W i + o p (1). (1) 5 http://biostats.bepress.com/ucbbiostat/paper153

Similarly, we have Since λ T W i = (λ T W i ) (λ T W i ) 2 + (λ T W i ) 3 =. (11) (λ T W i ) 3 = o p (1), (12) we know that n (λt W i ) 2 = n n j=1 λt W i + o p (1). As a result, the following is true 2 log R(β )= λ T W i + o p (1) = ( = χ 2 p. n 1/2 ) T ( W i Hence we establish a theorem as: n 1 W i W T i ) 1 ( n 1/2 ) W i + o p (1) Theorem 1. Assume {Z i (t)} are uniformly bounded by a constant. Then 2 log R(β ) converges in distribution to χ 2 p, where χ 2 p is a chi-square distribution with degrees of freedom p. According to this theorem, an asymptotic 1(1 α)% empirical likelihood confidence region for β is thus constructed as where χ 2 p(α) is defined before. R 2 = {β : 2 log R χ 2 p(α)}, (13) 3 Simulations A small-scale simulation is conducted to compare the performance of the empirical likelihood procedure with the normal approximation procedures. In Chen and Cheng (24), their simulations were conducted for relatively large sample size with relatively small proportion of censoring. In order to compare the results, we adopt a similar simulation setup as theirs. That is, we consider two covariates for each subject, Z 1 and Z 2, respectively, with Z 1 being 6 Hosted by The Berkeley Electronic Press

Table 1: Summary of simulation studies: 95% nominal coverage probabilities of normal approximation and empirical likelihood method β =(, ) T β =(1, 1) T n Censoring % Method Z 1 Z 2 Z 1 Z 2 5 25% Normal.873.882.865.852 5 25% EL.932.937.94.935 5 5% Normal.797.84.811.817 5 5% EL.917.922.934.925 2 25% Normal.952.944.942.958 2 25% EL.957.955.956.947 2 5% Normal.957.942.938.947 2 5% EL.944.952.949.933 Normal, normal approximation method; EL, empirical likelihood method. binary of and 1 and Z 2 being uniform on [,1]. The baseline mean residual life function is t + 1, corresponding to a Pareto distribution with survival function of 1/(t +1) 2. Failure times are generated according to model (1), with true parameters of β to be (, ) T and (1, 1) T, respectively. Independent censoring times are generated from uniform on [o, c], with different c selected to result in 25% and 5% of censoring, respectively. The sample size for each simulation is 5, and 2, representing relatively small and large samples, respectively. The simulation results are tabulated in Table (1). Each entry of the table is based on 1, simulated data sets. As shown in the table, both of the methods work reasonably well with right coverage probabilities of 95% when sample size is relatively large. But for sample size, the normal approximation method apparently has relatively larger under-coverage, while the empirical likelihood method has better coverage. 4 Discussion In this short note, we use an empirical likelihood method to construct confidence regions for the parameters in the proportional mean residual life model. This method is shown to 7 http://biostats.bepress.com/ucbbiostat/paper153

be relatively more accurate in coverage probabilities in small sample size, compared with the normal approximation method in Chen and Cheng (24). As seen in the development, the empirical likelihood method was applied to the estimating equations proposed by Chen and Cheng (24). In fact, an empirical likelihood estimator for β can be obtained as β = argmax β {R(β)}, which can be further shown n 1/2 ( β β ) D N(,A 1 VA 1 ). In addition, empirical likelihood method can be extended to the weighted version of estimating equations straightforwardly as well. The efficient estimator of β would be obtained by choosing the optimal weight function. In fact, the empirical likelihood method can be used to construct confidence regions with an alternative approach, although it involves estimation of censoring distribution. Consider a synthetic variable T (G, t) =S c (t)x /S c (X), for t>, where S c ( ) is the survival function of censoring distribution. Then [ { } ] E{ T Sc (t)ti(c T ) (t; S c ) X>t; Z} = E T E C S c (T ) C>t; Z T>t; Z [{ } Sc (t)ts c (T C>t) = E T T>t; Z] = E(T T>t; Z). S c (T ) Thus, the following estimating equations can be used to estimate m (t) and β jointly: { } Y i (t) T i (t; Ŝ c ) m (t) exp(β T Z i ) =, (14) { } Y i (t)z i Ti (t; Ŝc) m (t) exp(β T Z i ) dt =, (15) where Ŝ c is some consistent estimator of S c, such as the Kaplan-Meier estimator when the censoring is considered as homogeneous. By plugging in (15) with (14), thus the following estimating equations can be used to estimate β: Y i (t) T i (t; Ŝ c ) { Z i Z(t) } dt =. Thus similar empirical likelihood method proposed previously should apply to construct alternative confidence regions. When censoring is heterogenous across individual subjects, more model assumptions are then needed to use this approach. 8 Hosted by The Berkeley Electronic Press

References Chen, Y. Q. & Cheng, S. (24), Semiparametric regression analysis of mean residual life with censored survival data, Biometrika, in press. Cox, D. R. (1972), Regression models and life-tables (with discussion), J. R. Statist. Soc. B, 34, 187 22. Maguluri, G. & Zhang, C. H. (1994), Estimation in the mean residual life regression model, J. Roy. Statist. Soc. Ser. B, 56, 477 489. Oakes, D. & Dasu, T. (199), A note on residual life, Biometrika 77, 49 41. Owen, A. (199), Empirical likelihood and confidence regions, Ann. Statist. 18, 9 12. Owen, A. (21), Empirical Likelihood, New York: Chapman & Hall/CRC. Thomas, D. R. & Grunkemeier, G. L. (1975), Confidence interval estimation for survival probabilities for censored data, J. Amer. Statist. Assoc. 7, 865 871. 9 http://biostats.bepress.com/ucbbiostat/paper153