Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models

NIH Talk, September 03 Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models Eric Slud, Math Dept, Univ of Maryland Ongoing joint project with Ilia Vonta, Univ. of Cyprus. The talk is based on joint papers: (i) about NPMLE Scan. Jour. Stat. to appear 2003. (ii) about Modified Profile Likelihood, JSPI 2004? TALK OUTLINE I. MOTIVATION Transformation Model Problems II. APPROACH Finite-Dimensional Version III. BACKGROUND LITERATURE Profile Likelihood, Frailty & Accelerated Failure Models IV. FRAILTY CASE (info bounds) V. ACCELERATED-FAILURE CASE (more details) 1

Transformation Survival Models Variables: T i Survival times, Discrete Covariates Z i C i Censoring Times, cond surv fcn R z (c) given Z i = z DATA: iid triples (min(t i, C i ), I [ T i C i ], Z i ) Observable processes: N i z(t) = I [Zi =z, T i min(c i,t)], Y i z (t) = I [min(ti, C i ) t] TRANSF. MODEL: S T Z (t z) = exp( G(e β z Λ(t))) G known, β R m, Λ cumulative-hazard fcn PROBLEM: efficient estimation of β. Special Cases: (1) Cox 1972: G(x) x (2) Frailty: unobserved random intercept β 0 = ξ i, G x = G(x) = log 0 e sx df (s) (3) Clayton-Cuzick 1986: G(x) 1 b log(1 + bx) 2

Cox-Model Case S T Z (t z) = exp( e β z Λ(t)),, G(x) = x h T Z (t z) = e β z λ(t) is the Proportional or Multiplicative Hazards model. Frailty If β Z covariate has added to it an unobservable random-effect intercept log ξ called frailty, P (T > t Z = z) = E ξ ( exp( ξ e β z Λ(t)) exp( G(e β z Λ(t))) In famous Clayton-Cuzick (1986) frailty model example, take ξ Gamma(b 1, b 1 ), leading to S T Z (t z) = (1+be β z Λ(t)) 1/b, h T Z (t z) = e β z λ(t) 1 + be β z Λ(t) Why are these Transformation Models? (Recall cum. haz. at failure is Expon(1) variate V.) G( e β Z Λ(T ) ) = V or for G known, but β, Λ unknown, log Λ(T ) = log G 1 (V ) β Z 3

Accelerated-Failure Models These are also Transformation Models. Covariates now have an additive effect on transformed time-variable: T 0 is neutral individual s failure-time g a (known) measurement scale survival fcn of g(t 0 ) unknown and equal to K(e t ). Now suppose g(t ) = β Z + g(t 0 ) Then S T Z (t z) = P (g(t ) > g(t) Z = z) = P (g(t ) > g(t) + β z) = K(e β z+g(t) ) has transformation-model form, for K unknown, g known (often equal to log). Note: this is the same as the transformation model log Λ(T ) = log G 1 (V ) β Z for frailty if Λ were known and G unknown! 4

Finite-dimensional Case X i, i = 1,..., n iid f(x, β, λ), β R m, λ R d unknown, with true values (β 0, λ 0 ) loglik(β, λ) = n log f(x i, β, λ) Profile Likelihood = loglik(β, ˆλ β ) with restricted MLE ˆλβ = arg max λ loglik(β, λ) Min Kullback-Leibler Modified Profile Approach (Severini and Wong 1992) K(β, λ) E β0,λ 0 ( log f(x 1, β, λ)) = {log f(x, β, λ)} f(x, β 0, λ 0 ) dx Define: λ β = arg max λ K(β, λ) Then: λβ estimates curve λ β Candidate Estimator β arg max β loglik(β, λ β ) 5

Information Matrix : I(β, λ) = = 2 A β,λ Bβ,λ β log f(x, β, λ) 2 βλ log f(x, β, λ) 2 λβ log f(x, β, λ) 2 λ log f(x, β, λ) B β,λ C β,λ f 0 (x)dx Note that by definition of K and implicit (total) differentiation T β [ λ K(β, λ β ))] = B + C β λ β = 0 The usual Information about β for this model, defined (as in the Cramer-Rao Inequality) as inverse of the minimum variance matrix for unbiased estimators of β, is A β0,λ 0 B β 0,λ 0 C 1 β 0,λ 0 B β0,λ 0 Equivalently, to test β = β 0, denoting restricted MLE ˆλ r as maximizer of loglik(β 0, λ), efficient test-statistic is 1 [ β loglik(β 0, ˆλ r ) B n β 0,ˆλ r (C β 0,ˆλ r ) 1 λ loglik(β 0, ˆλ r ) ] Neyman (1959) indicated that the same efficiency for test-statistic can be obtained much more generally, with ˆλ r replaced by preliminary estimator consistent for λ 0 at rate o P (n 1/4 ). 6

Key mathematical features of the modified profile approach via β, λ β are: the technical convenience of restricting attention to nuisance parameters such as hazards or density functions which satisfy smoothness restrictions; replacement of operator-inversion within (blocks of) the generalized information operator by differentiation of the Kullback-Leibler minimizer, since β λ β = C 1 B and the semiparametric Information about β is J (β 0, λ 0 ) = A β0,λ 0 ( β λ β0 ) C β0,λ 0 ( β λ β0 ) and there is no need for high-order consistency of estimation of λ, when consistent estimators of K-L minimizers λ β and their derivatives with respect to structural parameters are available. Whether dimension of nuisance parameter is finite or infinite, under regularity conditions: n ( β β0 ) D N (0, (J (β 0, λ 0 )) 1 ) 7

Semiparametric Case (λ -dim) Kullback-Leibler Functional K(β, λ) = ( log f(x, β, λ)) f(x, β 0, λ 0 ) dx Define λ β = arg max λ K(β, λ) to satisfy: λ K(β, λ β ) = 0 Under minimal regularity conditions: (β, λ β ) is a least-favorable nuisance-parameterization Substitute preliminary estimators β0, λ 0 (usually involves density-estimator for λ 0 ), into λ β formula to get estimator λ β. Then maximize over β within loglik(β, λ β ) = n log f X (X i, β, λ β ) 8

LITERATURE Profile Likelihood Cox, D. R. & Reid, N. (1987) JRSSB McCullagh, P. & Tibshirani, R. (1990) JRSSB Severini, T. and Wong, W. (1992) Ann. Stat. Semiparametrics Bickel, Klaassen, Ritov, & Wellner: 1993 Book Cox, D. R. (1972) Cox-Model paper JRSSB Murphy & van der Vaart (2000) JASA Transformation/Frailty Models Cheng, Wei, & Ying (1995) Biometrika Clayton & Cuzick (1986) ISI Centenary Session Slud, E. & Vonta, I. (2003) Scand. Jour. Stat Parner (1998) Ann. Stat. Accelerated Failure Time Models Koul, Susarla & van Ryzin (1981) Ann. Stat. Tsiatis (1990) Ann. Stat. Ritov (1990) Ann. Stat. 9

-dim Examples (I). Cox model. For q z (t) = p Z (z)r z (t) exp( e z β 0 Λ 0 (t)), can solve uniquely for: Λ β (t) t 0 λ β(s) ds = z q z (x) e z β 0 λ 0 (x) z e z β q z (x) Let λ 0 be a consistent density estimate of λ 0 (x) (eg by smoothing and differentiating the Nelson-Aalen estimator on data in a z = 0 data-stratum.) Estimate q z (t) by kernel-smoothing the at-risk process Y z (t)/n, λ β (t) = n e Z i β 0 A( t T i ) b λ β0 (T i ) / n n e Z i β 0 A( t T i b n ) where A is a smooth cdf (kernel) and b n a bandwidth parameter decreasing slowly to 0 as n. NB. In this example, any β estimator produced in this way collapses to the usual Cox Max Partial Likelihood Estimator! 10

(II). Transformation/Frailty Models In the general G transformation model case, must assume for some finite time τ 0 with Λ 0 (τ 0 ) < that all data are censored at τ 0. In this model, Slud & Vonta (2003) characterize the K-optimizing hazard intensity λ β in its integrated form L = Λ β = 0 λ β (x) dx, through the second order ODE system: dl dλ 0 (s) = z e z β 0 q z (s) G (e z β 0 Λ 0 (s)) z e z β q z (s) G (e z β L(s)) + Q(s) dq dλ 0 (s) = z ez β q z (s) G G e z β L(s) (e z β 0 G (e z β 0 Λ 0 (s)) (e z β G ((e z β L(s)) dl dλ 0 (s)) subject to the initial/terminal conditions L(0) = 0, Q(τ 0 ) = 0 Slud and Vonta (2002) show that these ODE s have unique solutions, smooth with respect to β and differentiable in t, which (with λ β L ) minimize the functional K(β, λ β ) as desired. 11

Consistent preliminary estimators λ β can be developed by substituting for β 0, Λ 0 in those equations (smoothed with respect to t) consistent preliminary estimators. Punchline: new estimator β = arg max β loglik(β, λ β ) is efficient! Software for these estimators so far not ready for prime time because of need for general-purpose twopoint boundary value problem for ODE, but has been used to generate formulas for Semiparametric Information. That is technically easier because it only involves the adjoint ODE system obtained by differentiating the one above at the true values (β 0, λ 0 ) with respect to a parameter Q(0). 12

Censored Linear Regression Censored linear regression model (usually, for log-survival times) assumes ɛ i independent of (Z i, C i ) in X i = β tr Z i + ɛ i, Z i and ɛ i independent Data: iid triples (T i, Z i, i ), T i = min(x i, C i ), i = I [Xi C i ] Unknown parameters: β, λ(u) F ɛ(u)/(1 F ɛ (u)) Step 1. Preliminary estimator β0 as in Koul-Susarlavan Ryzin (1981) by regression i T i / ŜC Z(T i Z i ) on Z i Generally, Ŝ C Z a kernel-based nonparametric regression estimator (Cheng, 1989). If Z i, C i independent, use Kaplan-Meier ŜC KM (T i ). Step 2. λ0 estimated by kernel-density variant of Nelson-Aalen estimator (Ramlau-Hansen 1983), with kernel cdf A( ), bandwidth b n slowly enough: 13

λ 0 (w) = 1 b n A ( w u i dn i (u + Z ) β i 0 ) b n i I [Ti u+z i β 0 ] = n i A ( w T i + Z i β 0 ) ) / n b n j=1 I [Tj T i +(Z j Z i ) β 0 ] Step 3. Next use K-L functional to find: λ β (t) z q z(t+z β) λ 0 (t+z (β β 0 )) / z q z(t+z β) Step 4. Then define λβ by substituting λ 0 into n A( T i t Z iβ ) b λ 0 (t+z i(β β 0 )) / n n A( T i t Z iβ b n ) and Λ β by numerical integral of λβ over [0, t]. Step 5. Finally substitute these expressions into loglik = n { j log λ β (T j Z jβ) Λ β (T j Z jβ)} and maximize numerically. 14