Smoothing the Nelson-Aalen Estimator
Biostat 277 presentation
Chi-hong Tseng

References:
1. Andersen, Borgan, Gill, and Keiding (1993). Statistical Models Based on Counting Processes, Springer-Verlag, pp. 229-255.
2. Müller and Wang (1994). Hazard Rate Estimation Under Random Censoring with Varying Kernels and Bandwidths, Biometrics 50, 61-76.

1 Survival Data and Nelson-Aalen Estimator

1. Survival time $T$
2. Censoring time $C$
3. Observation time $X = \min(T, C)$
4. Censoring indicator $D = I(T \le C)$
5. Hazard function: $\alpha(t) = \lim_{\Delta t \to 0} \Pr(t \le T < t + \Delta t \mid T \ge t)/\Delta t$
6. Cumulative hazard function: $A(t) = \int_0^t \alpha(u)\,du$
7. Nelson-Aalen estimator: $\hat A(t) = \sum_{T_i \le t} D_i/n_i$
8. $n_i$ = number of subjects at risk at time $T_i$
9. At-risk process $Y_i(t) = I(X_i \ge t)$, $Y(t) = \sum_i Y_i(t)$
10. Counting process $N_i(t) = I(X_i \le t, D_i = 1)$, $N(t) = \sum_i N_i(t)$
11. Nelson-Aalen estimator
\[
\hat A(t) = \int_0^t \frac{J(u)}{Y(u)}\,dN(u) = \sum_{i=1}^n \int_0^t \frac{J(u)}{Y(u)}\,dN_i(u)
\]
12. $J(s) = I(Y(s) > 0)$

2 Kernel Function Estimator

The kernel function estimator is given by
\[
\hat\alpha(t) = \frac{1}{b}\int K\Big(\frac{t-s}{b}\Big)\,d\hat A(s) = \frac{1}{b}\sum_j K\Big(\frac{t-T_j}{b}\Big)\,(Y(T_j))^{-1},
\]
where the sum is over the observed event times $T_j$.

1. $K$: kernel function with $\int_{-1}^1 K(s)\,ds = 1$, e.g. $K(x) = 0.75(1 - x^2)$, $|x| \le 1$.
2. $b$: bandwidth
3. At each $t$, only the event times $\{T_j : t - b \le T_j \le t + b\}$ contribute to the sum.
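As a concrete illustration, here is a minimal Python sketch (all function and variable names and the simulated data are illustrative, not from the references) of the two estimators just defined: the Nelson-Aalen increments $1/Y(T_j)$ at event times and the kernel-smoothed hazard with $K(x) = 0.75(1 - x^2)$.

```python
import numpy as np

def nelson_aalen(times, events):
    """Nelson-Aalen jumps dA_hat(T_j) = 1/Y(T_j) at the observed event times (no ties assumed)."""
    order = np.argsort(times)
    times = np.asarray(times)[order]
    events = np.asarray(events)[order]
    at_risk = len(times) - np.arange(len(times))      # Y at the sorted observation times: n, n-1, ..., 1
    event_times = times[events == 1]
    increments = 1.0 / at_risk[events == 1]
    return event_times, increments                    # A_hat(t) = sum of increments with event_times <= t

def epanechnikov(x):
    """Kernel K(x) = 0.75 (1 - x^2) on |x| <= 1."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1, 0.75 * (1.0 - x ** 2), 0.0)

def smoothed_hazard(t_grid, event_times, increments, b):
    """alpha_hat(t) = b^{-1} sum_j K((t - T_j)/b) / Y(T_j)."""
    t_grid = np.asarray(t_grid, dtype=float)[:, None]
    k = epanechnikov((t_grid - np.asarray(event_times)) / b)
    return (k * np.asarray(increments)).sum(axis=1) / b

# toy usage: exponential survival with uniform censoring (all constants illustrative)
rng = np.random.default_rng(0)
T = rng.exponential(1.0, 200)
C = rng.uniform(0.0, 2.0, 200)
X, D = np.minimum(T, C), (T <= C).astype(int)
event_times, increments = nelson_aalen(X, D)
grid = np.linspace(0.2, 1.2, 50)
alpha_hat = smoothed_hazard(grid, event_times, increments, b=0.3)
```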
4. $E\hat\alpha(t) = \frac{1}{b}\int K\big(\frac{t-s}{b}\big)\Pr(Y(s) > 0)\,dA(s)$

Consider
\[
\alpha^*(t) = \frac{1}{b}\int K\Big(\frac{t-s}{b}\Big)\,dA^*(s) = \frac{1}{b}\int K\Big(\frac{t-s}{b}\Big) J(s)\,dA(s) \quad (\text{almost} = E\hat\alpha(t)),
\qquad
\tilde\alpha(t) = \frac{1}{b}\int K\Big(\frac{t-s}{b}\Big)\,dA(s).
\]
We can see that $\hat\alpha(t) - \alpha^*(t)$ is a martingale:
\[
\hat\alpha(t) - \alpha^*(t) = \frac{1}{b}\int K\Big(\frac{t-s}{b}\Big)\,d(\hat A - A^*)(s)
= \frac{1}{b}\int K\Big(\frac{t-s}{b}\Big)\frac{J(s)}{Y(s)}\,dM(s),
\]
with expected predictable variation
\[
\tau^2(t) = E(\hat\alpha(t) - \alpha^*(t))^2
= \frac{1}{b^2}\int K^2\Big(\frac{t-s}{b}\Big) E\Big(\frac{J(s)}{Y(s)}\Big)\alpha(s)\,ds
= \frac{1}{b}\int_{-1}^1 K^2(u)\, E\Big(\frac{J(t-bu)}{Y(t-bu)}\Big)\alpha(t-bu)\,du,
\]
therefore an estimator of the variance of $\hat\alpha(t)$ is
\[
\hat\tau^2(t) = \frac{1}{b^2}\int K^2\Big(\frac{t-s}{b}\Big)\frac{J(s)}{Y^2(s)}\,dN(s).
\]

3 Mean Integrated Squared Error and Optimal Bandwidth

To obtain the optimal global bandwidth, we choose the one minimizing the mean integrated squared error (MISE), which is defined as
\[
\mathrm{MISE}(\hat\alpha) = E\int_{t_1}^{t_2}(\hat\alpha(t) - \alpha(t))^2\,dt
= E\int_{t_1}^{t_2}\big[(\hat\alpha(t) - \tilde\alpha(t)) + (\tilde\alpha(t) - \alpha(t))\big]^2\,dt
\]
\[
= \int_{t_1}^{t_2} E(\hat\alpha(t) - \tilde\alpha(t))^2\,dt
+ \int_{t_1}^{t_2}(\tilde\alpha(t) - \alpha(t))^2\,dt
+ 2\int_{t_1}^{t_2}(E\hat\alpha(t) - \tilde\alpha(t))(\tilde\alpha(t) - \alpha(t))\,dt
\]
\[
= (\text{variance term}) + (\text{squared bias term}) + R_2.
\]
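The variance estimator $\hat\tau^2(t)$ fits naturally into the sketch above. Since $N$ jumps by one at each (untied) event time, $J(s)Y(s)^{-2}\,dN(s)$ contributes $1/Y(T_j)^2$ at $T_j$, which is just the squared Nelson-Aalen increment; the function below (an illustrative helper reusing `epanechnikov`) uses that fact.

```python
def smoothed_hazard_variance(t_grid, event_times, increments, b):
    """tau_hat^2(t) = b^{-2} sum_j K((t - T_j)/b)^2 / Y(T_j)^2, with 1/Y(T_j) = the NA increment."""
    t_grid = np.asarray(t_grid, dtype=float)[:, None]
    k = epanechnikov((t_grid - np.asarray(event_times)) / b)
    return (k ** 2 * np.asarray(increments) ** 2).sum(axis=1) / b ** 2
```

Ignoring the smoothing bias, $\hat\alpha(t) \pm 1.96\,\hat\tau(t)$ then gives an approximate pointwise 95% confidence interval.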
[Figure: Nelson-Aalen cumulative hazard estimate for the BMT data (time 0 to 700), and the kernel-smoothed hazard estimate with bandwidth = 100.]
[Figure: kernel-smoothed hazard estimates for the BMT data with bandwidth = 50 and bandwidth = 150 (time 0 to 700).]
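Figures like the ones summarized above can be reproduced in a few lines. The sketch below reuses the helpers and the simulated toy data from the first code block (the BMT data themselves are not included here), plotting the cumulative hazard and the smoothed hazard at several illustrative bandwidths.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(9, 3.5))
# event_times are already sorted, so the cumulative sum of increments is A_hat at those times
axes[0].step(event_times, np.cumsum(increments), where="post")
axes[0].set(xlabel="time", ylabel="cumulative hazard", title="Nelson-Aalen estimate")
for b in (0.2, 0.3, 0.5):                          # illustrative bandwidths
    axes[1].plot(grid, smoothed_hazard(grid, event_times, increments, b), label=f"b = {b}")
axes[1].set(xlabel="time", ylabel="smoothed hazard")
axes[1].legend()
plt.tight_layout()
plt.show()
```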
Let
\[
R_2 = 2\int_{t_1}^{t_2}(E\hat\alpha(t) - \tilde\alpha(t))(\tilde\alpha(t) - \alpha(t))\,dt
= 2\int_{t_1}^{t_2} R_1(t)(\tilde\alpha(t) - \alpha(t))\,dt
\]
with
\[
R_1(t) = E\hat\alpha(t) - \tilde\alpha(t)
= -\frac{1}{b}\int K\Big(\frac{t-s}{b}\Big)\Pr(Y(s) = 0)\,dA(s)
= -\int_{-1}^1 K(u)\,\Pr(Y(t - b_n u) = 0)\,\alpha(t - b_n u)\,du,
\]
so that
\[
|R_1(t)| \le C_1 \sup_{t \in [t_1 - c,\, t_2 + c]} \Pr(Y(t) = 0), \qquad
|R_2| \le C_2 \sup_{t \in [t_1 - c,\, t_2 + c]} \Pr(Y(t) = 0).
\]

Assume
1. $\int_{-1}^1 K(u)\,du = 1$, $\int_{-1}^1 u K(u)\,du = 0$, $\int_{-1}^1 u^2 K(u)\,du = k_2 > 0$;
2. there exists a sequence $\{a_n\}$ with $a_n \to \infty$ as $n \to \infty$ (in most cases $a_n = \sqrt{n}$);
3. there exists a continuous function $y$ such that $E\big(a_n^2 J(t)/Y(t)\big) \to 1/y(t)$ uniformly on $[t_1 - c, t_2 + c]$ as $n \to \infty$;
4. $\sup_{t \in [t_1 - c,\, t_2 + c]} \Pr(Y(t) = 0) = o(a_n^{-2})$.

Since
\[
\tilde\alpha(t) - \alpha(t) = \int_{-1}^1 K(u)\big(\alpha(t - b_n u) - \alpha(t)\big)\,du
= -b_n \alpha'(t)\int_{-1}^1 u K(u)\,du + \tfrac12 b_n^2 \alpha''(t)\int_{-1}^1 u^2 K(u)\,du + o(b_n^2)
= \tfrac12 b_n^2 \alpha''(t) k_2 + o(b_n^2),
\]
the squared bias term is
\[
\int_{t_1}^{t_2}(\tilde\alpha(t) - \alpha(t))^2\,dt = \tfrac14 b_n^4 k_2^2 \int_{t_1}^{t_2}(\alpha''(t))^2\,dt + o(b_n^4).
\]

For the variance term:
\[
E(\hat\alpha(t) - \tilde\alpha(t))^2 = E(\hat\alpha(t) - \alpha^*(t))^2 + E(\alpha^*(t) - \tilde\alpha(t))^2
+ 2E(\hat\alpha(t) - \alpha^*(t))(\alpha^*(t) - \tilde\alpha(t)).
\]
The first term on the right-hand side (see $\hat\tau^2(t)$) can be expressed as
\[
(a_n^2 b_n)^{-1}\,\frac{\alpha(t)}{y(t)}\int_{-1}^1 K^2(u)\,du + o\big((a_n^2 b_n)^{-1}\big),
\]
and for the second term on the right-hand side,
\[
\alpha^*(t) - \tilde\alpha(t) = -\int_{-1}^1 K(u)\, I(Y(t - b_n u) = 0)\,\alpha(t - b_n u)\,du,
\]
so by assumption 4, $E(\alpha^*(t) - \tilde\alpha(t))^2 = o(a_n^{-2})$.
By the Cauchy-Schwarz inequality, the last term on the right-hand side is of order $o\big((a_n^2 b_n^{1/2})^{-1}\big)$, so the variance term can be written as
\[
\int_{t_1}^{t_2} E(\hat\alpha(t) - \tilde\alpha(t))^2\,dt
= (a_n^2 b_n)^{-1}\int_{-1}^1 K^2(u)\,du \int_{t_1}^{t_2}\frac{\alpha(t)}{y(t)}\,dt + o\big((a_n^2 b_n)^{-1}\big).
\]
All together,
\[
\mathrm{MISE}(\hat\alpha) = \tfrac14 b_n^4 k_2^2 \int_{t_1}^{t_2}(\alpha''(t))^2\,dt
+ (a_n^2 b_n)^{-1}\int_{-1}^1 K^2(u)\,du \int_{t_1}^{t_2}\frac{\alpha(t)}{y(t)}\,dt
+ o(b_n^4) + o\big((a_n^2 b_n)^{-1}\big).
\]
To minimize the sum of the first two terms, the optimal bandwidth is
\[
b_n = a_n^{-2/5}\, k_2^{-2/5} \left[\int_{-1}^1 K^2(u)\,du \int_{t_1}^{t_2}\frac{\alpha(t)}{y(t)}\,dt\right]^{1/5}
\left[\int_{t_1}^{t_2}(\alpha''(t))^2\,dt\right]^{-1/5}.
\]
Two quantities remain to be estimated: $\alpha''(t)$ and $\int_{t_1}^{t_2}\alpha(t)/y(t)\,dt$. For the former,
\[
\hat\alpha''(t) = \frac{1}{b^3}\int K''\Big(\frac{t-s}{b}\Big)\,d\hat A(s),
\]
and an estimator of $\int_{t_1}^{t_2}\alpha(t)/y(t)\,dt$ is
\[
a_n^2 \int_{t_1}^{t_2}\frac{d\hat A(t)}{Y(t)}.
\]
Because the bias is
\[
E\hat\alpha(t) - \alpha(t) \approx \tfrac12 b_n^2 \alpha''(t) k_2,
\]
the bias-corrected estimate is $\hat\alpha(t) - \tfrac12 b_n^2 \hat\alpha''(t) k_2$.

3.1 Cross Validation Method

We can express the MISE as
\[
\mathrm{MISE}(b) = E\int_{t_1}^{t_2}(\hat\alpha(t))^2\,dt - 2E\int_{t_1}^{t_2}\hat\alpha(t)\alpha(t)\,dt + \int_{t_1}^{t_2}(\alpha(t))^2\,dt,
\]
a function of $b$: the first term is easy to calculate, the third term does not depend on $b$, and a consistent estimator of the second term is the cross-validation estimator
\[
\frac{2}{b}\sum_{i \ne j} K\Big(\frac{T_i - T_j}{b}\Big)\frac{\Delta N(T_i)\,\Delta N(T_j)}{Y(T_i)\,Y(T_j)}.
\]
Plotting the resulting estimate of $\mathrm{MISE}(b)$ against $b$ then locates the optimal $b$.
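A minimal sketch of this cross-validation search, reusing `epanechnikov`, `smoothed_hazard`, and `event_times`/`increments` from the earlier code blocks (the integration grid, the interval $[t_1, t_2]$, and the candidate bandwidths are illustrative choices, and tied event times are assumed absent):

```python
def cv_criterion(b, event_times, increments, t1, t2, n_grid=200):
    """Cross-validation estimate of MISE(b), dropping the b-free term int alpha^2 dt:
    int_{t1}^{t2} alpha_hat(t)^2 dt - (2/b) sum_{i != j} K((T_i - T_j)/b) / (Y(T_i) Y(T_j))."""
    t_pts = np.linspace(t1, t2, n_grid)
    ah = smoothed_hazard(t_pts, event_times, increments, b)
    first = np.trapz(ah ** 2, t_pts)                       # numerical integral of alpha_hat^2
    k = epanechnikov((event_times[:, None] - event_times[None, :]) / b)
    np.fill_diagonal(k, 0.0)                               # exclude the i == j terms
    second = (k * np.outer(increments, increments)).sum() / b
    return first - 2.0 * second

bandwidths = np.linspace(0.1, 0.6, 26)                     # illustrative candidate grid
scores = [cv_criterion(b, event_times, increments, 0.2, 1.2) for b in bandwidths]
b_opt = bandwidths[int(np.argmin(scores))]
```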
3.2 Boundary Correction and Local Bandwidth

Now we allow both the bandwidth $b = b(t)$ and the kernel $K = K_t$ to depend on $t$:
\[
\hat\alpha(t, b(t)) = \frac{1}{b(t)}\sum_j K_t\Big(\frac{t - T_j}{b(t)}\Big)(Y(T_j))^{-1}.
\]
Assume the support of the survival time distribution is $[0, R]$, and define
the interior points $I = \{t : b(t) \le t \le R - b(t)\}$,
the left boundary region $B_L = \{t : 0 \le t < b(t)\}$, and
the right boundary region $B_R = \{t : R - b(t) < t \le R\}$.
The kernels $K_t$ are polynomials,
\[
K_t(z) = \begin{cases} K_+(1, z) & t \in I, \\ K_+(t/b(t), z) & t \in B_L, \\ K_-((R - t)/b(t), z) & t \in B_R, \end{cases}
\]
where $K_\pm : [0,1] \times [-1,1] \to \mathbb{R}$ are bounded kernel functions with
$\int K_\pm(q, z)\,dz = 1$, $\int K_\pm(q, z)\,z\,dz = 0$, $\int K_\pm(q, z)\,z^2\,dz \ne 0$,
$K_-(q, z) = K_+(q, -z)$, and some differentiability conditions. For example,
\[
K_+(q, z) = \frac{12}{(1+q)^4}(z+1)\big[z(1 - 2q) + (3q^2 - 2q + 1)/2\big],
\]
and $K_+(1, z) = \tfrac34(1 - z^2)$ is the Epanechnikov kernel.

For a local bandwidth, we consider minimizing the local mean squared error to obtain the bandwidth $b(t)$:
\[
\mathrm{MSE}(t, b(t)) = \hat v(t, b(t)) + \hat\beta^2(t, b(t)),
\]
where $\hat v$ and $\hat\beta$ are the variance and bias estimators,
\[
\hat v(t, b(t)) = \frac{1}{n\,b(t)}\int K_t^2(u)\,\frac{\hat\alpha(t - b(t)u)}{\hat y(t - b(t)u)}\,du,
\qquad
\hat\beta(t, b(t)) = \int \hat\alpha(t - b(t)u)\,K_t(u)\,du - \hat\alpha(t).
\]
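The polynomial boundary kernel above can be coded directly. This is a small sketch (function names are my own; the support $[-1, q]$ for $K_+(q, \cdot)$ follows Müller and Wang (1994), and the moment conditions can be checked numerically):

```python
def k_plus(q, z):
    """Left-boundary kernel K_+(q, z) on support [-1, q]; K_+(1, z) is the Epanechnikov kernel."""
    z = np.asarray(z, dtype=float)
    body = 12.0 / (1.0 + q) ** 4 * (z + 1.0) * (z * (1.0 - 2.0 * q) + (3.0 * q ** 2 - 2.0 * q + 1.0) / 2.0)
    return np.where((z >= -1.0) & (z <= q), body, 0.0)

def k_minus(q, z):
    """Right-boundary kernel via the reflection K_-(q, z) = K_+(q, -z)."""
    return k_plus(q, -np.asarray(z, dtype=float))

# quick numerical check of the moment conditions for, e.g., q = 0.5
zs = np.linspace(-1.0, 0.5, 100001)
print(np.trapz(k_plus(0.5, zs), zs))        # ~ 1  (integrates to one)
print(np.trapz(zs * k_plus(0.5, zs), zs))   # ~ 0  (first moment vanishes)
```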
The following is the recommended algorithm to obtain $\hat\alpha(t)$ with local bandwidth $b(t)$:

1. Choose $K_+(q, z)$ and an initial bandwidth $b_0$ to obtain $\hat\alpha(t)$.
   1a. Alternatively, choose a parametric model and obtain the MLE $\hat\alpha(t)$.
2. Choose an equidistant grid of $m_l$ points $t_i$, $i = 1, \dots, m_l$, between $0$ and $R$, and a grid of bandwidths $b_j$, $j = 1, \dots, l$, between some $b_1$ and $b_2$. Compute $\hat v(t_i)$ and $\hat\beta(t_i)$ with all bandwidths $b_j$ and obtain the minimizer $b(t_i)$ of $\mathrm{MSE}(t_i, \cdot)$.
3. Obtain the final bandwidth using the boundary-modified smoother with bandwidth $b_0$ or $\tfrac32 b_0$:
\[
\hat b(t) = \sum_i K_t\Big(\frac{t - t_i}{b_0}\Big)\,b(t_i) \Big/ \sum_i K_t\Big(\frac{t - t_i}{b_0}\Big).
\]
4. Obtain the final estimate $\hat\alpha(t)$ with $b(t) = \hat b(t)$.
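A rough sketch of steps 2 and 3 above (not the authors' implementation: `v_hat` and `beta_hat` are assumed callables evaluating $\hat v(t, b)$ and $\hat\beta(t, b)$ as defined earlier, and the interior kernel `epanechnikov` stands in for the full boundary-corrected $K_t$ for simplicity):

```python
def select_local_bandwidth(ti, b_grid, v_hat, beta_hat):
    """Step 2: choose b(t_i) minimizing MSE(t_i, b) = v_hat(t_i, b) + beta_hat(t_i, b)^2."""
    mse = np.array([v_hat(ti, b) + beta_hat(ti, b) ** 2 for b in b_grid])
    return b_grid[int(np.argmin(mse))]

def smooth_local_bandwidths(t, grid_pts, b_local, b0):
    """Step 3: b_hat(t) = sum_i K((t - t_i)/b0) b(t_i) / sum_i K((t - t_i)/b0).
    Assumes at least one grid point lies within b0 of t, so the weights do not all vanish."""
    w = epanechnikov((t - np.asarray(grid_pts)) / b0)
    return float((w * np.asarray(b_local)).sum() / w.sum())
```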
4 Large Sample Properties

Now we discuss the large sample properties of the kernel function estimator with global bandwidth.

Theorem IV.2.1 (pointwise consistency). Assume
1. $t$ is an interior point of the support,
2. $\alpha$ is continuous at $t$,
3. the bandwidth $b_n \to 0$ as $n \to \infty$,
4. there exists an $\epsilon > 0$ such that $\inf_{s \in [t - \epsilon, t + \epsilon]} b_n Y(s) \xrightarrow{p} \infty$ as $n \to \infty$.
Then $\hat\alpha(t) \xrightarrow{p} \alpha(t)$ as $n \to \infty$.

Proof. Since
\[
|\hat\alpha(t) - \alpha(t)| \le |\hat\alpha(t) - \alpha^*(t)| + |\alpha^*(t) - \tilde\alpha(t)| + |\tilde\alpha(t) - \alpha(t)|,
\]
we have to show
\[
|\hat\alpha(t) - \alpha^*(t)| \xrightarrow{p} 0, \qquad |\alpha^*(t) - \tilde\alpha(t)| \xrightarrow{p} 0, \qquad |\tilde\alpha(t) - \alpha(t)| \xrightarrow{p} 0.
\]
By Lenglart's inequality,
\[
\Pr(|\hat\alpha(t) - \alpha^*(t)| > \eta)
= \Pr\Big(\Big|\frac{1}{b_n}\int_{t - b_n}^{t + b_n} K\Big(\frac{t-u}{b_n}\Big)\frac{J(u)}{Y(u)}\,dM(u)\Big| > \eta\Big)
\le \frac{\delta}{\eta^2} + \Pr\Big(\frac{1}{b_n}\int_{-1}^1 K^2(u)\,\frac{J(t - b_n u)}{Y(t - b_n u)}\,\alpha(t - b_n u)\,du > \delta\Big).
\]
Since $\alpha$ and $K$ are bounded, and $b_n Y(t) \xrightarrow{p} \infty$ in a neighborhood of $t$, the last term on the right-hand side can be made arbitrarily small, so $\hat\alpha(t) - \alpha^*(t) \xrightarrow{p} 0$. Also,
\[
|\alpha^*(t) - \tilde\alpha(t)| \le \int_{-1}^1 K(u)\{1 - J(t - b_n u)\}\alpha(t - b_n u)\,du \xrightarrow{p} 0,
\qquad
|\tilde\alpha(t) - \alpha(t)| \le \int_{-1}^1 K(u)\,|\alpha(t - b_n u) - \alpha(t)|\,du \xrightarrow{p} 0.
\]

Theorem IV.2.2 (uniform consistency). Assume
1. $t$ is an interior point of the support,
2. $0 < t_1 < t_2 < t$ are fixed numbers,
3. $\alpha$ is continuous on $[0, t]$,
4. the kernel $K$ is of bounded variation,
5. the bandwidth $b_n \to 0$ as $n \to \infty$,
6. $b_n^{-2}\int_0^t \frac{J(s)\,\alpha(s)}{Y(s)}\,ds \xrightarrow{p} 0$,
7. $\int_0^t (1 - J(s))\,\alpha(s)\,ds \xrightarrow{p} 0$.
Then, as $n \to \infty$,
\[
\sup_{s \in [t_1, t_2]} |\hat\alpha(s) - \alpha(s)| \xrightarrow{p} 0.
\]

Proof. We have
\[
|\hat\alpha(t) - \alpha^*(t)| = \Big|\frac{1}{b_n}\int K\Big(\frac{t-s}{b_n}\Big)\,d(\hat A - A^*)(s)\Big|
\le \frac{2}{b_n}\, V(K) \sup_{s \in [0, t]} |\hat A(s) - A^*(s)|,
\]
where $V(K)$ is the total variation of $K$. Similarly to the proof of the consistency of the Nelson-Aalen estimator, with assumption (6) the last term converges to zero.
Also,
\[
\sup_{t \in [t_1, t_2]} |\alpha^*(t) - \tilde\alpha(t)| \xrightarrow{p} 0
\]
by assumption (7), and
\[
\sup_{t \in [t_1, t_2]} |\tilde\alpha(t) - \alpha(t)| \xrightarrow{p} 0
\]
follows from the boundedness of $K$ and the continuity of $\alpha$.

Remark. Sufficient conditions for consistency:
uniform consistency of the Nelson-Aalen estimator: $\inf_{s \in [0, t]} Y(s) \xrightarrow{p} \infty$;
pointwise consistency of the kernel function estimator: $\inf_{s \in [0, t]} b\,Y(s) \xrightarrow{p} \infty$;
uniform consistency of the kernel function estimator: $\inf_{s \in [0, t]} b^2\,Y(s) \xrightarrow{p} \infty$.

Theorem IV.2.4 (asymptotic normality). Assume
1. $t$ is an interior point of the support,
2. $\alpha$ is continuous at $t$,
3. there exist positive constants $\{a_n\}$ increasing to $\infty$ as $n \to \infty$,
4. $b_n \to 0$ and $a_n^2 b_n \to \infty$ as $n \to \infty$,
5. there exists a function $y$, positive and continuous at $t$, such that for some $\epsilon > 0$,
\[
\sup_{s \in [t - \epsilon, t + \epsilon]} |a_n^{-2} Y(s) - y(s)| \xrightarrow{p} 0.
\]
Then, as $n \to \infty$,
1. $a_n b_n^{1/2}(\hat\alpha(t) - \tilde\alpha(t)) \xrightarrow{d} N(0, \tau^2(t))$, with $\tau^2(t) = \frac{\alpha(t)}{y(t)}\int_{-1}^1 K^2(u)\,du$;
2. $a_n^2 b_n\,\hat\tau^2(t) \xrightarrow{p} \tau^2(t)$;
3. for $t_1 \ne t_2$, $\hat\alpha(t_1)$ and $\hat\alpha(t_2)$ are asymptotically independent.

Proof. A. We have
\[
a_n b_n^{1/2}(\hat\alpha(t) - \alpha^*(t)) = \int H(s)\,dM(s), \qquad
H(s) = a_n b_n^{1/2}\, K\Big(\frac{t-s}{b_n}\Big)\frac{J(s)}{b_n Y(s)}.
\]
Using Rebolledo's martingale CLT,
\[
\int H^2(s)\,Y(s)\,\alpha(s)\,ds
= \frac{a_n^2}{b_n}\int K^2\Big(\frac{t-s}{b_n}\Big)\frac{J(s)\,\alpha(s)}{Y(s)}\,ds
= \int_{-1}^1 a_n^2 K^2(u)\,\frac{J(t - b_n u)\,\alpha(t - b_n u)}{Y(t - b_n u)}\,du
\xrightarrow{p} \frac{\alpha(t)}{y(t)}\int_{-1}^1 K^2(u)\,du \quad \text{as } n \to \infty.
\]
For any $\varepsilon > 0$,
\[
I(|H(s)| > \varepsilon) = I\Big(K\Big(\frac{t-s}{b_n}\Big)\frac{J(s)\,a_n^2}{Y(s)} > \varepsilon\, a_n b_n^{1/2}\Big)
\]
converges to zero uniformly because $a_n b_n^{1/2} = (a_n^2 b_n)^{1/2} \to \infty$, therefore
\[
\int H^2(s)\,Y(s)\,\alpha(s)\,I(|H(s)| > \varepsilon)\,ds \xrightarrow{p} 0.
\]
Also,
\[
a_n b_n^{1/2}(\alpha^*(t) - \tilde\alpha(t)) \xrightarrow{p} 0.
\]

B. Since
\[
a_n b_n^{1/2}(\hat\alpha(t_i) - \alpha^*(t_i)) = \int H_i(s)\,dM(s), \qquad
H_i(s) = a_n b_n^{1/2}\, K\Big(\frac{t_i - s}{b_n}\Big)\frac{J(s)}{b_n Y(s)},
\]
we have
\[
\int H_1(s) H_2(s)\,Y(s)\,\alpha(s)\,ds
= \frac{a_n^2}{b_n}\int K\Big(\frac{t_1 - s}{b_n}\Big) K\Big(\frac{t_2 - s}{b_n}\Big)\frac{J(s)\,\alpha(s)}{Y(s)}\,ds \to 0,
\]
since for $t_1 \ne t_2$ the two kernels have disjoint supports once $b_n$ is small enough; this gives the asymptotic independence.

Theorem IV.2.5. Assume all conditions of Theorem IV.2.4, and
1. $\alpha$ is twice continuously differentiable in a neighbourhood of $t$,
2. $\int_{-1}^1 K(u)\,du = 1$, $\int_{-1}^1 u K(u)\,du = 0$, $\int_{-1}^1 u^2 K(u)\,du = k_2 > 0$,
3. $\limsup_n a_n^{2/5} b_n < \infty$.
Then
\[
a_n b_n^{1/2}\big(\hat\alpha(t) - \alpha(t) - \tfrac12 b_n^2 \alpha''(t) k_2\big) \xrightarrow{d} N(0, \tau^2(t)).
\]

Proof. With a Taylor expansion ($t^*$ lies between $t - b_n u$ and $t$),
\[
a_n b_n^{1/2}\big(\tilde\alpha(t) - \alpha(t) - \tfrac12 b_n^2 \alpha''(t) k_2\big)
= a_n b_n^{1/2}\Big[\int_{-1}^1 K(u)\,\alpha(t - b_n u)\,du - \alpha(t) - \tfrac12 b_n^2 \alpha''(t) k_2\Big]
= \tfrac12 a_n b_n^{5/2}\Big[\int_{-1}^1 u^2 K(u)\,\alpha''(t^*)\,du - \alpha''(t) k_2\Big] \to 0,
\]
since $a_n b_n^{5/2} = (a_n^{2/5} b_n)^{5/2}$ is bounded and the bracket tends to zero by the continuity of $\alpha''$.
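To close, the asymptotic normality of Theorems IV.2.4 and IV.2.5 can be checked by simulation. The sketch below (illustrative constants, not from the references; it reuses `nelson_aalen`, `smoothed_hazard`, and `smoothed_hazard_variance` from the earlier code blocks) uses a constant hazard $\alpha(t) = 1$, so $\alpha''(t) = 0$ and the bias term vanishes; the standardized estimate at a fixed point should then be approximately standard normal.

```python
# Monte Carlo sketch: exponential survival (hazard 1), uniform censoring on [0, 3]
rng = np.random.default_rng(1)
n, t0, reps = 500, 1.0, 1000
b = n ** (-0.2)                                   # bandwidth of order n^{-1/5} (a_n = sqrt(n))
z = []
for _ in range(reps):
    T = rng.exponential(1.0, n)
    C = rng.uniform(0.0, 3.0, n)
    X, D = np.minimum(T, C), (T <= C).astype(int)
    ev, inc = nelson_aalen(X, D)
    ah = smoothed_hazard(np.array([t0]), ev, inc, b)[0]
    se = np.sqrt(smoothed_hazard_variance(np.array([t0]), ev, inc, b)[0])
    z.append((ah - 1.0) / se)
z = np.array(z)
print(z.mean(), z.std())                          # should be roughly 0 and 1, respectively
```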