ESTIMATION IN A SEMI-PARAMETRIC TWO-STAGE RENEWAL REGRESSION MODEL

Statistica Sinica 19(29): Supplement S1-S1 ESTIMATION IN A SEMI-PARAMETRIC TWO-STAGE RENEWAL REGRESSION MODEL Dorota M. Dabrowska University of California, Los Angeles Supplementary Material This supplement has two sections. In Section S1, we give a derivation of the information bound for estimation of the Euclidean component of the model. In Section S2, we provide the proof of Lemma A.1 as well as some details of the Hoeffding expansion used in the proof of Propositions 3.1 and 3.2. S1.Information bound In the following the Euclidean parameter of interest is denoted by ξ = (β,θ), and we let η = (A,A c,a 1c,A 2c,µ) be the nuisance parameter. We let ξ and η = (A (),A () c,a() 1c,A() 2c,µ() ) be the true parameter values. Without loss of generality, we assume that the cumulative hazards of the censoring distributions are absolutely continuous with respect to (A () ic,i =,1,2) and denote the corresponding derivatives by (α ic,i =,1,2). Likewise, we assume that the distribution µ of the covariates has density m with respect to the true distribution µ (). The condition 2.1 (ii) implies that for h H c we have E [Y h(u) Z = z] = S (u z)g (u z), where G (x z) is the conditional survival function of C 1 given Z = z. Its cumulative hazard function is A () c and we have Q h (x 1,z) = E N h (x 1 )1(Z z) = x1 E [Y h (u) Z = z]a () h (du z)µ() (dz) for h H c. The condition 2 (iii) implies also that for h Hc i,i = 1,2 we have E [Y h (u) Z = z,x 1 = x 1,J = i,δ 1 = 1] = S i (u x 1,z)G i (u x 1,z), where G i (u x 1,z) = P(C 2 X 1 > u Z = z,j = i,x 1 = x,x 1 C 1 ) is the survival function corresponding to A () ic. It follows that

S2 DOROTA M. DABROWSKA Q h (x 2,x 1,z) = E N h (x 2 )N i (x 1 )1(Z z) = x2 x1 for h H c i. E Y h (u) X 1 = u 1,J = i,δ 1 = 1,Z = z)a () h (du u 1,z)Q i (d(u 1,z)) We shall need the following moment identities. Lemma S1.1 Let ϕ(w) = {ϕ h (X 1,Z h ),ϕ h (X 2,Z h ) : h H c,h Hi c,i = 1,2} be a vector of measurable functions. Define I h (ϕ) = ϕ h (u,z h )M h (du) if h H c. Then I h (ϕ) L q (P),q = 1,2 if and only if E ϕ h (u,z h ) q N h (du) = E Y h (u) ϕ h (u,z h ) q A () h (du,z h) < for all h H c. In addition, E I h (ϕ) =, and E I h (ϕ) 2 = E Y h (u)ϕ h (u,z h ) 2 A () h (du Z h) if h H, = E Y h (u)ϕ h (u,z h ) 2 [1 A () h ( u Z h)]a () h (du Z h) if h = (i,c),i =,1,2. For any h,h H such that h h, we also have E I h (ϕ)i h (ϕ) =. This lemma follows easily by noting that under conditions 2.1 the processes x ϕ h (u,z)m h (du),x,h H c form orthogonal martingales with respect to the filtration F x = σ(n h (u),z,y h (u+) : u x,h H c ), and similarly, the processes x ϕ h (u,z,x 1 )M h (du),x,h Hi c,i = 1,2 form orthogonal martingales with respect to the filtration F ix = σ(n h (u),y h (u+),z1(δ 1 = 1,J 1 = i),x 1 1(δ 1 = 1,J 1 = i) : u x,h Hi c ). Using direct calculation, it is also easy to verify that the processes are orthogonal. As in Nan, Edmond and Wellner (24), in the following we assume that the censoring distributions are continuous. The condition 2.1 implies that each

SEMI-PARAMETRIC RENEWAL MODEL S3 subect contributes to the complete data log-likelihood the sum L(ξ,α) + L c (α ic,i =,1,2) + L Z (m), where the three terms are given by L(ξ,α) = h H + h H h H [log q h (Z h,u,θ) + r(z,u,β)]n h (du) + log α(u)n h (du) Y h (u)q h (Z h,u,θ)e r(z,u,β) α(u)du, L c (α ic,i =,1,2) = i= i= L Z (m) = m(z). log(α ic (du Z ic ))N ic (du) Y ic (u)α ic (u Z ic )A () ic (du Z ic), The score function for the parameter of interest ξ and the nuisance parameter η are l 1[ξ](W) = l 2 [a](w) = l 3[b c ](W) = l 4[b 1c ](W) = l 5 [b 2c](W) = h H h H ϕ h (u,z h,ξ )M h (du), a(u)m h (du), a L 2 ( h H(Q h ), b c (u,z)m ic (du), b c L 2 (Q c ), b 1c (u,z 1c )M 1c (du), b 1c L 2 (Q 1c ), b 2c (u,z 2c )M 2c (du), b 2c L 2 (Q 2c ), l 6[c](W) = c(z), c L 2(µ () ). Here l 1 [ξ] is the derivative of the log-likelihood with respect to the Euclidean parameter ξ = (θ,β), From Section 2, ϕ h (Z h,u,ξ) = [ϕ 1h (Z h,u,ξ),ϕ 2h (Z h,u,ξ)] T where ϕ 1h (Z h,u,ξ) = ṙ(z,u,β),

S4 DOROTA M. DABROWSKA ϕ 2h (Z h,u,ξ) = For h H c, Z h = Z, and we have q h q h (Z h,u,θ), h H. Further, ϕ 2h (Z h,u,ξ) = q h q h (Z,u,θ) for h = (,i),i = 1,2, = q 1 + q 2 1 q 1 q 2 (Z,u,θ) for h = (,3). a(u) = ε log α ε(u) ε=, b ic (u,z ic ) = c(z) = log α ic,ηi (du Z ic ) η ηi =,i =,1,2, i κ log m κ(z) κ= for regular parametric submodels {α ε }, {α ic,ηi,i =,1,2} and {m κ } passing through the parameters α, α ic and m when ε = κ = η i =,i =,1,2. The tangent spaces are Ṗi = [ l i ],i = 1,...,6 and [α] is the closed linear span generated by the score α. Using Lemma S1.1, it is easy to see that the spaces Ṗ i,i = 2,...,6 are mutually orthogonal, and Ṗ1 is orthogonal to Ṗi,i = 3,...,6. The following proposition generalizes Propositions 3.1 and 3.2 in Nan, Edmond and Wellner (24). Proposition S1.2 (i) Define Ṗ = { h H c Then Ṗ = L 2 (P). g h (u,z h )M h (du)g (Z) : g h L 2 (Q h ),h H c,g L 2 (m)}. (ii) Let c(w) = {c h (X 1,Z h ),c h (X 2,Z h ) : h H,h H i,i = 1,2} be a vector of functions such that c h L 2 (Q h ) for each h H. Define B(c) = [c h e c ](u)m h (du) h H by setting e c (u) = s (1) c (u)/s (u), where s (1) c (u) = h H E Y h (u)c h (u,z h )q h (u,z h,θ )e r(z,u,β ),

SEMI-PARAMETRIC RENEWAL MODEL S5 s (u) = h E Y h (u)q h (u,z h,θ )e r(z,u,β ). Then B(c) Ṗη L 2 (P) and Ṗ η = ( 6 Ṗ) = {B(c) : c h L 2 (Q h ),h H}. When specialized to the renewal model considered in the paper, this Proposition implies that the efficient score function of estimation of the parameter ξ = (θ,β) corresponds to the choice of c(w) = {ϕ h (Z h,u,ξ) : h H}, which is quite similar to the standard Cox regression. Note, however, that the model assumes that the functions r(z,u,β) and q h (Z h,u,θ),h H have support not depending on the unknown parameters. In the absence of covariates, this assumption fails for instance in the parametric Marshall- Olkin model (1967), where n rate of convergence of the asymptotically efficient estimators applies only to the interior of the parameter set. Proof. Recall that Z h = Z, for h H c and Z h = (X 1,Z) for h Hi c,i = 1,2. Any function f(w) L 2 (P) can be represented as a sum For = 1,2 define f(w) = δ 1 δ 2 1(J 1 = )f 3 (X 2,X 1,Z) + δ 1 (1 δ 2 ) 1(J 1 = )f c (X 2,X 1,Z) + δ 1 1(J 1 = 3)f 3 (X 1,Z) + (1 δ 1 )f c (X 1,Z). f (X 1,Z) = E [δ 2 f 3 (X 2,X 1,Z) + (1 δ 2 )f c (X 2,X 1,Z) X 1,Z,δ 1 = 1,J = ]. Then f(w) = I 1 (W) + I 2 (W), 3 I 1 (W) = δ 1 1(J = )f (X 1,Z) + (1 δ 1 )f c (X 1,Z), I 2 (W) = δ 1 δ 2 1(J = )f 3 (X 1,X 2,Z) + (1 δ 2 )δ 1 1(J = )f c (X 1,X 2,Z) δ 1 1(J = )f (X 1,Z).

S6 DOROTA M. DABROWSKA Set 3 R 1 [f](x 1,z) = E [δ 1 f (X 1,Z) X 1 > x 1,Z = z] + E [(1 δ 1 )f c (X 1,Z) X 1 > x 1,Z = z] and R 2 [f](x 2,x 1,z) = E [δ 2 f 3 (X 2,X 1,Z) X 2 > x 2,X 1 = x 1,Z = z,j 1 =,δ 1 = 1] + E [(1 δ 2 )f 1 (X 2,X 1,Z) X 2 > x 2,X 1 = x 1,Z = z,j 1 =,δ 1 = 1] for = 1,2. Finally, let g (Z) = E f(w) Z, g h (X 1,Z) = f (X 1,Z) R 1 [f](x 1,Z), h H c, g h (X 2,X 1,Z) = f h (X 2,X 1,Z) R 2 [f](x 2,X 1,Z), h H c, = 1,2. We have h H R 1 [f](x 1,Z) g (Z) = c x 1 f h (u,z)q h (du,z) 1 h H c Q g (Z) h(du,z) = x1 [f h (u,z) R 1 [f](u,z)]a () h (du Z). Therefore I 1 (W) = h H c 3 δ 1 1(J = )f (X 1,Z) + (1 δ 1 )f c (X 1,Z) R 1 [f](x 1,Z) ( ) + R 1 [f](x 1,Z) g (Z) + g (Z) = g h (u,z)m h (du) + g (Z) = g h (u,z h )M h (du) + g (Z) h H c h H c Similarly, for = 1,2, R 2 [f](x 2,X 1,Z) f (X 1,Z) = x2 [f h (u,x 1,Z) R 2 [f](u,x 1,Z)]A () h (du X 1,Z) h H c

SEMI-PARAMETRIC RENEWAL MODEL S7 and δ 1 δ 2 1(J = )f 1 (X 2,X 1,Z) + (1 δ 2 )δ 1 f c (X 1,X 2,Z) = = δ 1 δ 2 1(J = )f 1 (X 2,X 1,Z) + (1 δ 2 )δ 1 f c (X 1,X 2,Z) R 2 [f](x 2,X 1,Z) ( ) + R 2 [f](x 2,X 1,Z) f (X 1,Z) + f (X 1,Z) = g h (u,x 1,Z)M h (du) + f (X 1,Z)δ 1 1(J = ). h H c Summing over we obtain I 2 (W) = g h (u,x 1,Z)M h (du) = h H c h H c g h (u,z h )M h (du). It follows that f(w) Ṗ. To show part (ii) of the proposition, let b(w) = c h dm h. h H Then Π(b Ṗ2) = h a (u)dm h (u) for a square integrable function a such that E ( adm h )( [c h a ]dm h ) = h H h H for any a L 2 ( h H Q h). Orthogonality implies that the left side is equal to a s )A(du) (as (1) c so that a = s (1) c /s. The proof can be completed as in Nan et al. (24) and using that B(c) is orthogonal to 6 =3 Ṗ. S2. Proof of Lemma A.1 Before showing this lemma, we recall that a class of functions G defined on some measure space (Ω, A) is Euclidean for envelope G if g G for all g G, and there exist constants A and V such that N(ε G L2 (P), G, L2 (P)) (A/ε) V for all ε (,1) and all probability measures P such that G L2 (P) <. Here L2 (P) is the L 2 (P) norm and N(η, G, L2 (P)) is the minimal number of L 2 (P) balls of radius η covering the class G. In the case of classes G n changing with n, the Euclidean constants A and V are taken to be independent of n.

S8 DOROTA M. DABROWSKA It is easy to verify that the condition E G(W m ) p < implies E G p (W m )1(G(W m ) n m/p ), n m(2 p)/p E G 2 (W m )1(G(W m ) < n m/p ). Let g 1n (W m ) = g(w m )1(G(W m ) < n m/p ), g 2n (W m ) = g(w m )1(G(W m ) n m/p ). Since g is a canonical kernel, we have U n,m (g) = U n,m (π m [g 1n ]) + U n,m (π m [g 2n ]). Define G 1n (W m ) = A G 2n (W m ) = A E A G(W m )1(G(W m ) < n m/p ), E A G(W m )1(G(W m ) n m/p ). Then G 1n (W m ) and G 2n (W m ) form envelopes for the classes of truncated canonical kernels G 1n = {π m [g 1n ] : g G} and G 2n = {π m [g 2n ] : g G}, respectively. We have E n m(p 1)/p) sup U n,m (π m [g 2n ]) n m(p 1)/p E G 2n (W m ) g 2n G 2n Cn m(p 1)/p E G(W m )1(G(W m ) n m/p ) CE G p (W m )1(G(W m ) n m/p ) and the right-hand side tends to. We also have E n m(p 1)/p sup U n,m (π m [g 1n ]) = E n m/2 sup U n,m (π m [g 1n /n m(2 p)/2p ]). g 1n G 1n g 1n G 1n The class G 1n is Euclidean for the envelope G 1n (W m ). By the randomization Theorem 3.5.3 and Corollary 5.1.8 of de la Peña and Giné (1999) or using similar developments as on page 254-255 of their text, the right-hand side of the above display is of order KE U n,m (G 2 1n (W m)/n m(2 p)/p ) K[E [G 2 1n (W m)/n m(2 p)/p ] 1/2 = KC[n m(2 p)/p E [G 2 (W m )1(G(W m )] 1/2.

SEMI-PARAMETRIC RENEWAL MODEL S9 Here K is a constant depending only on m and the VC-characteristics of the class G 1n, but not n, while C is bounded by 2 m. This completes the proof. Finally, we give some more details of the Hoeffding decomposition used in the proofs of Lemmas A.2 and A.3. In calculations given below, we use A = A, the true baseline cumulative hazard function. In Lemma A.2, we consider first the statistic U n,2 (g (2) ). For i, we have E g (2) (W i,w ) =, E {1} g (2) (W i,w ) = E [g (2) (W i,w ) W i ] = h τ [ ] ϕ ih s () s (1) (u,ξ )N hi (du), s () E {2} g (2) (W i,w ) = E [g (2) (W i,w ) W ] = = [ τ (s (1) S () ) S (1) ] s () h s () (u,ξ )A(du) Thus U n,2 (g (2) ) = U n,1 (E {1} g (2) ) + U n,1 (E {2} g (2) ) + U 2,n (π 2 [g (2) ]) = 1 τ ϕ ih s () s (1) n s () (u,ξ )M ih (du) + U 2,n (π 2 [g (2) ]). i h Next, we consider the statistics U n,3 (g (3) ). For triplets (i,,k) of distinct indices, we have E g (3) (W i,w,w k ) =. In addition, E {13} g (3) (W i,w,w k ) = E g (3) (W i,w,w k ) W i,w k = [ ] τ ϕ ih s () s (1) S () k s () h s () s () (u,ξ )N ih (du) E {23} g (3) (W i,w,w k ) = E g (3) (W i,w,w k ) W,W k [ τ s (1) S () S (1) ] s () = s () ( S() k s () s () ) (u,ξ )A(du) and E A g (3) (W i,w,w k ) = for all other proper subsets of the index set {1,2,3}. Hence U n,3 (g (3) ) = U n,2 (E {13} g (3) ) + U n,2 (E {23} g (3) ) + U n,3 (π 3 [g (3) ]). Finally, for any quadruplet (i,,k,l) of distinct indices, we have E g (4) (W i,w,w k,w l ) =, and E 134 g (4) (W i,w,w k,w l ) = E [g (4) (W i,w,w k,w l ) W i,w k,w l ]

S1 DOROTA M. DABROWSKA = h τ [ ϕ h s () + s (1) ] s () k s () ( S() s () E 234 g (4) (W i,w,w k,w l ) = E [g (4) (W i,w,w k,w l ) W,W k,w l ] = τ [s (1) S () + s () S (1) ] s () k s () ( S() s () E 34 g (4) (W i,w,w k,w l ) = E [g (4) (W i,w,w k,w l ) W k,w l ] = 2 τ [s (1) ( S() k s () s () )( S() s () l s () )N hi (du), )( S() s () l s () )A(du) )( S l s s () )A(du), and E A g (4) = for all other proper subsets of the index set {1,2,3,4}. Hence U n,4 (g (4) ) = U n,2 (E {34} g (4) ) + A={134},{234} U n,3 (π 3 [E A g (4) ]) + U n,4 (π 4 [g (4) ]). In Lemma A.3, the Hoeffding decomposition of the statistics U n,p (f (p) ξ ),p = 2,3,4 is quite analogous. On the other hand the remainder term is given by rem(ξ) = 3 p=1 rem p(ξ), where rem 1 (ξ) is given by (4.3), and rem 2 (ξ) = 1 n τ [ h ϕ h(u,ξ)(s () h (u,ξ) S() h (u,ξ )) n i=1 S () (u,ξ ) ] (ξ ξ ) T S(2) (u,ξ ) S () N i (du), (u,ξ ) rem 3 (ξ) = 1 n τ [ ( ) S (1) S () (u,ξ) n i=1 S ()(u,ξ) S () (u,ξ ) 1 ( ) 2 (ξ ξ ) T S (1) (u,ξ )] N i (du). S () These two terms are easily verified to be order o p ( ξ ξ ).