Estimation and Inference of Quantile Regression for Survival Data under Biased Sampling

Supplementary Materials: Proofs of the Main Results

S1 Verification of the weight function $v_i(t)$ for the length-biased sampling scheme

Here, we provide the justification of (11), which is introduced in Section 3. Observe that
\[
\begin{aligned}
f_{\widetilde T,\Delta}(t,\delta=1\mid A,Z)
&= P\{\widetilde T\in(t,t+dt),\,\Delta=1\mid A,Z\}/dt\\
&= I(t\ge A)\,f_{A,V}(A,\,t-A\mid Z)\,S_c(t-A\mid Z)\\
&= I(t\ge A)\,f(t\mid Z)\,S_c(t-A\mid Z)\,\frac{1}{\mu(Z)}\\
&= I(t\ge A)\,P\{T\in(t,t+dt),\,C> t-A\mid Z\}/dt\cdot\frac{1}{\mu(Z)}\\
&= I(t\ge A)\,f_{T,\Delta}(t,\delta=1\mid Z)\,\frac{1}{\mu(Z)},
\end{aligned}
\]
and
\[
\begin{aligned}
f_{\widetilde T,\Delta}(t,\delta=0\mid A,Z)
&= P\{\widetilde T\in(t,t+dt),\,\Delta=0\mid A,Z\}/dt\\
&= I(t\ge A)\left\{\int_t^\infty f_{A,V}(A,\,s-A\mid Z)\,ds\right\} g_c(t-A\mid Z)\\
&= I(t\ge A)\left\{\int_t^\infty f(s\mid Z)\,ds\right\} g_c(t-A\mid Z)\,\frac{1}{\mu(Z)}\\
&= I(t\ge A)\,P\{T> t,\,C\in(t-A,\,t-A+dt)\mid Z\}/dt\cdot\frac{1}{\mu(Z)}\\
&= I(t\ge A)\,f_{T,\Delta}(t,\delta=0\mid Z)\,\frac{1}{\mu(Z)},
\end{aligned}
\]
where $g_c$ denotes the density function of the censoring time.
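The two displays say that, given $(A, Z)$, the observed-data density is the reference density $f_{T,\Delta}$ tilted by the factor $I(t\ge A)/\mu(Z)$, which is the same for $\delta = 0$ and $\delta = 1$. Written in the ratio form of (11), the tilting factors cancel:

```latex
v_i(t)
  = \frac{f_{T,\Delta}(\widetilde T_i,\Delta_i \mid Z_i)}
         {f_{\widetilde T,\Delta}(\widetilde T_i,\Delta_i \mid Z_i)}
    \cdot
    \frac{f_{\widetilde T,\Delta}(t,1 \mid Z_i)}
         {f_{T,\Delta}(t,1 \mid Z_i)}
  = \frac{\mu(Z_i)}{I(\widetilde T_i \ge A_i)}
    \cdot \frac{I(t \ge A_i)}{\mu(Z_i)}
  = I(A_i \le t),
```

since $I(\widetilde T_i \ge A_i) = 1$ by the sampling design: a length-biased subject is observed only after the truncation time, so $\widetilde T_i \ge A_i$ always holds.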
It follows that
\[
v_i(t)=\frac{f_{T,\Delta}(\widetilde T_i,\Delta_i\mid Z_i)}{f_{\widetilde T,\Delta}(\widetilde T_i,\Delta_i\mid Z_i)}\cdot\frac{f_{\widetilde T,\Delta}(t,1\mid Z_i)}{f_{T,\Delta}(t,1\mid Z_i)}=I(A_i\le t).
\]

S2 Proof of Theorems 1 and 2

For the biased sampling schemes introduced in Sections 2.2 and 3, let $f_{\widetilde T}(t\mid Z)$ be the conditional density function of $\widetilde T$, i.e., $f_{\widetilde T}(t\mid Z)=\sum_{\delta\in\{0,1\}} f_{\widetilde T,\Delta}(t,\delta\mid Z)$, where $f_{\widetilde T,\Delta}(t,\delta\mid Z)$ is defined in (8) of Section 2.2. For a vector $a$, let $a^{\otimes 2}$ denote $aa'$, and $\|a\|$ denote the Euclidean norm of $a$. For $b\in\mathbb R^p$, define
\[
\begin{aligned}
m(b) &= E\{ZN(e^{Z'b})\}, \qquad
\widetilde m(b) = E\{Zv(e^{Z'b})Y(e^{Z'b})\},\\
m_n(b) &= \frac1n\sum_{i=1}^n Z_iN_i(e^{Z_i'b}), \qquad
\widetilde m_n(b) = \frac1n\sum_{i=1}^n Z_iv_i(e^{Z_i'b})Y_i(e^{Z_i'b}),\\
B(b) &= E\{Z^{\otimes 2} f_{\widetilde T,\Delta}(e^{Z'b},1\mid Z)\exp(Z'b)\},\\
J(b) &= E\{Z^{\otimes 2} v(e^{Z'b})f_{\widetilde T}(e^{Z'b}\mid Z)\exp(Z'b)\}.
\end{aligned}
\]
For $d>0$, define the set $\mathcal B(d)$ as
\[
\mathcal B(d)=\Big\{b\in\mathbb R^p:\ \inf_{\tau\in(0,\tau_u]}\|m(b)-m(\beta_0(\tau))\|\le d\Big\},
\]
where $\beta_0(\tau)$ is the true parameter value, $\tau\in(0,\tau_u]$, and $\tau_u$ satisfies Condition C4 below.
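As a concrete numerical illustration of the sample quantities above, the following minimal sketch (variable names are ours, not the paper's) computes $m_n(b)$ and $\widetilde m_n(b)$ for the length-biased scheme, where $N_i(t)=I(\widetilde T_i\le t,\Delta_i=1)$, $Y_i(t)=I(\widetilde T_i\ge t)$, and $v_i(t)=I(A_i\le t)$:

```python
import numpy as np

def m_n(b, Z, T_obs, delta):
    """Sample moment m_n(b) = n^{-1} sum_i Z_i N_i(exp(Z_i'b)),
    where N_i(t) = I(T_i <= t, delta_i = 1)."""
    t = np.exp(Z @ b)                        # one threshold exp(Z_i'b) per subject
    N = ((T_obs <= t) & (delta == 1)).astype(float)
    return (Z * N[:, None]).mean(axis=0)

def m_tilde_n(b, Z, T_obs, A):
    """Sample moment m~_n(b) = n^{-1} sum_i Z_i v_i(exp(Z_i'b)) Y_i(exp(Z_i'b)),
    with Y_i(t) = I(T_i >= t) and, for length-biased sampling, v_i(t) = I(A_i <= t)."""
    t = np.exp(Z @ b)
    v = (A <= t).astype(float)               # length-biased weight from Section S1
    Y = (T_obs >= t).astype(float)           # at-risk indicator
    return (Z * (v * Y)[:, None]).mean(axis=0)
```

For example, with $b = 0$ all thresholds equal $1$, so only subjects with $\widetilde T_i \le 1$ and $\Delta_i = 1$ contribute to $m_n$, and only subjects with $A_i \le 1 \le \widetilde T_i$ contribute to $\widetilde m_n$.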
Furthermore, consider the setting in Section 3.2. For $b\in\mathbb R^p$, define
\[
\begin{aligned}
m^*(b) &= E\{Z\psi(\widetilde T)N(e^{Z'b})\}, \qquad
\widetilde m^*(b) = E\{Z\psi(e^{Z'b})v(e^{Z'b})Y(e^{Z'b})\},\\
m_n^*(b) &= \frac1n\sum_{i=1}^n Z_i\psi(\widetilde T_i)N_i(e^{Z_i'b}), \qquad
\widetilde m_n^*(b) = \frac1n\sum_{i=1}^n Z_i\psi(e^{Z_i'b})v_i(e^{Z_i'b})Y_i(e^{Z_i'b}),\\
B^*(b) &= E\{[Z\psi(e^{Z'b})]Z' f_{\widetilde T,\Delta}(e^{Z'b},1\mid Z)\exp(Z'b)\},\\
J^*(b) &= E\{[Z\psi(e^{Z'b})]Z' v(e^{Z'b})f_{\widetilde T}(e^{Z'b}\mid Z)\exp(Z'b)\}.
\end{aligned}
\]
We assume the following conditions for the theoretical derivations.

C1: $Z$ is uniformly bounded, i.e., $\sup_i\|Z_i\|<\infty$, and the matrix $E(Z^{\otimes 2})$ is positive definite.

C2: $m\{\beta_0(\tau)\}$ is a Lipschitz continuous function of $\tau\in(0,\tau_u]$, and $f_{\widetilde T}(t\mid z)$ and $f_{\widetilde T,\Delta}(t,1\mid z)$ are bounded above and continuous uniformly in $t$ and $z$.

C3: There exists $d_0>0$ such that for $b\in\mathcal B(d_0)$ and any $Z$, $f_{\widetilde T,\Delta}(e^{Z'b},1\mid Z)>0$ and $\|J(b)B(b)^{-1}\|$ is uniformly bounded.

C4: $\inf_{\tau\in[\tau_l,\tau_u]}\operatorname{eigmin} B\{\beta_0(\tau)\}>0$ for any $\tau_l\in(0,\tau_u]$, where $\operatorname{eigmin}(\cdot)$ denotes the minimum eigenvalue of a matrix.

C5: The class of weight functions $\{(Z,A,\widetilde T,\Delta)\mapsto v(e^{Z'b},A,\widetilde T,\Delta,Z);\ b\in\mathbb R^p\}$ is Glivenko-Cantelli and uniformly bounded; and $\{(Z,A,\widetilde T,\Delta)\mapsto v(e^{Z'b},A,\widetilde T,\Delta,Z);\ b\in\mathcal B(d_0)\}$ belongs to a Donsker class.

C6: The weight function $\psi(t)$ is positive, bounded, and differentiable; $\{(Z,A,\widetilde T,\Delta)\mapsto\psi(e^{Z'b},A,\widetilde T,\Delta,Z);\ b\in\mathbb R^p\}$ is Glivenko-Cantelli and $\{(Z,A,\widetilde T,\Delta)\mapsto\psi(e^{Z'b},A,\widetilde T,\Delta,Z);\ b\in\mathcal B(d_0)\}$ belongs to a Donsker class. The matrix $W(\beta_0,\tau)$ is nonsingular, $\inf_{t\in[\tau_l,\tau]}\operatorname{eigmin}\big[B^*\{\beta_0(t)\}'W(\beta_0,\tau)^{-1}B^*\{\beta_0(t)\}\big]>0$ for any $\tau_l\in(0,\tau]$, and $\|J^*(b)[B^*(b)'W(\beta_0,\tau)^{-1}B^*(b)]^{-1}\|$ is uniformly bounded for $b\in\mathcal B(d_0)$.

Conditions C1-C4 are mild assumptions concerning the covariates $Z$, the underlying regression quantile parameter process $\beta_0(\tau)$, and the density functions associated with the observed data $(\widetilde T,\Delta)$. The boundedness assumption on the covariates in C1 is often made in survival models, although it can be further relaxed at extra technical cost. For Conditions C2 and C3, the boundedness and uniform continuity of $f_{\widetilde T}(t\mid z)$ and $f_{\widetilde T,\Delta}(t,1\mid z)$ are reasonable in many biased sampling problems. For instance, they are satisfied for the case-cohort study if the density functions of the survival and censoring times are bounded and uniformly continuous. Similarly, for the length-biased sampling introduced in Section 3, they are satisfied if the density functions of $T$ and $C$ are bounded and uniformly continuous. Under these conditions, it can be verified that $m(b)$ and $\widetilde m(b)$ are differentiable, and $B(b)=\partial m(b)/\partial b'$ and $J(b)=\partial\widetilde m(b)/\partial b'$ are well defined. As in Peng and Huang (2008), Condition C4 is assumed to ensure the identifiability of $\beta_0(\tau)$. The regularity of the weight function assumed in Condition C5 is satisfied for all the sampling schemes introduced in Section 3. In particular, for the case-cohort type designs (Examples 3 and 4), the weight function $v$ does not depend on $\beta$ and satisfies C5. For length-biased sampling with the weight function in the form (14), $\{v(e^{Z'b}),\ b\in\mathcal B(d_0)\}$ is a VC class, which implies C5 (Theorems 2.6.7 and 2.5.2 in van der Vaart and Wellner, 1996). Condition C6 is similar to Conditions C3-C5 and is satisfied for many weight functions. It is assumed to ensure the asymptotic normality of the efficient estimator.

Proof of Theorem 1

The proof follows closely that in Peng and Huang (2008), so we only present the key steps. We note, however, that their results cannot be applied directly because of the random weight functions in the estimating equation.
Let $\alpha_0(\tau)=m\{\beta_0(\tau)\}$, $\hat\alpha(\tau)=m\{\hat\beta(\tau)\}$, and $\mathcal A(d)=\{m(b):\ b\in\mathcal B(d)\}$. Following the argument in Peng and Huang (2008), $m$ is a one-to-one mapping from $\mathcal B(d_0)$ to $\mathcal A(d_0)$. There exists an inverse function $\kappa:\mathcal A(d_0)\to\mathcal B(d_0)$ such that $\kappa\{m(b)\}=b$ for any $b\in\mathcal B(d_0)$.

Uniform convergence. Let $\xi_{n,k}=m_n\{\hat\beta(\tau_k)\}-\int_0^{\tau_k}\widetilde m_n\{\hat\beta(u)\}\,dH(u)$. Because the estimating equation takes the general form of (10), we have $\sup_k\|\xi_{n,k}\|=O(1)\sup_i\|Z_i\|/n=O(n^{-1})$. From $m\{\beta_0(\tau_k)\}-\int_0^{\tau_k}\widetilde m\{\beta_0(s)\}\,dH(s)=0$, we have the following decomposition:
\[
\begin{aligned}
m\{\hat\beta(\tau_k)\}-m\{\beta_0(\tau_k)\}
&=-\big[m_n\{\hat\beta(\tau_k)\}-m\{\hat\beta(\tau_k)\}\big]
+\sum_{j=0}^{k-1}\int_{\tau_j}^{\tau_{j+1}}\big[\widetilde m\{\hat\beta(s)\}-\widetilde m\{\beta_0(s)\}\big]\,dH(s)\\
&\quad+\int_0^{\tau_k}\big[\widetilde m_n\{\hat\beta(s)\}-\widetilde m\{\hat\beta(s)\}\big]\,dH(s)+\xi_{n,k}.
\end{aligned}
\tag{S1}
\]
Under the condition that $Z$ is uniformly bounded, $\{Z_iN_i(e^{Z_i'b});\ b\in\mathbb R^p\}$ and $\{Z_iY_i(e^{Z_i'b});\ b\in\mathbb R^p\}$ are Glivenko-Cantelli. By Condition C5 and Theorem 3 in van der Vaart and Wellner (2000), $\{Z_iv_i(e^{Z_i'b})Y_i(e^{Z_i'b});\ b\in\mathbb R^p\}$ is also Glivenko-Cantelli. Then we have $\sup_{b\in\mathbb R^p}\|m(b)-m_n(b)\|\to0$ and $\sup_{b\in\mathbb R^p}\|\widetilde m(b)-\widetilde m_n(b)\|\to0$ almost surely. This implies that the first and third terms in (S1) are negligible. Denote
\[
c_{n,0}:=\sup_k\Big\|-\big[m_n\{\hat\beta(\tau_k)\}-m\{\hat\beta(\tau_k)\}\big]+\int_0^{\tau_k}\big[\widetilde m_n\{\hat\beta(s)\}-\widetilde m\{\hat\beta(s)\}\big]\,dH(s)\Big\|,
\]
and it follows that $c_{n,0}=o_p(1)$.

Under Condition C2, there exists $c_1$ such that $\|m\{\beta_0(\tau)\}-m\{\beta_0(\tau')\}\|\le c_1|\tau-\tau'|$ for any $\tau,\tau'\in(0,\tau_u]$. For $0=\tau_0\le\tau<\tau_1$, since $m\{\hat\beta(0)\}=0$, we have $\sup_{\tau_0\le\tau<\tau_1}\|m\{\hat\beta(\tau)\}-m\{\beta_0(\tau)\}\|=\sup_{\tau_0\le\tau<\tau_1}\|m\{\beta_0(\tau)\}\|\le b_{n,0}:=c_1\|S_{L(n)}\|$, where $\|S_{L(n)}\|$ denotes the maximum spacing of the grid $S_{L(n)}$. Then for large enough $n$, we know $b_{n,0}<d_0$ and $\hat\beta(\tau)\in\mathcal B(d_0)$ for $\tau\in[\tau_0,\tau_1)$. Thus, we can write $\sup_{\tau_0\le\tau<\tau_1}\|\widetilde m\{\hat\beta(\tau)\}-\widetilde m\{\beta_0(\tau)\}\|=\sup_{\tau_0\le\tau<\tau_1}\|\widetilde m[\kappa\{\hat\alpha(\tau)\}]-\widetilde m[\kappa\{\alpha_0(\tau)\}]\|$. Under Condition C3, there exists $\bar\alpha(\tau)$ between $\alpha_0(\tau)$ and $\hat\alpha(\tau)$ such that
\[
\sup_{\tau_0\le\tau<\tau_1}\|\widetilde m\{\hat\beta(\tau)\}-\widetilde m\{\beta_0(\tau)\}\|
=\sup_{\tau_0\le\tau<\tau_1}\big\|J[\kappa\{\bar\alpha(\tau)\}]B[\kappa\{\bar\alpha(\tau)\}]^{-1}\{\hat\alpha(\tau)-\alpha_0(\tau)\}\big\|\le c_2 b_{n,0},
\]
where $c_2$ is some constant. Next consider $\tau_1\le\tau<\tau_2$. We have
\[
\sup_{\tau_1\le\tau<\tau_2}\|m\{\hat\beta(\tau)\}-m\{\beta_0(\tau)\}\|\le\|m\{\hat\beta(\tau_1)\}-m\{\beta_0(\tau_1)\}\|+\sup_{\tau_1\le\tau<\tau_2}\|m\{\beta_0(\tau_1)\}-m\{\beta_0(\tau)\}\|.
\]
We know $\sup_{\tau_1\le\tau<\tau_2}\|m\{\beta_0(\tau_1)\}-m\{\beta_0(\tau)\}\|\le c_1\|S_{L(n)}\|$. Further, from the decomposition (S1), which gives an upper bound for $\|m\{\hat\beta(\tau_1)\}-m\{\beta_0(\tau_1)\}\|$, we have
\[
\sup_{\tau_1\le\tau<\tau_2}\|m\{\hat\beta(\tau)\}-m\{\beta_0(\tau)\}\|\le b_{n,1}:=c_{n,0}+c_3n^{-1}+c_2b_{n,0}(1-\tau_u)^{-1}\|S_{L(n)}\|+c_1\|S_{L(n)}\|,
\]
where $c_3$ is some large constant. Note that $b_{n,1}\to0$ and it is smaller than $d_0$ for large enough $n$. This implies $\hat\beta(\tau)\in\mathcal B(d_0)$ for $\tau\in[\tau_1,\tau_2)$. Then an induction argument similar to the proof of Theorem 1 in Peng and Huang (2008) gives
\[
\sup_{\tau_k\le\tau<\tau_{k+1}}\|m\{\hat\beta(\tau)\}-m\{\beta_0(\tau)\}\|\le b_{n,k}:=c_{n,0}+c_3n^{-1}+c_2\Big(\sum_{i=0}^{k-1}b_{n,i}\Big)(1-\tau_u)^{-1}\|S_{L(n)}\|+c_1\|S_{L(n)}\|.
\]
This implies that $\sup_{\tau_0\le\tau\le\tau_u}\|m\{\hat\beta(\tau)\}-m\{\beta_0(\tau)\}\|\to0$ in probability and, for $\tau_0\le\tau\le\tau_u$ and large enough $n$, $\hat\beta(\tau)\in\mathcal B(d_0)$. Thus $\sup_\tau\|\hat\alpha(\tau)-\alpha_0(\tau)\|=\sup_\tau\|m\{\hat\beta(\tau)\}-m\{\beta_0(\tau)\}\|\to0$ in probability. Taking a Taylor expansion of $\kappa\{\hat\alpha(\tau)\}$ around $\alpha_0(\tau)$, we have
\[
\sup_{\tau\in[\tau_l,\tau_u]}\|\hat\beta(\tau)-\beta_0(\tau)\|\le\sup_{\tau\in[\tau_l,\tau_u]}\|B\{\beta_0(\tau)\}^{-1}\{\hat\alpha(\tau)-\alpha_0(\tau)\}\|+\sup_{\tau\in[\tau_l,\tau_u]}\|\epsilon_n(\tau)\|,
\]
where $\epsilon_n(\tau)$ is the remainder term of the Taylor expansion and $\sup_{\tau\in[\tau_l,\tau_u]}\|\epsilon_n(\tau)\|\to0$. By Condition C4, we have $\sup_{\tau\in[\tau_l,\tau_u]}\|B\{\beta_0(\tau)\}^{-1}\{\hat\alpha(\tau)-\alpha_0(\tau)\}\|=O(\sup_\tau\|\hat\alpha(\tau)-\alpha_0(\tau)\|)$, and this implies the uniform consistency.
Weak convergence. Following Lemma B.1 of Peng and Huang (2008), we have
\[
\sup_{\tau\in(0,\tau_u]}\Big\|n^{-1/2}\sum_{i=1}^n Z_i\big[N_i(e^{Z_i'\hat\beta(\tau)})-N_i(e^{Z_i'\beta_0(\tau)})\big]-n^{1/2}\big[m\{\hat\beta(\tau)\}-m\{\beta_0(\tau)\}\big]\Big\|=o_p(1)
\tag{S2}
\]
and
\[
\sup_{\tau\in(0,\tau_u]}\Big\|n^{-1/2}\sum_{i=1}^n Z_i\big[v_i(e^{Z_i'\hat\beta(\tau)})I(\widetilde T_i\ge e^{Z_i'\hat\beta(\tau)})-v_i(e^{Z_i'\beta_0(\tau)})I(\widetilde T_i\ge e^{Z_i'\beta_0(\tau)})\big]-n^{1/2}\big[\widetilde m\{\hat\beta(\tau)\}-\widetilde m\{\beta_0(\tau)\}\big]\Big\|=o_p(1).
\tag{S3}
\]
In addition, for the grid size chosen in Theorem 1, $S_n(\hat\beta,\tau)=o_p(1)$ uniformly in $\tau\in(0,\tau_u]$. Then the following equations hold uniformly in $\tau\in(0,\tau_u]$:
\[
\begin{aligned}
S_n(\beta_0,\tau)
&=n^{1/2}\big[m\{\hat\beta(\tau)\}-m\{\beta_0(\tau)\}\big]-\int_0^\tau n^{1/2}\big[\widetilde m\{\hat\beta(u)\}-\widetilde m\{\beta_0(u)\}\big]\,dH(u)+o_p(1)\\
&=n^{1/2}\big[m\{\hat\beta(\tau)\}-m\{\beta_0(\tau)\}\big]-\int_0^\tau\big[J\{\beta_0(u)\}B\{\beta_0(u)\}^{-1}+o_p(1)\big]\,n^{1/2}\big[m\{\hat\beta(u)\}-m\{\beta_0(u)\}\big]\,dH(u)+o_p(1).
\end{aligned}
\]
A discrete version of the above equation,
\[
S_n(\beta_0,\tau)=n^{1/2}\big[m_n\{\hat\beta(\tau)\}-m_n\{\beta_0(\tau)\}\big]-\int_0^\tau n^{1/2}\big[\widetilde m_n\{\hat\beta(u)\}-\widetilde m_n\{\beta_0(u)\}\big]\,dH(u)+o_p(1),
\]
leads to the following recursive formula:
\[
\begin{aligned}
S_n(\beta_0,\tau_k)-S_n(\beta_0,\tau_{k-1})
&=n^{1/2}\big[m_n\{\hat\beta(\tau_k)\}-m_n\{\beta_0(\tau_k)\}\big]-n^{1/2}\big[m_n\{\hat\beta(\tau_{k-1})\}-m_n\{\beta_0(\tau_{k-1})\}\big]\\
&\quad-n^{1/2}\big[\widetilde m_n\{\hat\beta(\tau_{k-1})\}-\widetilde m_n\{\beta_0(\tau_{k-1})\}\big]\{H(\tau_k)-H(\tau_{k-1})\}\\
&=B\{\beta_0(\tau_k)\}n^{1/2}\{\hat\beta(\tau_k)-\beta_0(\tau_k)\}\\
&\quad-\big[B\{\beta_0(\tau_{k-1})\}+J\{\beta_0(\tau_{k-1})\}\{H(\tau_k)-H(\tau_{k-1})\}\big]n^{1/2}\{\hat\beta(\tau_{k-1})-\beta_0(\tau_{k-1})\}+o_p(1).
\end{aligned}
\]
The above recursive equation gives the following approximation result:
\[
\begin{aligned}
B\{\beta_0(\tau_k)\}n^{1/2}\{\hat\beta(\tau_k)-\beta_0(\tau_k)\}
&=\big[S_n(\beta_0,\tau_k)-S_n(\beta_0,\tau_{k-1})\big]\\
&\quad+\big[I+J\{\beta_0(\tau_{k-1})\}B\{\beta_0(\tau_{k-1})\}^{-1}\{H(\tau_k)-H(\tau_{k-1})\}\big]\big[S_n(\beta_0,\tau_{k-1})-S_n(\beta_0,\tau_{k-2})\big]\\
&\quad+\cdots+\prod_{h=2}^{k}\big[I+J\{\beta_0(\tau_{h-1})\}B\{\beta_0(\tau_{h-1})\}^{-1}\{H(\tau_h)-H(\tau_{h-1})\}\big]\big[S_n(\beta_0,\tau_1)-S_n(\beta_0,\tau_0)\big]\\
&\quad+o_p(1).
\end{aligned}
\tag{S4}
\]
Using product-integration theory (Gill and Johansen, 1990; Andersen et al., 1993, Section II.6), we can write the above decomposition as
\[
n^{1/2}\big[m\{\hat\beta(\tau)\}-m\{\beta_0(\tau)\}\big]=\phi\{S_n(\beta_0,\cdot)\}(\tau)+o_p(1),
\tag{S5}
\]
where $\phi$ is a functional defined as follows. For any $g\in\mathcal G=\{g:[0,\tau_u]\to\mathbb R^p,\ g\text{ left-continuous with right limits},\ g(0)=0\}$, and with the product integral
\[
\mathcal I(s,t)=\pi_{u\in(s,t]}\big[I_p+J\{\beta_0(u)\}B\{\beta_0(u)\}^{-1}\,dH(u)\big],
\]
where $\pi$ denotes the product-integral notation, $\phi(g)(\tau)$ is defined as
\[
\phi(g)(\tau)=\int_0^\tau\mathcal I(s,\tau)\,dg(s).
\tag{S6}
\]
Note that $\{Z_iN_i(e^{Z_i'\beta_0(\tau)}),\ \tau\in[\tau_l,\tau_u]\}$ is a Donsker class and uniformly bounded. From Condition C5, the class $\{Z_iv_i(e^{Z_i'\beta_0(s)})Y_i(e^{Z_i'\beta_0(s)}),\ s\in[\tau_l,\tau_u]\}$ is also Donsker (see Example 2.10.8 in Section 2.10 of van der Vaart and Wellner, 1996). Since $\int_0^\tau v_i(e^{Z_i'\beta_0(s)})Y_i(e^{Z_i'\beta_0(s)})\,dH(s)$ is Lipschitz in $\tau$, the permanence properties of Lipschitz transformations of Donsker classes (Theorem 2.10.6, van der Vaart and Wellner, 1996) imply that
\[
\Big\{Z_iN_i(e^{Z_i'\beta_0(\tau)})-Z_i\int_0^\tau v_i(e^{Z_i'\beta_0(s)})Y_i(e^{Z_i'\beta_0(s)})\,dH(s),\ \tau\in[\tau_l,\tau_u]\Big\}
\]
is a Donsker class. Then the Donsker theorem implies that $S_n(\beta_0,\tau)$ converges weakly to a Gaussian process $U(\tau)$ on $\tau\in[\tau_l,\tau_u]$ with mean zero and covariance matrix $\Sigma(s,t)=E\{u_i(s)u_i(t)'\}$, where $u_i(\tau)=Z_iN_i(e^{Z_i'\beta_0(\tau)})-Z_i\int_0^\tau v_i(e^{Z_i'\beta_0(s)})Y_i(e^{Z_i'\beta_0(s)})\,dH(s)$. Furthermore, since $\phi$ is a linear operator, $\phi\{S_n(\beta_0,\cdot)\}(\tau)$ converges weakly to $\phi\{U\}(\tau)$, which is also a Gaussian process (Römisch, 2005). From (S5) and a Taylor expansion, we have
\[
B\{\beta_0(\tau)\}n^{1/2}\{\hat\beta(\tau)-\beta_0(\tau)\}=\phi\{S_n(\beta_0,\cdot)\}(\tau)+o_p(1).
\]
Then $n^{1/2}\{\hat\beta(\tau)-\beta_0(\tau)\}$ converges weakly to the Gaussian process $B\{\beta_0(\tau)\}^{-1}\phi\{U\}(\tau)$ with covariance matrix $B\{\beta_0(\tau)\}^{-1}\Sigma^*(\tau)[B\{\beta_0(\tau)\}^{-1}]'$, where $\Sigma^*(\tau)$ denotes the limiting covariance matrix of $\phi\{S_n(\beta_0,\cdot)\}(\tau)$.

Proof of Theorem 2

Next we prove the weak convergence of $n^{1/2}\{\hat\beta_{\mathrm{eff}}(\tau)-\beta_0(\tau)\}$. With the above definitions, we can write $\eta(\beta,\tau)$ and $\eta^*(\beta,\tau_k)$ in (18) and (20) as
\[
\eta(\beta,\tau)=n^{1/2}m_n^*\{\beta(\tau)\}-n^{1/2}\int_0^\tau\widetilde m_n^*\{\beta(u)\}\,dH(u)
\]
and
\[
\eta^*(\beta,\tau_k)=n^{1/2}m_n^*\{\beta(\tau_k)\}-n^{1/2}\sum_{j=0}^{k-1}\widetilde m_n^*\{\hat\beta_{\mathrm{eff}}(\tau_j)\}\{H(\tau_{j+1})-H(\tau_j)\}.
\]
By the proposed estimation method for $\hat\beta_{\mathrm{eff}}(\tau)$, the sequential estimator $\hat\beta_{\mathrm{eff}}(\tau_k)$, $1\le k\le L$, minimizes $n^{-1}\eta^*(\beta,\tau_k)'W(\hat\beta_{\mathrm{int}},\tau)^{-1}\eta^*(\beta,\tau_k)$. Since $\hat\beta_{\mathrm{int}}$ is taken as a consistent estimator of $\beta_0$, it can be shown that $W(\hat\beta_{\mathrm{int}},\tau)^{-1}\to W(\beta_0,\tau)^{-1}$ in probability. Further note that $B^*(b)=\partial m^*(b)/\partial b'$ and $J^*(b)=\partial\widetilde m^*(b)/\partial b'$. As in the proof of Theorem 1, $m_n^*(b)$ and $\widetilde m_n^*(b)$ converge to $m^*(b)$ and $\widetilde m^*(b)$ uniformly in probability; the sequential estimators $\hat\beta_{\mathrm{eff}}(\tau_k)$ then satisfy
\[
B^*\{\hat\beta_{\mathrm{eff}}(\tau_k)\}'W(\beta_0,\tau)^{-1}\eta^*(\hat\beta_{\mathrm{eff}},\tau_k)=o_p(1).
\]
From a similar argument as in the proof of Theorem 1 for the consistency of $\hat\beta(\cdot)$, we have that $\hat\beta_{\mathrm{eff}}(t)$ is uniformly consistent in $t\le\tau$. Similarly to Lemma B.1 of Peng and Huang (2008), we have for $\bar\beta(t)$ such that $\sup_t\|\bar\beta(t)-\beta_0(t)\|\to0$ in probability,
\[
\sup_{t\in(0,\tau]}\big\|[m_n^*\{\bar\beta(t)\}-m_n^*\{\beta_0(t)\}]-[m^*\{\bar\beta(t)\}-m^*\{\beta_0(t)\}]\big\|=o_p(n^{-1/2})
\]
and
\[
\sup_{t\in(0,\tau]}\big\|[\widetilde m_n^*\{\bar\beta(t)\}-\widetilde m_n^*\{\beta_0(t)\}]-[\widetilde m^*\{\bar\beta(t)\}-\widetilde m^*\{\beta_0(t)\}]\big\|=o_p(n^{-1/2}).
\]
Then we can write
\[
\eta^*(\bar\beta,\tau_k)=n^{1/2}B^*\{\beta_0(\tau_k)\}\{\bar\beta(\tau_k)-\beta_0(\tau_k)\}+o_p(1)+n^{1/2}m_n^*\{\beta_0(\tau_k)\}-n^{1/2}\sum_{j=0}^{k-1}\widetilde m_n^*\{\hat\beta_{\mathrm{eff}}(\tau_j)\}\{H(\tau_{j+1})-H(\tau_j)\}.
\]
This implies that the sequential estimator $\hat\beta_{\mathrm{eff}}(\tau_k)$ satisfies
\[
B^*\{\beta_0(\tau_k)\}'W(\beta_0,\tau)^{-1}\eta^*(\hat\beta_{\mathrm{eff}},\tau_k)=o_p(1)
\]
uniformly, and furthermore, uniformly in $t$,
\[
B^*\{\beta_0(t)\}'W(\beta_0,\tau)^{-1}\eta(\hat\beta_{\mathrm{eff}},t)=o_p(1).
\]
This equation plays the same role in the current proof as the unweighted estimating equation $S_n(\hat\beta,t)=o_p(1)$ does in the proof of Theorem 1 for the weak convergence of $\hat\beta$. The weak convergence of $\hat\beta_{\mathrm{eff}}$ then follows from a similar argument.

S3 Resampling method

An alternative resampling method. A commonly used resampling method is the perturbation-based approach proposed by Jin et al. (2003) for the AFT model; its modified version for quantile regression was adopted in Peng and Huang (2008) under unbiased sampling. This approach extends readily to the biased sampling case. In particular, consider the following stochastic perturbation of the observed estimating equations:
\[
S_n^*(\beta,\tau)=n^{-1/2}\sum_{i=1}^n\zeta_iZ_i\Big[N_i(e^{Z_i'\beta(\tau)})-\int_0^\tau v_i(e^{Z_i'\beta(s)})Y_i(e^{Z_i'\beta(s)})\,dH(s)\Big]=0,
\tag{S7}
\]
where the $\zeta_i$ are i.i.d. from a known nonnegative distribution with mean 1 and variance 1, such as the exponential distribution with rate 1. The above estimation can be implemented in the same fashion as discussed in Section 2.3. Conditional on the observed data, we independently generate the variates $(\zeta_1,\ldots,\zeta_n)$ $M_b$ times, with $M_b$ a large number, and obtain the corresponding $M_b$ estimates $\hat\beta^*(\tau)$ by solving (S7).

The resampling method described above is consistent. Following the proof of Theorem 3 below, conditional on $\{Z_i,\Delta_i,\widetilde T_i\}_{i=1}^n$, $S_n^*(\hat\beta,\tau)$ has the same asymptotic distribution as $S_n(\beta_0,\tau)$ in probability; moreover, uniformly in $\tau\in(0,\tau_u]$, we have
\[
n^{1/2}\{\hat\beta^*(\tau)-\hat\beta(\tau)\}=B\{\beta_0(\tau)\}^{-1}\phi\{S_n^*(\hat\beta,\cdot)\}(\tau)+o_p(1),
\tag{S8}
\]
where the functional $\phi$ is defined as in (S6). Therefore, conditional on the observed data, $n^{1/2}\{\hat\beta^*(\tau)-\hat\beta(\tau)\}$ converges weakly to the same limiting process as $n^{1/2}\{\hat\beta(\tau)-\beta_0(\tau)\}$ for $\tau\in[\tau_l,\tau_u]$, where $\tau_l\in(0,\tau_u)$.

Proof of Theorem 3

We first show that, conditional on the observed data, $S_n^*(\hat\beta,\tau)$ has the same asymptotic distribution as $S_n(\beta_0,\tau)$ in probability. Consider the weighted summation defined in (S7),
\[
S_n^*(\beta,\tau)=n^{-1/2}\sum_{i=1}^n\zeta_iZ_i\Big[N_i(e^{Z_i'\beta(\tau)})-\int_0^\tau v_i(e^{Z_i'\beta(s)})Y_i(e^{Z_i'\beta(s)})\,dH(s)\Big].
\]
Since $E(\zeta_i)=1$ and $\mathrm{Var}(\zeta_i)=1$, a similar argument as in the proof of Theorem 1 gives
\[
\sup_{\tau\in(0,\tau_u]}\|S_n^*(\hat\beta^*,\tau)\|=o_p(1),
\]
where $\hat\beta^*$ is the solution of $S_n^*(\beta,\tau)=0$ from the proposed sequential algorithm. Because $\sup_{\tau\in(0,\tau_u]}\|S_n(\hat\beta,\tau)\|=o_p(1)$ as shown in the proof of Theorem 1, uniformly in $\tau\in(0,\tau_u]$, $S_n^*(\hat\beta,\tau)$ can be expressed as
\[
S_n^*(\hat\beta,\tau)=S_n^*(\hat\beta,\tau)-S_n(\hat\beta,\tau)+o_p(1)=n^{-1/2}\sum_{i=1}^n(\zeta_i-1)\hat u_i(\tau)+o_p(1),
\]
where $\hat u_i(\tau)=Z_iN_i(e^{Z_i'\hat\beta(\tau)})-Z_i\int_0^\tau v_i(e^{Z_i'\hat\beta(s)})Y_i(e^{Z_i'\hat\beta(s)})\,dH(s)$.
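To make the perturbation scheme concrete, the following minimal numerical sketch (function and variable names are ours, not the paper's) evaluates the perturbed process of (S7) on a grid, taking $H(u)=-\log(1-u)$ as in Peng and Huang (2008) and the length-biased weight $v_i(t)=I(A_i\le t)$. In practice one re-solves the sequential estimating equations for each draw of $\zeta$; here we only evaluate the score to keep the sketch self-contained:

```python
import numpy as np

def perturbed_score(beta, Z, T_obs, delta, A, tau_grid, zeta):
    """Evaluate the perturbed process S*_n(beta, tau) of (S7) at the last grid
    point, using H(u) = -log(1 - u) and the length-biased weight
    v_i(t) = I(A_i <= t).  `beta` maps a quantile level tau to a coefficient
    vector; `zeta` holds the perturbation variables zeta_i."""
    n = len(T_obs)
    H = lambda u: -np.log1p(-u)
    # counting-process term N_i(exp(Z_i'beta(tau))) at the upper grid point
    t_u = np.exp(Z @ beta(tau_grid[-1]))
    N = ((T_obs <= t_u) & (delta == 1)).astype(float)
    total = (zeta * N)[:, None] * Z
    # cumulative weighted at-risk term; the sum over grid intervals
    # approximates the dH integral in (S7)
    for j in range(len(tau_grid) - 1):
        t = np.exp(Z @ beta(tau_grid[j]))
        vY = ((A <= t) & (T_obs >= t)).astype(float)   # v_i * Y_i
        dH = H(tau_grid[j + 1]) - H(tau_grid[j])
        total -= (zeta * vY * dH)[:, None] * Z
    return total.sum(axis=0) / np.sqrt(n)

def resample(beta, Z, T_obs, delta, A, tau_grid, M_b, rng):
    """Draw M_b perturbed copies with zeta_i i.i.d. Exp(1) (mean 1, variance 1)."""
    n = len(T_obs)
    return np.array([perturbed_score(beta, Z, T_obs, delta, A, tau_grid,
                                     rng.exponential(1.0, size=n))
                     for _ in range(M_b)])
```

Setting all $\zeta_i = 1$ recovers the unperturbed score, which is how the scheme reduces to the original estimating function in expectation.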
Then we have, for any $s,t\in[\tau_l,\tau_u]$,
\[
E\Big[\Big\{n^{-1/2}\sum_{i=1}^n(\zeta_i-1)\hat u_i(s)\Big\}\Big\{n^{-1/2}\sum_{i=1}^n(\zeta_i-1)\hat u_i(t)\Big\}'\ \Big|\ \{Z_i,\Delta_i,\widetilde T_i\}_{i=1}^n\Big]
=n^{-1}\sum_{i=1}^n\hat u_i(s)\hat u_i(t)'\to\Sigma(s,t)
\]
in probability as $n\to\infty$, where $\Sigma(\cdot,\cdot)$ is the limiting covariance function of $S_n(\beta_0,\cdot)$ defined in the proof of Theorem 1. Therefore, by a similar argument as in Lin, Wei, and Ying (1993), conditional on $\{Z_i,\Delta_i,\widetilde T_i\}_{i=1}^n$, $S_n^*(\hat\beta,\tau)$ has the same limiting covariance matrix and asymptotic distribution as $S_n(\beta_0,\tau)$ in probability.

From the decomposition in (S4), we can then show that, conditional on the data, $\phi\{S_n^*(\hat\beta,\cdot)\}(\tau)$ has the same asymptotic distribution as $\phi\{S_n(\beta_0,\cdot)\}(\tau)$ in probability. Furthermore, since $n^{1/2}[m\{\hat\beta(\tau)\}-m\{\beta_0(\tau)\}]=\phi\{S_n(\beta_0,\cdot)\}(\tau)+o_p(1)$, we obtain that, conditional on the data, $\phi\{S_n^*(\hat\beta,\cdot)\}(\tau)$ converges weakly to the limiting distribution of $n^{1/2}[m\{\hat\beta(\tau)\}-m\{\beta_0(\tau)\}]$ in probability.

Following Zeng and Lin (2008), we use the perturbation estimators $\hat B$ and $\hat J$ to estimate $B$ and $J$. The consistency property can be established as follows. By Lemma B.1 of Peng and Huang (2008), for $\bar\beta(t)$ such that $\sup_t\|\bar\beta(t)-\beta_0(t)\|\to0$ in probability, the following result holds uniformly in $\tau$:
\[
n^{1/2}\big[m_n\{\bar\beta(\tau)\}-m_n\{\beta_0(\tau)\}\big]-n^{1/2}\big[m\{\bar\beta(\tau)\}-m\{\beta_0(\tau)\}\big]=o_p(1).
\]
This implies that
\[
n^{1/2}\big[m_n\{\bar\beta(\tau)\}-m_n\{\beta_0(\tau)\}\big]=B\{\beta_0(\tau)\}n^{1/2}\{\bar\beta(\tau)-\beta_0(\tau)\}+o_p(1).
\]
Therefore, the above approximation holds for $\bar\beta(\tau)=\hat\beta(\tau)+n^{-1/2}\gamma$, where $\gamma$ follows a $p$-dimensional multivariate normal distribution with zero mean and identity covariance matrix.
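The regression step behind $\hat B$ can be sketched numerically as follows (a schematic under our own naming; `m_n_fn` stands for any sample moment map that is smooth in the limit). On a synthetic linear moment map the least-squares slope recovers the Jacobian exactly, which is the content of the consistency claim:

```python
import numpy as np

def slope_by_perturbation(m_n_fn, beta_hat, n, M, rng):
    """Estimate B = d m(b)/db' at beta_hat by least squares: stack the
    perturbations gamma_i (an M x p matrix) and the responses
    n^{1/2} [m_n(beta_hat + n^{-1/2} gamma_i) - m_n(beta_hat)] (M x p),
    then solve (gamma' gamma)^{-1} gamma' m, which requires M > p."""
    p = len(beta_hat)
    gamma = rng.standard_normal((M, p))
    base = m_n_fn(beta_hat)
    resp = np.array([np.sqrt(n) * (m_n_fn(beta_hat + g / np.sqrt(n)) - base)
                     for g in gamma])
    # least-squares slope: rows of B_hat' solve gamma @ X = resp
    B_hat_T, *_ = np.linalg.lstsq(gamma, resp, rcond=None)
    return B_hat_T.T
```

With $m(b) = B_{\mathrm{true}} b$ the responses equal $B_{\mathrm{true}}\gamma_i$ exactly, so the regression returns $B_{\mathrm{true}}$ up to floating-point error; for a nonlinear $m_n$ the $n^{-1/2}$ scaling of the perturbations makes the linearization error vanish asymptotically.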
For $M$ generated $\gamma$'s, let $\gamma$ denote the $M\times p$ matrix $(\gamma_1,\ldots,\gamma_M)'$ and $m$ the $M\times p$ matrix whose $i$th row is $n^{1/2}\big[m_n\{\hat\beta(\tau)+n^{-1/2}\gamma_i\}-m_n\{\hat\beta(\tau)\}\big]'$, $i=1,\ldots,M$. Then we have
\[
B\{\beta_0(\tau)\}=(\gamma'\gamma)^{-1}\gamma'm+o_p(1).
\]
Note that $(\gamma'\gamma)^{-1}$ exists with probability 1 for $M>p$. Therefore, the slope-matrix estimator given by regressing the perturbed values $n^{1/2}\big[m_n\{\hat\beta(\tau)+n^{-1/2}\gamma_i\}-m_n\{\hat\beta(\tau)\}\big]$ on $\gamma_i$, i.e., $\hat B\{\hat\beta(\tau)\}=(\gamma'\gamma)^{-1}\gamma'm$, is consistent. Similarly, we have the consistency of $\hat J$. This implies that $\hat B\{\hat\beta(\tau)\}^{-1}\phi\{S_n^*(\hat\beta,\cdot)\}(\tau)$ has the same asymptotic distribution as $n^{1/2}\{\hat\beta(\tau)-\beta_0(\tau)\}$. This validates the proposed resampling method.

References

Andersen, P. K., Borgan, Ø., Gill, R. D., and Keiding, N. (1993), Statistical Models Based on Counting Processes, Springer, New York.

Gill, R. D. and Johansen, S. (1990), A Survey of Product-Integration with a View Toward Application in Survival Analysis, The Annals of Statistics, 18, 1501-1555.

Jin, Z., Lin, D. Y., Wei, L. J., and Ying, Z. (2003), Rank-Based Inference for the Accelerated Failure Time Model, Biometrika, 90, 341-353.

Lin, D. Y., Wei, L. J., and Ying, Z. (1993), Checking the Cox Model with Cumulative Sums of Martingale-Based Residuals, Biometrika, 80, 557-572.

Peng, L. and Huang, Y. (2008), Survival Analysis with Quantile Regression Models, Journal of the American Statistical Association, 103, 637-649.
Römisch, W. (2005), Delta Method, Infinite Dimensional, in Encyclopedia of Statistical Sciences, Wiley.

van der Vaart, A. and Wellner, J. A. (2000), Preservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes, in High Dimensional Probability II, Springer, pp. 115-133.

van der Vaart, A. W. and Wellner, J. A. (1996), Weak Convergence and Empirical Processes: With Applications to Statistics, Springer, New York.

Zeng, D. and Lin, D. Y. (2008), Efficient Resampling Methods for Nonsmooth Estimating Functions, Biostatistics, 9, 355-363.