Cohort Sampling Schemes for the Mantel Haenszel Estimator

Size: px
Start display at page:

Download "Cohort Sampling Schemes for the Mantel Haenszel Estimator"

Transcription

1 doi:./j x Board of the Foundation of the Scandinavian Journal of Statistics 27. Published by Blackwell Publishing Ltd, 96 Garsington Road, Oxford OX4 2DQ, UK and 35 Main Street, Malden, MA 248, USA Vol 34: 37 54, 27 Cohort Sampling Schemes for the Mantel Haenszel Estimator LARRY GOLDSTEIN Department of Mathematics, University of Southern California BRYAN LANGHOLZ Department of Preventive Medicine, University of Southern California ABSTRACT. In many epidemiological studies, disease occurrences and their rates are naturally modelled by counting processes and their intensities, allowing an analysis based on martingale methods. Applied to the Mantel Haenszel estimator, these methods lend themselves to the analysis of general control selection sampling designs and the accommodation of time-varying exposures. Key words: baseline hazard estimation, case-control studies, counter matching, counting process, epidemiology, survival analysis. Introduction Mantel Haenszel estimators Mantel & Haenszel, 959; Breslow & Day, 98 have long been used in medical research to quantify the relative risk of disease between two groups. Historically, the long-standing appeal of the classical Mantel Haenszel estimators is that they have a simple closed form formula which does not require the solution of an estimating equation. Although maximuikelihood estimation has become the dominant approach to the analysis of epidemiological data, the Mantel Haenszel estimator continues to be popular. A medline search of papers in the years 2 25 gives a total of 42 references where Mantel Haenszel is cited in the abstract as the method applied. An excellent review of the development of the Mantel Haenszel estimator for analysis of epidemiological case-control studies, as well as the prominent role it has played in epidemiological research generally, is given in Breslow 996. In a cohort R = {,..., n} of individuals followed over the time interval [, τ], < τ, a model with minimal assumptions which relates failure and a binary exposure is that the failure rate for exposed individuals i R is increased by an unknown factor, over the failure rate for those unexposed. Allowing additional flexibility by leaving the common baseline hazard function λ t unspecified, and also letting the binary exposure indicator variable Z i t of individual i depend on time, taking the value or when i is unexposed or exposed at time t, respectively, results in the rate of observed failure at time t for i of λ i t = Y i tλ t Z it, where Y i t is the censoring indicator for individual i, taking the value when individual i is observed at time just prior to t. At any time t, we can divide the collection of individuals at risk at time t, Rt ={i : Y i t = } of size nt = Rt, into the two groups, R k t = {i Rt:Z i t = k} of sizes n k t = R k t, k =,. Turning for the moment to the fixed time covariate case, letting t, j < t 2, j < be the collection of all failure times among individuals having exposure j, with R jk = l n kt l, j /nt l, j, the Mantel Haenszel estimator is given explicitly by

2 38 L. Goldstein and B. Langholz Scand J Statist 34 ˆ MH = R R, 2 and in particular, there is no need to solve an estimating equation. In this setting, ˆ MH is a consistent and asymptotically normal estimate of Robins et al., 986. In this paper, we consider Mantel Haenszel estimators for nested case-control studies in which controls are sampled from risk sets determined by the cohort failure times see e.g. Langholz & Goldstein, 996. In recent work, Zhang et al. 2 defined generalized Mantel Haenszel estimators when controls are a simple random sample from the risk set and derived the properties of the estimator for right censored cohort data. Further, Zhang 2 developed estimators for a number of methods of sampling controls including sampling with and without replacement and geometric sampling, and showed their consistency. By placing our models in the counting process framework as detailed in sections 3 and 4., we expand on the work of these authors by providing Mantel Haenszel estimators for the entire class of control sampling methods considered by Borgan et al The estimators ˆ MH proposed for sampling, generalizing 2, continue to have a closed form representation which does not require solving a nonlinear estimating equation. Although our main focus is on the use of the Mantel Haenszel estimator under sampling, our setting also allows us to extend its scope to accommodate the time-varying intensities. In sections 4.2 and 4.3, we show the consistency and asymptotic normality of the Mantel Haenszel and baseline hazard estimator 4 under very general conditions, and in section 5 apply the asymptotic theory and determine the limiting distributions under random sampling, matching and counter matching, and make efficiency comparisons against ˆ MPL, the maximum partial likelihood estimator MPLE. It is well known that in the full-cohort setting ˆ MH performs asymptotically as well as ˆ MPL at the null =. In section 5, we show that ˆ MH continues to have this feature under sampling. 2. Risk set sampling 2.. General framework We consider a cohort with failures driven by the intensity, where the binary exposure status histories are not available for all cohort subjects but can be ascertained for a sample in a nested case-control study. In particular, if subject i fails at time t, we consider designs in which the case i is always included in the sample, and controls are chosen from Rt \{i}, those other individuals in the risk set at time t. Control sampling can be specified by giving for all t and i, r with i r and r R a collection of probabilities π t r i for choosing the individuals in the set r Rt to serve as controls should i fail at time t. Exposure status is then ascertained for the members of the set r so selected, the sampled risk set, made up of the controls r \{i} along with the failure i. For convenience, we set π t r i = when i r or Y i t =. The added flexibility and efficiency gains made possible by the choice of design π t r i is substantial, opening up the possibility of using sampling designs that can take advantage of additional structure which may be available in the data. For example, in designs 3 and 4, the matching and counter-matching designs described below, we assume that for each i Rt we have available the value C i t giving the strata membership of i among the possible values in C, some small finite set. For l C we let C l t ={i : Y i t =, C i t = l} and c l t = C l t, the l th sampling stratum, and its size, at time t. Each design π t r i has an associated probability distribution on the subsets of R defined by Board of the Foundation of the Scandinavian Journal of Statistics 27.

3 Scand J Statist 34 Sampling and Mantel Haenszel estimators 39 π t r = nt π t r i, 3 i r which sums to one by virtue of π t r i = π t r i = Y i t = nt. 4 i r i R, r i i R In addition, we define the associated weights w i t, r, set to when i is not at risk, by π w i t, r = t r i nt l r π tr l, so that π tr i = π t rw i t, r Some specific designs The sampling framework accommodates a wide range of designs e.g. Borgan et al., 995; Borgan & Langholz, 998; Andrieu et al., 2; Langholz & Goldstein, 2. Here, we highlight a few: Design : the full cohort. When information on all subjects is available, we may take π t r i to be the indicator of the set of those at risk at time t π t r i = r = Rt and so w i t, r = i Rt, r = Rt. The framework under which the classical Mantel Haenszel estimator can be applied is recovered under this scheme when the exposures are time fixed. When the collection of exposure status data on the full cohort is impractical and no additional information on cohort members is available, the simple random sampling design is a natural choice: Design 2: simple random sampling. At each failure time, a simple random sample of m individuals is chosen from those at risk to serve as controls for the failure; for i r Rt with r = m, nt π t r i =, 6 m and the probabilities 3 and weights 5 are given by nt π t r = and w i t, r = nt m m. The matching design could be used to control for confounding by stratifying by a potential confounder. Design 3: simple random sampling within matching strata, with specification m =,. If subject i fails at time t, then a simple random sample of m Ci t controls are chosen from C Ci tt, the failure s stratum at time t, to serve as controls for the failure. Hence, the sampling probabilities of this scheme are given by cci π t r i = tt r C Citt, r i, r = m Cit, 7 m Ci t and for r C l t, r = and i r, the probabilities 3 and weights 5 are given by π t r = c lt cl t and w i t, r = nt. nt Board of the Foundation of the Scandinavian Journal of Statistics 27.

4 4 L. Goldstein and B. Langholz Scand J Statist 34 In section 5, we show that significant efficiency gains over random sampling can be achieved by the counter-matching design when the strata C are sufficiently correlated to exposure. Design 4: counter matching, with specification m =,. If subject i fails at time t, then controls are randomly sampled without replacement from each C l t except for the failure s stratum, from which m Ci t controls are sampled. Let P C t denote the set of all subsets of Rt with individuals of type l for all l C. Then for r P C t and i r, the sampling probabilities of this scheme are given by [ ] cl t cci π t r i = tt, 8 m Ci t and the probabilities 3 and weights 5 are given by [ ] cl t π t r = and w i t, r = c Ci tt/m Ci t. 3. Mantel Haenszel estimators for sampled risk set data Let N i, r t be the counting process that records the number of times in, t] that i fails and r is chosen as its sampled risk set and define Nr k t = N i,r t, 9 N i,r t and N r t = i R k t i r recording, respectively, the number of times in, t] that r was chosen as the sampled risk set for a failure in R k t, that is, one having exposure k, and the total number of times in, t] that r was chosen as the sampled risk. Now let A k r t = π t r i = π t r w i t, r, k =, i R k t i R k t and A r t = A r t + A r t. By 3, for sets r with π t r = wehavea r t = ntπ t r and A k r t A r t and A k r t = nt., k {, } Now for j, k =, set t R jk t = A r sa k r sdnrs j and R jk = R jk τ. 2 The Mantel Haenszel estimator of in this more general context is then given by ˆ MH = R, R i.e. with the definitions of R jk extended as in 2, the estimator has exactly the same form 2 as before. We also consider the variance estimator ˆσ 2 = ˆ MH τ τ A 2 r sa r sa r sdn r s A r A s r sa 2. 3 r s A r s + ˆ dn MH A r s rs In theorems and 2 we give conditions under which ˆ MH is consistent and asymptotically normal, and show that nˆσ 2 is consistent for the variance of the asymptotic distribution. We note that for designs and 2, ˆ MH reduces to the classical Mantel Haenszel estimator and Board of the Foundation of the Scandinavian Journal of Statistics 27.

5 Scand J Statist 34 Sampling and Mantel Haenszel estimators 4 ˆσ 2 is the previously described conditional variance estimator Breslow, 98; Robins et al., 986. Where estimates of can be used to assess the magnitude of the effect that exposure has on failure, estimates of the integrated baseline hazard Λ t = t λ sds can in turn be used to provide estimates of absolute risk. We consider the integrated baseline hazard function estimate t dn ˆΛ n t, ˆ MH = r s i r ˆ, 4 Z is MH w i s, r given in terms of the weights defined in 5, where the ratio in the integral is regarded as if there is no one at risk. In theorem 3, we give conditions under which n ˆΛ n, ˆ MH Λ converges weakly as n to a mean-zero Gaussian process, and provide a uniformly consistent estimator for its variance function. 4. Properties of the Mantel Haenszel estimators 4.. The counting process model for sampling Much of the analysis here follows the work of Borgan et al. 995 closely, which is hereafter referred to as BGL. Assume that the censoring and failure information are defined on a probability space with a standard filtration F t, and that the censoring indicators Y i t, exposures Z i t, design π t r i and strata variables C i t are left continuous and adapted, and hence predictable and locally bounded. We make the assumption of independent sampling as in BGL that the intensity processes with respect to the filtration F t is the same as that with respect to this filtration augmented with the sampling information, that is, that selecting an individual as a control does not influence the likelihood of failure for that individual in the future. Combining with the design probabilities, into which censoring has already been incorporated, we see N i, r t has intensity of the form λ i,r t = Z it π t r iλ t, 5 and subtracting the integrated intensity results in M i, r t = N i, r t t λ i, r sds, orthogonal local square integrable martingales with predictable quadratic variation d M i, r t = λ i, r tdt. Similarly, with A k r t given in for k {, }, by linearity the counting processes Nr k t and N r t defined in 9 give rise to the orthogonal local square integrable martingales Mr k t = M i, r t M i, r t and M r t = i R k t i r with respective intensities λ k r t = k A k r tλ t and λ r t = A r t + A r t λ t, 6 and predictable quadratic variations Board of the Foundation of the Scandinavian Journal of Statistics 27.

6 42 L. Goldstein and B. Langholz Scand J Statist 34 d M k r t = λ k r tdt and d M r t = λ r tdt. 7 For v a multisubset of {, } of size 2 or 3, e.g. v = {,, }, define H v t = A v r t A k r t. 8 k v Then, for j, k {, }, the processes t W jk t = A r sa k r sdmrs j 9 are local square integrable martingales with predictable quadratic variation d W jk, W pq t = j = p j H jkqtλ tdt. 2 Using M j rt = N j rt λ j rt, 6 and 8, R jk t in 2 may be written R jk t = j t As H t = H t, H jk sλ sds + W jk t. 2 Gt = R t R t = W t W t 22 is a local square integrable martingale and, by 2, has quadratic variation d G t = 2 H t + H t λ tdt = A 2 r ta r ta r tλ r tdt Asymptotics of ˆ MH We prove the consistency and asymptotic normality of ˆ MH under some regularity and stability conditions. Condition. The cumulative hazard on the interval [, τ] is finite: Λ τ <. Condition 2. For all multisubsets v of {, } with v {2, 3}, there exist left continuous functions h v t such that for almost all t in [, τ], n H vt p h v t. Under conditions and 2 the dominated convergence theorem of Hjort & Pollard 993 as in proposition of BGL p. 762, with dominating functions D n t = Dt = by, yields t t n H vsλ sds p I v t where I v t = h v sλ sds. 23 Proposition Let conditions and 2 hold. Then for every t [, τ], n W jk, W pq t p j = p j I jkqt and n R jk t p j I jkt. 24 Proof. The first claim follows by 2 and 23. By 2, we have n R jkt = j t n H jksλ sds + n W jkt. Board of the Foundation of the Scandinavian Journal of Statistics 27.

7 Scand J Statist 34 Sampling and Mantel Haenszel estimators 43 By 23 the first term converges to j I jkt. The second term converges to zero in probability by 2 and a standard argument using Lenglart s inequality see Andersen et al., 993, and the second claim in 24 follows. We now show the consistency of ˆ MH upon additionally adopting Condition 3. I τ, given in 23, is strictly positive. Theorem Under conditions 3, the estimate ˆ MH defined in 2 is consistent, ˆ MH p as n. Proof. Using proposition, and that I = I,wehave ˆ MH = n R n R p I I =. Lemma Under conditions 3, the processes {n /2 W jk } given in 9 converge jointly in D[, τ] to mean-zero Gaussian processes {w jk } with covariations d w jk, w pq t = j = p j h jkqtλ tdt, and hence n /2 Gt = n /2 W t W t in 22 converges in D[, τ] to the mean-zero Gaussian process g with t g t = 2 h s + h s λ sds. Further, for t [, τ], and any consistent sequence ˆ n p,asn, t n ˆ n A 2 r s A r s A r sdn r s p g t. 25 Proof. We apply the martingale central limit theorem of Rebolledo, as presented in theorem II.5. of Andersen et al The processes {n /2 W jk } are local square integrable martingales, whose predictable quadratic variation converges by proposition to the continuous functions given in 24. Regarding the Lindeberg Condition, using for the two final inequalities, n A k 2 r t n /2 A k r t A r t A r t > ɛλj rtdt τ j A k 2 + δ r t ɛ δ n + δ/2 A r t A j rtλ tdt τ j ɛ δ n + δ/2 τ A j rtλ tdt j ɛ δ n δ/2 Λ τ p. For all t [, τ] and i, j, k {, }, by Rebolledo s theorem as in theorem II.5. of Andersen et al. 993, the scaled optional variation Board of the Foundation of the Scandinavian Journal of Statistics 27.

8 44 L. Goldstein and B. Langholz Scand J Statist 34 n [W ik, W ij ] t = n t A 2 r sa k r sa j rsdnrs i converges to the same limit 24, as that of the scaled predictable variation. The convergence in 25 now follows. Theorem 2 Under conditions 3, with ˆ MH given in 2, n ˆ MH d N, σ 2, 26 where σ 2 = τ 2 h t + h t λ tdt τ h tλ tdt, 27 2 which can be consistently estimated by nˆσ 2 given in 3. Proof. As n ˆ MH = n /2 R R = n /2 Gτ, n R n R 26 follows using proposition and lemma. The consistency of nˆσ 2 follows from 25 for the numerator, and the consistency of ˆ MH and 24 for the denominator. Under the null =, we note that as h t + h t = h t, the asymptotic variance 27 simplifies to σ 2 = τ h tλ tdt Properties of the baseline hazard estimator To study the estimate 4 we impose the following additional conditions. Condition 4.The ratio nt/n is uniformly bounded away from zero in probability as n. and Condition 5. There exist functions e and ψ such that for all t [, τ] asn, { } A r t π t r A r t + A p e r t, t, 29 n π t r 2 {A r t + A r t} p ψ, t. 3 Letting t < t 2 < be the collection of all failure times, and R j the sampled risk set at failure time t j, we rewrite the cumulative baseline hazard estimate 4 as ˆΛ n t, ˆ MH = tj t, i R j ˆ Z it j MH w i t j, R j where the weights w i t, r are given in 5. Board of the Foundation of the Scandinavian Journal of Statistics 27.

9 Scand J Statist 34 Sampling and Mantel Haenszel estimators 45 Theorem 3 Let conditions 5 hold, and with e, u as in 29 set Bt, = t e, uλ udu. Then n /2 ˆ MH and the process X n = n /2 ˆΛ n, ˆ MH Λ + n /2 ˆ MH B, are asymptotically independent. The limiting distribution of X n is, with ψ, t as in 3, that of a mean-zero Gaussian martingale with variance function ω 2 t, = t ψ, uλ udu. In particular, the scaled difference between the estimated and true integrated baseline hazard n ˆΛ n, ˆ MH Λ converges weakly as n to a mean-zero Gaussian process with covariance function σ 2 Λs, t = ω 2 s t + Bs, σ 2 Bt,. The function σ 2 Λ s, t can be estimated uniformly consistently by nˆσ2 Λs, t where ˆσ 2 Λs, t = ˆω 2 s t; ˆ MH + ˆB n s; ˆ MH ˆσ 2 ˆB n t; ˆ MH, ˆω 2 t; = t j t ˆB n t, = tj t { i Rj Z it j w i t j, R j } 2 and i R j Z i t j Z it j w i t j, R j { i Rj Z it j w i t j, R j } 2. Proof. The form of ˆΛ n is the same as in BGL, and noting in particular that Condition 4 in BGL can be satisfied by letting X r t = and Dt a constant, we have that X n is asymptotically equivalent to the local square integrable martingale, Y n = n /2 dm r u i r Z iu w i u, r, and the proof of the claims made of the asymptotic distribution of X n now follow as there. Regarding the asymptotic independence, for any locally bounded predictable processes H r, it is straightforward to verify W W, H r dm r t =. Hence, by the asymptotic joint normality provided by Rebolledo s theorem II.5. in Andersen et al. 993, functions of the collections { H r dm r } r and W W, in particular n ˆ MH and X n, are asymptotically independent. The claim that σ 2 Λ s, t can be estimated uniformly consistent by nˆσ2 Λs, t follows as in BGL, based on the fact that n ˆω 2 t, is the optional variation process of the local square integrable martingale Y n, which by Rebolledo s theorem as cited above, converges uniformly in probability to its predictable variation ω 2 t, ; the uniform convergence of ˆB n, ˆ n tob, is as in BGL, proposition 2. Board of the Foundation of the Scandinavian Journal of Statistics 27.

10 46 L. Goldstein and B. Langholz Scand J Statist Applications In this section, we first show that for any design that meets the conditions in section 4.2, the Mantel Haenszel estimator and the MPLE have the same asymptotic variance at the null, and away from the null that the MPL estimator is at least as efficient as the Mantel Haenszel estimator. We then apply our results to the designs discussed in section 2.2. Although our asymptotic results hold under the weaker stability conditions of sections 4.2 and 4.3, here we assume that the censoring, covariate and strata variables are independent and identically distributed copies of Y t, Zt, and Ct, respectively, left continuous and adapted processes having right hand limits. The strata variable required for designs 3 and 4 gives the type of individual among the possible values in a small finite set C; the strata variable may be used to model any additional information, a surrogate of exposure in particular. 5.. Relative efficiency of the Mantel Haenszel estimator to the MPLE We now show that at the null =, MH and MPL have equal efficiency. For r R let S r, t = A r t + A r t, S r, t = A r t and E r, t = S r, t S r, t. 3 Referring now to 3.4 of BGL where β = there corresponds to = here, we see S r 2, t = A r t asz 2 = Z when Z {, }. In the null case, using 3. of BGL, the inverse variance of the MPLE is the integral of the baseline hazard against the limit of S2 r, t n S r, t S r 2, t S S r r t, t = A r t n A r t + A r t A 2 r t [A A r t + A r t + A r t] r t = A r t[a r t + A r t] A r t 2 n A r t + A r t A r t + A r t = A r ta r t n A r t + A r t = h n,t p h t, yielding agreement with 28. Hence, the asymptotic variances of the MPLE and of the Mantel Haenszel estimator, at the null, are equal for sampling in general. To characterize the relative efficiency away from the null =, Anderson & Bernstein 985 showed that in the full-cohort situation ˆ MH is one Newton step away from =. It is easily shown that this result holds for ˆ MH under risk set sampling. Although estimators which are one Newton step away from an n-consistent estimator can be asymptotically efficient see e.g. theorem 4.3 of Lehmann & Casella, 998, we expect away from the null that ˆ MH, one step away from the inconsistent estimator =, will generally be less efficient than ˆ MPL Relative efficiency for specific designs For each of the designs 4, we verify that conditions 5 are satisfied and determine the standardized asymptotic distributions of ˆ MH and ˆΛ n. We assume that τ < ; asλ is already assumed bounded away from infinity, the finite interval Condition holds. For designs and 2, to satisfy condition 3, we assume that Board of the Foundation of the Scandinavian Journal of Statistics 27.

11 Scand J Statist 34 Sampling and Mantel Haenszel estimators 47 f k t = PZt = k Y t = for k =,, are bounded away from over some non-trivial interval of time [a, b] [, τ]. For design 3, to satisfy condition 3, letting for k =, and l C, q l t = PCt = l Y t = and f k, l t = PZt = k Ct = l, Y t =, we assume there exists l C with 2 such that over some non-trivial interval [a, b] [, τ] the functions q l t and f k, l t are bounded away from zero. That is, there is some strata in which a comparison of individuals can be made, and in that strata, the covariate value is not a constant. For design 4, to satisfy condition 3 we assume that either: i the assumption for design 3 holds, or ii there exists an unequal pair l, l 2 C with q l t, q l2 t, f j, l t, f k, l2 t bounded away from zero. That is, we need to assume either that a meaningful comparison can be drawn: i within a strata or ii between two different strata. Condition 4 is satisfied when τ < assuming that inf pt >, where pt = PY t = ; t [, τ] one needs only to invoke the strong law of large numbers in D[, ] of Rao 963 after reversing the time axis, similar to BGL. In summary, in each of the examples which follow, we need to verify only conditions 2, 3 and 5. Throughout we continue to let nt = Rt, and define ρ n t = nt/n. Design : full cohort. Sampling all individuals who are at risk at the time of failure gives π t r i = r = Rt, and with n k t = R k t, A k r t = π t r i = n k tr = Rt, i R k t and A Rt t = nt. Using 8, n H vt = A v r t A k r t = ρ n n t n k t nt p pt f k t = h v t; k v k v k v hence condition 2 is satisfied. Using that λ is bounded away from zero, Condition 3 is satisfied as f t and f t are assumed bounded away from zero over some interval. By 27 the variance of the limiting normal is τ 2 σ 2 f t + f t f tf tptλ tdt = τ f tf tptλ tdt. 2 Condition 5 can be seen to be satisfied by f e, t = t and ψ f t + f t, t = f t + f t. Specializing further to the null case =, σ 2 = τ ptf tf tλ tdt, e, t = f t and ψ, t =. 32 Design 2: simple random sampling. Letting r k t = {i r, Z i t = k} and r k t = r k t, the sampling probabilities 6 yield that for r Rt with r = m, nt nt π t r =, A k r t = r k t, A r t = nt m m m m and so Board of the Foundation of the Scandinavian Journal of Statistics 27.

12 48 L. Goldstein and B. Langholz Scand J Statist 34 where h n,v t = n = nt nm v a v v t A k r t = nm v k v nt m r = m, t k v nt m r = m, t k v r k t = ρ nt E X m v k t, k v r k t X t, X t Hn t, n t, m, 33 the hypergeometric number of type and items in a simple random sample of m items from a population with n t and n t items of type and respectively. Taking limits for j, k distinct for v = 2 m h jk t = ptf j tf k t, m while for v = 3, h jjk t = pt m2 f m 3 j tf k t +m 3 fj 2 tf k t, satisfying Condition 2. Condition 3 is verified here as it was for design. By 27, the variance of the asymptotic distribution is σ 2 = τ ptf tf t [ + +f t + f tm 2 ] λ tdt m τ ptf tf tλ tdt, 34 2 and condition 5 is satisfied with e, t = x m f x x x + x = m + x x, x tf x t and ψ, t = m m f x pt x + x x, x tf x t. x + x = m Under the null =, expression 34 simplifies to m σ 2 = m τ ptf tf tλ tdt, giving an asymptotic relative efficiency of m /m with respect to the full-cohort variance 32, the same relative efficiency as ˆ MPL, as expected by the computation at the end of section 4.2. Lastly, in the null case e, t = f t and ψ, t = pt. Previous efficiency work used a recursive representation of the factorial moments of the extended hypergeometric distribution Harkness, 965 to derive an asymptotic variance expression for small strata case-control data Breslow, 98; Hauck & Donner, 988. The expressions derived in these papers when there is a single case per set correspond to 34, a simplification that has not been previously described. Figure shows efficiency curves relative to ˆ MPL as a function of log by m when f t.2. As noted previously in Breslow 98, the Mantel Haenszel estimator has high efficiency relative to ˆ MPL over a fairly large region around the null. For the next two designs, define for r Rt and let r k, l t = r R k t C l t, r k, l t = r k, l t, n k, l t = R k t C l t, Board of the Foundation of the Scandinavian Journal of Statistics 27.

13 Scand J Statist 34 Sampling and Mantel Haenszel estimators 49 X l t Hn, l t, n, l t,, for l C 35 be independent multivariate hypergeometric vectors, as in 33. Design 3: matching. Recalling the sampling probabilities 7 for the matching design, for r C l t with r =,wehave A k cl t c r t = l t r k, l t and A r t = c l t Hence, h n,v t = a r v t A k r t = n n k v cl t = nt n = ρ n t c l t nt c l t nt E k v cl t r C l t, r = a v r C l t, r = k v m l X k, l t. r t k v A k r t m l r k, l t with X l t as in 35. For j = k distinct, taking limits we find h jk t = pt ml q l tf j, l tf k, l t, 36 while for v = 3, h jjk t = pt ml q l t f ml 2 j, l tf k, l t + m l 2 f 2 ml 2 j, ltf k, l t and condition 2 is satisfied. Condition 3 is satisfied in a manner similarly as for design 2, with the additional assumption that 2, ensuring that / in 36 is positive. Fig.. Asymptotic efficiency by exposure rate ratio of Mantel Haenszel relative to the partial likelihood estimator for simple random sampling of m controls. Probability of exposure PZt = Y t = =.2. Board of the Foundation of the Scandinavian Journal of Statistics 27.

14 5 L. Goldstein and B. Langholz Scand J Statist 34 In particular, from 27 the variance of the limiting normal is τ pt ml σ 2 q m l = 2 l tf, l tf, l t[ + +f, l t + f, l t 2]λ tdt τ pt 2, ml q tf, l tf, l tλ tdt l and condition 5 is satisfied with e, t = x, l q l t x, l + x, l and x, l + x, l = ψ, t = pt q l t x, l + x, l = Specializing further, under the null =, ml x, l, x, l x, l + x, l σ 2 = τ pt, ml q tf, l tf, l tλ tdt l f x, l, l tf x, l, l t ml x, l, x, l f x, l, l tf x, l, l t. e, t = q l tf, l t and ψ, t = pt. Design 4: counter matching. Recalling the sampling probabilities 8 for the countermatching design, where C is a set of types, P C t Rt the collection of sets r with subjects of type l at time t, c l t is the number of type l subjects in Rt, and C i t the type of subject i at time t. By 3 for r P C t, [ ] cl t π t r =, and letting r k, l t = {i r : Z i t = k, C i t = l}, and r k, l t = r k, l t, A k r t = i R k t [ ] cl t π t r i = r k, l t c lt, [ and A r t = ] cl t. nt As for design 3, we can write h n,v t as an expectation h n,v t = ρ n te k v X k, l tc l t nt = ρ n te l p C, p =,..., v k v X k, lp tc lp t, p nt and for v = 2 with j = k distinct, taking limits we find h jk t ispt times ml f j, l tf k, l tql 2 t + f j, l tf k, l2 tq l tq l2 t, 37 l l2 which can be further simplified to yield Board of the Foundation of the Scandinavian Journal of Statistics 27.

15 Scand J Statist 34 Sampling and Mantel Haenszel estimators 5 h jk t = pt f j tf k t f j, l tf k, l tql 2 t. 38 Applying the assumptions made at the beginning of this section in version i on the first sum in 37 or in version ii on the second sum in 37, condition 3 is satisfied. Similar calculations yield { h jkk t = pt f j tfk 2 t + f j, l tf, k tql 2 t 3f k, l tq l t } f m 2 j, l tf k,l t 2f k,l tql 3 t. l Hence, condition 2 is satisfied, and σ 2 can now be calculated by 27. For the parameters in the limiting distribution for the baseline hazard estimator, we have e, t = x, l q l t m l mα x, α + x, α = m α, α C x f x, l + x,l q l t α tf x α t, x, α, x, α α C and ψ, t = pt x, α + x, α = m α, α C x, l + x,l q l t α C mα f x α tf x α t. x, α, x, α We specialize further to the case where there are two strata, C = 2, and the binary strata variable Ct {, } is a perhaps easily available surrogate for the true binary exposure Zt {, }. Recalling f k,l t = PZt = k Ct = l, Y t = k, l {, }, we have f k,l tq l t = PZt = k Ct = l, Y t = PCt = l Y t = = PZt = k, Ct = l Y t = = π k,l t say, and δt = PCt = Zt =, Y t = and γt = PCt = Zt =, Y t =, the sensitivity and specificity of Zt forct. As π t = δtf t, π t = δtf t π t = γtf t, π t = γtf t, and 38 gives h t for, say m = m =, as pt times f tf t f, tf, tqt 2 + f, tf, tqt 2 = f tf t π, tπ, t + π, tπ, t = f tf t γtf tδtf t + γtf t δtf t = f tf t δt γt + γtδt. 39 In a similar way, h t and h t can be expressed in terms of the sensitivity, specificity and probability of exposure integrated against the baseline hazard. Using 26 and the partial likelihood variance given in A3 from Langholz & Borgan 995, asymptotic efficiencies for ˆ MH relative to ˆ MPL can be computed. Figure 2 shows the asymptotic relative efficiencies by log with PZt = Y t = =.2 for m, m {, 2} when the conditional distribution of Zt, Ct given Y t = does not Board of the Foundation of the Scandinavian Journal of Statistics 27.

16 52 L. Goldstein and B. Langholz Scand J Statist 34 depend on t, which holds, approximately, for rare outcomes when censoring does not depend on Zt, Ct. Although there is some difference in the relative efficiencies by choice of m and m and the sensitivity and specificity of C for Z, ˆ MH has fairly high efficiency in a wide range of situations. Under the null = 27 simplifies to yield σ 2 = τ pt f tf t, 4 l f m, l tf, l tq 2 l l t λ t and using x, l + x, l = and EX k, l = f k, l t, we have e, t = f, l tq l t and ψ, t = pt. When m, m =,, so that the design matches one control with surrogate exposure Ct value opposite to the exposure Zt of the case, substituting 39 into 4 yields τ σ 2 = ptf tf t δt γt + γtδtλ tdt, which is the equal to the asymptotic variance for the : counter-matching design when using the ˆ MPL Langholz & Clayton, 994, as anticipated by the argument at the end of section 4.2. We note that, as in Langholz & Clayton 994, when the sensitivity and specificity are close to or, the counter-matching design has efficiency close to that of the full cohort. Fig. 2. Asymptotic efficiency by exposure rate ratio of Mantel Haenszel relative to the partial likelihood estimator for counter matching by sensitivity δ = PZt = Ct =, Y t = and specificity γ = PZt = Ct =, Y t =. Probability of exposure PZt = Y t = =.2. Board of the Foundation of the Scandinavian Journal of Statistics 27.

17 Scand J Statist 34 Sampling and Mantel Haenszel estimators Discussion We have described the Mantel Haenszel estimator for the rate ratio in a proportional hazards model, and associated baseline hazard estimator, for a large class of case-control sampling designs within the context of risk set sampling from cohort data. We have further demonstrated necessary conditions for consistency and asymptotic normality as well as provided efficiency calculations for a number of designs. Our results generalize the work of Zhang et al. 2 and Zhang 2 by allowing for quite general censoring and sampling, recurrent events, and time-dependent exposure variable. Although, in general, ˆ MH is less efficient than ˆ MPL, we showed that for general sampling, when =, ˆ MH has efficiency equal to ˆ MPL. Further, for the specific sampling designs we studied, we found that the efficiency loss for ˆ MH was not large within a range of of practical interest. Like other Mantel Haenszel estimators and unlike ˆ MPL, ˆ MH under sampling has a simple closed form. We have described and analysed a number of extensions to the Mantel Haenszel estimator in Goldstein and Langholz 26 including handling multilevel exposure, estimators based on t R jk t = aa r s, A r s A k r sdnrs, j where au, v is a positive symmetric function along the lines of Zhang et al. 2 and Zhang 2, and robust variance estimators based on the optional variation parallel to the estimator described by Liang 985. An important generalization relevant to the matching design is to the stratified proportional hazards model with λ i t = Y i tλ Ci tt Z it, where λ c t is the baseline hazard function for matching stratum c e.g. Andersen et al., 993. Even in this extended model, it remains true that R t R t is a local square integrable martingale. We can guarantee the consistency of ˆ MH in this situation by letting condition hold with λ l t replacing λ t, and condition 2 hold with H v, l t = r C l t a v r t k v A k r t and its scaled limit h v, l t, replacing H v t and h v t, respectively, for each l C. Now lemma holds for each l C, and summing over all l in the finite set C gives the asymptotic normality of ˆ MH for the matching design with strata-specific baseline hazard λ l t, with variance σ 2 = τ Acknowledgement 2 h, l t + h, l tλ l tdt τ h., ltλ l tdt 2 This work was supported by United States National Cancer Institute grant CA489. References Anderson, J. R. & Bernstein, L Asymptotically efficient two-step estimators of the hazards ratio for follow-up studies and survival data. Biometrics 4, Andersen, P. K., Borgan, Ø., Gill, R. D. & Keiding, N Statistical models based on counting processes. Springer Verlag, New York. Andrieu, N., Goldstein, A. M., Thomas, D. C. & Langholz, B. 2. Counter-matching in geneenvironment interaction studies: Efficiency and feasibility. Am. J. Epidemiol. 53, Borgan, Ø. & Langholz, B Risk set sampling designs for proportional hazards models. In statistical analysis of medical data: new developments eds B. S. Everitt & G. Dunn, 75. Arnold, London. Board of the Foundation of the Scandinavian Journal of Statistics 27.

18 54 L. Goldstein and B. Langholz Scand J Statist 34 Borgan, Ø., Goldstein, L. & Langholz, B Methods for the analysis of sampled cohort data in the Cox proportional hazards model. Ann. Statist. 23, Breslow, N. E. 98. Odds ratio estimators when the data are sparse. Biometrika 68, Breslow, N. E Statistics in epidemiology: The case-control study. Journal of the American Statistical Association 9, Breslow, N. E. & Day, N. E. 98. The design and analysis of case-control studies. IARC Scientific Publications, Lyon. Goldstein, L. & Langholz, B. 26. Cohort sampling schemes for the Mantel Haenszel estimator: extensions to multilevel covariates, stratified models, and robust variance estimators. abs/math.st/69333 accessed September 2, 26.. Harkness, W Properties of the extended hypergeometric distribution. Ann. Math. Statist. 36, Hauck, W. W. & Donner, A The asymptotic relative efficiency of the Mantel Haenszel estimator in the increasing-number-of-strata case. Biometrics 44, Hjort, N. L. & Pollard, D Asymptotics of minimizers of convex processes. Statistical Research Report 5/93, Institute of Mathematics, University of Oslo, Oslo. Langholz, B. & Borgan, Ø Counter-matching: a stratified nested case-control-sampling method. Biometrika 82, Langholz, B. & Clayton, D Sampling strategies in nested case-control studies. Environ. Health Perspect. 2Suppl. 8, Langholz, B. & Goldstein, L Risk set sampling in epidemiologic cohort studies. Statistical Science, Langholz, B. & Goldstein, L. 2. Conditional logistic analysis of case-control studies with complex sampling. Biostatistics 2, Lehmann, E. & Casella, G Theory of point estimation, 2nd edn. Springer, New York. Liang, K.-Y Odds ratio inference with dependent data. Biometrika 72, Mantel, N. & Haenszel, W Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 22, Rao, R. R The law of large numbers for D[, ]-valued random variables. Theory Probab. Appl. 8, Robins, J. M., Gail, M. H. & Lubin, J. H More on Biased selection of controls for case-control analysis of cohort studies. Biometrics 42, Zhang, Z.-Z. 2. On consistency of Mantel Haenszel type estimators in nested case-control studies. J. Jpn Statist. Soc. 3, Zhang, Z.-Z., Fujii, Y. & Yanagawa, T. 2. On Mantel Haenszel type estimators in simple nested case-control studies. Commun. Statist. A Theory Methods 29, Received October 25, in final form September 26 Larry Goldstein, USC Department of Mathematics, KAP-8, Los Angeles, CA , USA. larry@math.usc.edu Board of the Foundation of the Scandinavian Journal of Statistics 27.

The Ef ciency of Simple and Countermatched Nested Case-control Sampling

The Ef ciency of Simple and Countermatched Nested Case-control Sampling Published by Blackwell Publishers Ltd, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA Vol 26: 493±509, 1999 The Ef ciency of Simple and Countermatched Nested Case-control

More information

USING MARTINGALE RESIDUALS TO ASSESS GOODNESS-OF-FIT FOR SAMPLED RISK SET DATA

USING MARTINGALE RESIDUALS TO ASSESS GOODNESS-OF-FIT FOR SAMPLED RISK SET DATA USING MARTINGALE RESIDUALS TO ASSESS GOODNESS-OF-FIT FOR SAMPLED RISK SET DATA Ørnulf Borgan Bryan Langholz Abstract Standard use of Cox s regression model and other relative risk regression models for

More information

Missing covariate data in matched case-control studies: Do the usual paradigms apply?

Missing covariate data in matched case-control studies: Do the usual paradigms apply? Missing covariate data in matched case-control studies: Do the usual paradigms apply? Bryan Langholz USC Department of Preventive Medicine Joint work with Mulugeta Gebregziabher Larry Goldstein Mark Huberman

More information

STAT Sample Problem: General Asymptotic Results

STAT Sample Problem: General Asymptotic Results STAT331 1-Sample Problem: General Asymptotic Results In this unit we will consider the 1-sample problem and prove the consistency and asymptotic normality of the Nelson-Aalen estimator of the cumulative

More information

On the Breslow estimator

On the Breslow estimator Lifetime Data Anal (27) 13:471 48 DOI 1.17/s1985-7-948-y On the Breslow estimator D. Y. Lin Received: 5 April 27 / Accepted: 16 July 27 / Published online: 2 September 27 Springer Science+Business Media,

More information

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model Other Survival Models (1) Non-PH models We briefly discussed the non-proportional hazards (non-ph) model λ(t Z) = λ 0 (t) exp{β(t) Z}, where β(t) can be estimated by: piecewise constants (recall how);

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate

More information

Efficiency of Profile/Partial Likelihood in the Cox Model

Efficiency of Profile/Partial Likelihood in the Cox Model Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

More information

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data Person-Time Data CF Jeff Lin, MD., PhD. Incidence 1. Cumulative incidence (incidence proportion) 2. Incidence density (incidence rate) December 14, 2005 c Jeff Lin, MD., PhD. c Jeff Lin, MD., PhD. Person-Time

More information

Linear rank statistics

Linear rank statistics Linear rank statistics Comparison of two groups. Consider the failure time T ij of j-th subject in the i-th group for i = 1 or ; the first group is often called control, and the second treatment. Let n

More information

GROUPED SURVIVAL DATA. Florida State University and Medical College of Wisconsin

GROUPED SURVIVAL DATA. Florida State University and Medical College of Wisconsin FITTING COX'S PROPORTIONAL HAZARDS MODEL USING GROUPED SURVIVAL DATA Ian W. McKeague and Mei-Jie Zhang Florida State University and Medical College of Wisconsin Cox's proportional hazard model is often

More information

DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS

DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS Ivy Liu and Dong Q. Wang School of Mathematics, Statistics and Computer Science Victoria University of Wellington New Zealand Corresponding

More information

Stratified Nested Case-Control Sampling in the Cox Regression Model

Stratified Nested Case-Control Sampling in the Cox Regression Model Stratified Nested Case-Control Sampling in the Cox Regression Model Bryan Langholz Department of Preventive Medicine, University of Southern California, School of Medicine, 2025 Zonal Ave, Los Angeles,

More information

Attributable Risk Function in the Proportional Hazards Model

Attributable Risk Function in the Proportional Hazards Model UW Biostatistics Working Paper Series 5-31-2005 Attributable Risk Function in the Proportional Hazards Model Ying Qing Chen Fred Hutchinson Cancer Research Center, yqchen@u.washington.edu Chengcheng Hu

More information

Tests of independence for censored bivariate failure time data

Tests of independence for censored bivariate failure time data Tests of independence for censored bivariate failure time data Abstract Bivariate failure time data is widely used in survival analysis, for example, in twins study. This article presents a class of χ

More information

STAT 331. Martingale Central Limit Theorem and Related Results

STAT 331. Martingale Central Limit Theorem and Related Results STAT 331 Martingale Central Limit Theorem and Related Results In this unit we discuss a version of the martingale central limit theorem, which states that under certain conditions, a sum of orthogonal

More information

Survival Analysis I (CHL5209H)

Survival Analysis I (CHL5209H) Survival Analysis Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca January 7, 2015 31-1 Literature Clayton D & Hills M (1993): Statistical Models in Epidemiology. Not really

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

UNIVERSITY OF CALIFORNIA, SAN DIEGO

UNIVERSITY OF CALIFORNIA, SAN DIEGO UNIVERSITY OF CALIFORNIA, SAN DIEGO Estimation of the primary hazard ratio in the presence of a secondary covariate with non-proportional hazards An undergraduate honors thesis submitted to the Department

More information

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC Mantel-Haenszel Test Statistics for Correlated Binary Data by Jie Zhang and Dennis D. Boos Department of Statistics, North Carolina State University Raleigh, NC 27695-8203 tel: (919) 515-1918 fax: (919)

More information

Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis

Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Overview of today s class Kaplan-Meier Curve

More information

Additive and multiplicative models for the joint effect of two risk factors

Additive and multiplicative models for the joint effect of two risk factors Biostatistics (2005), 6, 1,pp. 1 9 doi: 10.1093/biostatistics/kxh024 Additive and multiplicative models for the joint effect of two risk factors A. BERRINGTON DE GONZÁLEZ Cancer Research UK Epidemiology

More information

Ignoring the matching variables in cohort studies - when is it valid, and why?

Ignoring the matching variables in cohort studies - when is it valid, and why? Ignoring the matching variables in cohort studies - when is it valid, and why? Arvid Sjölander Abstract In observational studies of the effect of an exposure on an outcome, the exposure-outcome association

More information

Estimating the Marginal Odds Ratio in Observational Studies

Estimating the Marginal Odds Ratio in Observational Studies Estimating the Marginal Odds Ratio in Observational Studies Travis Loux Christiana Drake Department of Statistics University of California, Davis June 20, 2011 Outline The Counterfactual Model Odds Ratios

More information

Asymptotic equivalence of paired Hotelling test and conditional logistic regression

Asymptotic equivalence of paired Hotelling test and conditional logistic regression Asymptotic equivalence of paired Hotelling test and conditional logistic regression Félix Balazard 1,2 arxiv:1610.06774v1 [math.st] 21 Oct 2016 Abstract 1 Sorbonne Universités, UPMC Univ Paris 06, CNRS

More information

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

More information

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University

More information

Comparing Distribution Functions via Empirical Likelihood

Comparing Distribution Functions via Empirical Likelihood Georgia State University ScholarWorks @ Georgia State University Mathematics and Statistics Faculty Publications Department of Mathematics and Statistics 25 Comparing Distribution Functions via Empirical

More information

Product-limit estimators of the survival function with left or right censored data

Product-limit estimators of the survival function with left or right censored data Product-limit estimators of the survival function with left or right censored data 1 CREST-ENSAI Campus de Ker-Lann Rue Blaise Pascal - BP 37203 35172 Bruz cedex, France (e-mail: patilea@ensai.fr) 2 Institut

More information

Maximum likelihood estimation for Cox s regression model under nested case-control sampling

Maximum likelihood estimation for Cox s regression model under nested case-control sampling Biostatistics (2004), 5, 2,pp. 193 206 Printed in Great Britain Maximum likelihood estimation for Cox s regression model under nested case-control sampling THOMAS H. SCHEIKE Department of Biostatistics,

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Cox s proportional hazards model and Cox s partial likelihood

Cox s proportional hazards model and Cox s partial likelihood Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.

More information

Survival Analysis I (CHL5209H)

Survival Analysis I (CHL5209H) Survival Analysis Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca February 3, 2015 21-1 Time matching/risk set sampling/incidence density sampling/nested design 21-2 21-3

More information

Multi-state Models: An Overview

Multi-state Models: An Overview Multi-state Models: An Overview Andrew Titman Lancaster University 14 April 2016 Overview Introduction to multi-state modelling Examples of applications Continuously observed processes Intermittently observed

More information

Multivariate Survival Data With Censoring.

Multivariate Survival Data With Censoring. 1 Multivariate Survival Data With Censoring. Shulamith Gross and Catherine Huber-Carol Baruch College of the City University of New York, Dept of Statistics and CIS, Box 11-220, 1 Baruch way, 10010 NY.

More information

Cox Regression in Nested Case Control Studies with Auxiliary Covariates

Cox Regression in Nested Case Control Studies with Auxiliary Covariates Biometrics DOI: 1.1111/j.1541-42.29.1277.x Cox Regression in Nested Case Control Studies with Auxiliary Covariates Mengling Liu, 1, Wenbin Lu, 2 and Chi-hong Tseng 3 1 Division of Biostatistics, School

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data Outline Frailty modelling of Multivariate Survival Data Thomas Scheike ts@biostat.ku.dk Department of Biostatistics University of Copenhagen Marginal versus Frailty models. Two-stage frailty models: copula

More information

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial

More information

Causal exposure effect on a time-to-event response using an IV.

Causal exposure effect on a time-to-event response using an IV. Faculty of Health Sciences Causal exposure effect on a time-to-event response using an IV. Torben Martinussen 1 Stijn Vansteelandt 2 Eric Tchetgen 3 1 Department of Biostatistics University of Copenhagen

More information

On a connection between the Bradley-Terry model and the Cox proportional hazards model

On a connection between the Bradley-Terry model and the Cox proportional hazards model On a connection between the Bradley-Terry model and the Cox proportional hazards model Yuhua Su and Mai Zhou Department of Statistics University of Kentucky Lexington, KY 40506-0027, U.S.A. SUMMARY This

More information

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data 1 Part III. Hypothesis Testing III.1. Log-rank Test for Right-censored Failure Time Data Consider a survival study consisting of n independent subjects from p different populations with survival functions

More information

Efficiency Comparison Between Mean and Log-rank Tests for. Recurrent Event Time Data

Efficiency Comparison Between Mean and Log-rank Tests for. Recurrent Event Time Data Efficiency Comparison Between Mean and Log-rank Tests for Recurrent Event Time Data Wenbin Lu Department of Statistics, North Carolina State University, Raleigh, NC 27695 Email: lu@stat.ncsu.edu Summary.

More information

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Previous lecture P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Interaction Outline: Definition of interaction Additive versus multiplicative

More information

Lecture 3. Truncation, length-bias and prevalence sampling

Lecture 3. Truncation, length-bias and prevalence sampling Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in last two decades. Truncation in

More information

Estimating Load-Sharing Properties in a Dynamic Reliability System. Paul Kvam, Georgia Tech Edsel A. Peña, University of South Carolina

Estimating Load-Sharing Properties in a Dynamic Reliability System. Paul Kvam, Georgia Tech Edsel A. Peña, University of South Carolina Estimating Load-Sharing Properties in a Dynamic Reliability System Paul Kvam, Georgia Tech Edsel A. Peña, University of South Carolina Modeling Dependence Between Components Most reliability methods are

More information

Testing Non-Linear Ordinal Responses in L2 K Tables

Testing Non-Linear Ordinal Responses in L2 K Tables RUHUNA JOURNA OF SCIENCE Vol. 2, September 2007, pp. 18 29 http://www.ruh.ac.lk/rjs/ ISSN 1800-279X 2007 Faculty of Science University of Ruhuna. Testing Non-inear Ordinal Responses in 2 Tables eslie Jayasekara

More information

Part III Measures of Classification Accuracy for the Prediction of Survival Times

Part III Measures of Classification Accuracy for the Prediction of Survival Times Part III Measures of Classification Accuracy for the Prediction of Survival Times Patrick J Heagerty PhD Department of Biostatistics University of Washington 102 ISCB 2010 Session Three Outline Examples

More information

MAS3301 / MAS8311 Biostatistics Part II: Survival

MAS3301 / MAS8311 Biostatistics Part II: Survival MAS3301 / MAS8311 Biostatistics Part II: Survival M. Farrow School of Mathematics and Statistics Newcastle University Semester 2, 2009-10 1 13 The Cox proportional hazards model 13.1 Introduction In the

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Repeated ordinal measurements: a generalised estimating equation approach

Repeated ordinal measurements: a generalised estimating equation approach Repeated ordinal measurements: a generalised estimating equation approach David Clayton MRC Biostatistics Unit 5, Shaftesbury Road Cambridge CB2 2BW April 7, 1992 Abstract Cumulative logit and related

More information

Frailty Models and Copulas: Similarities and Differences

Frailty Models and Copulas: Similarities and Differences Frailty Models and Copulas: Similarities and Differences KLARA GOETHALS, PAUL JANSSEN & LUC DUCHATEAU Department of Physiology and Biometrics, Ghent University, Belgium; Center for Statistics, Hasselt

More information

Part IV Statistics in Epidemiology

Part IV Statistics in Epidemiology Part IV Statistics in Epidemiology There are many good statistical textbooks on the market, and we refer readers to some of these textbooks when they need statistical techniques to analyze data or to interpret

More information

Philosophy and Features of the mstate package

Philosophy and Features of the mstate package Introduction Mathematical theory Practice Discussion Philosophy and Features of the mstate package Liesbeth de Wreede, Hein Putter Department of Medical Statistics and Bioinformatics Leiden University

More information

Equivalence of random-effects and conditional likelihoods for matched case-control studies

Equivalence of random-effects and conditional likelihoods for matched case-control studies Equivalence of random-effects and conditional likelihoods for matched case-control studies Ken Rice MRC Biostatistics Unit, Cambridge, UK January 8 th 4 Motivation Study of genetic c-erbb- exposure and

More information

rnulf Borgan University of Oslo, N-0316 Oslo, Norway Bryan Langholz Department of Preventive Medicine, Abstract

rnulf Borgan University of Oslo, N-0316 Oslo, Norway Bryan Langholz Department of Preventive Medicine, Abstract Estimation of excess risk from case-control data using Aalen's linear regression model rnulf Borgan Institute of Mathematics, P.O. Box 153 Blindern, University of Oslo, N-316 Oslo, Norway Bryan Langholz

More information

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs STAT 5500/6500 Conditional Logistic Regression for Matched Pairs Motivating Example: The data we will be using comes from a subset of data taken from the Los Angeles Study of the Endometrial Cancer Data

More information

STAT331. Combining Martingales, Stochastic Integrals, and Applications to Logrank Test & Cox s Model

STAT331. Combining Martingales, Stochastic Integrals, and Applications to Logrank Test & Cox s Model STAT331 Combining Martingales, Stochastic Integrals, and Applications to Logrank Test & Cox s Model Because of Theorem 2.5.1 in Fleming and Harrington, see Unit 11: For counting process martingales with

More information

Logistic regression model for survival time analysis using time-varying coefficients

Logistic regression model for survival time analysis using time-varying coefficients Logistic regression model for survival time analysis using time-varying coefficients Accepted in American Journal of Mathematical and Management Sciences, 2016 Kenichi SATOH ksatoh@hiroshima-u.ac.jp Research

More information

PhD course in Advanced survival analysis. One-sample tests. Properties. Idea: (ABGK, sect. V.1.1) Counting process N(t)

PhD course in Advanced survival analysis. One-sample tests. Properties. Idea: (ABGK, sect. V.1.1) Counting process N(t) PhD course in Advanced survival analysis. (ABGK, sect. V.1.1) One-sample tests. Counting process N(t) Non-parametric hypothesis tests. Parametric models. Intensity process λ(t) = α(t)y (t) satisfying Aalen

More information

Survival Regression Models

Survival Regression Models Survival Regression Models David M. Rocke May 18, 2017 David M. Rocke Survival Regression Models May 18, 2017 1 / 32 Background on the Proportional Hazards Model The exponential distribution has constant

More information

Lecture 5 Models and methods for recurrent event data

Lecture 5 Models and methods for recurrent event data Lecture 5 Models and methods for recurrent event data Recurrent and multiple events are commonly encountered in longitudinal studies. In this chapter we consider ordered recurrent and multiple events.

More information

Journal of Biostatistics and Epidemiology

Journal of Biostatistics and Epidemiology Journal of Biostatistics and Epidemiology Methodology Marginal versus conditional causal effects Kazem Mohammad 1, Seyed Saeed Hashemi-Nazari 2, Nasrin Mansournia 3, Mohammad Ali Mansournia 1* 1 Department

More information

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Lecture 25 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 24, 2015 1 2 3 4 5 6 7 8 9 10 11 1 Hypothesis s of homgeneity 2 Estimating risk

More information

A Regression Model For Recurrent Events With Distribution Free Correlation Structure

A Regression Model For Recurrent Events With Distribution Free Correlation Structure A Regression Model For Recurrent Events With Distribution Free Correlation Structure J. Pénichoux(1), A. Latouche(2), T. Moreau(1) (1) INSERM U780 (2) Université de Versailles, EA2506 ISCB - 2009 - Prague

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling Estimation and Inference of Quantile Regression for Survival Data under Biased Sampling Supplementary Materials: Proofs of the Main Results S1 Verification of the weight function v i (t) for the lengthbiased

More information

Analysis of recurrent event data under the case-crossover design. with applications to elderly falls

Analysis of recurrent event data under the case-crossover design. with applications to elderly falls STATISTICS IN MEDICINE Statist. Med. 2007; 00:1 22 [Version: 2002/09/18 v1.11] Analysis of recurrent event data under the case-crossover design with applications to elderly falls Xianghua Luo 1,, and Gary

More information

Additive hazards regression for case-cohort studies

Additive hazards regression for case-cohort studies Biometrika (2), 87, 1, pp. 73 87 2 Biometrika Trust Printed in Great Britain Additive hazards regression for case-cohort studies BY MICAL KULIC Department of Probability and Statistics, Charles University,

More information

Understanding Ding s Apparent Paradox

Understanding Ding s Apparent Paradox Submitted to Statistical Science Understanding Ding s Apparent Paradox Peter M. Aronow and Molly R. Offer-Westort Yale University 1. INTRODUCTION We are grateful for the opportunity to comment on A Paradox

More information

Describing Stratified Multiple Responses for Sparse Data

Describing Stratified Multiple Responses for Sparse Data Describing Stratified Multiple Responses for Sparse Data Ivy Liu School of Mathematical and Computing Sciences Victoria University Wellington, New Zealand June 28, 2004 SUMMARY Surveys often contain qualitative

More information

On consistency of Kendall s tau under censoring

On consistency of Kendall s tau under censoring Biometria (28), 95, 4,pp. 997 11 C 28 Biometria Trust Printed in Great Britain doi: 1.193/biomet/asn37 Advance Access publication 17 September 28 On consistency of Kendall s tau under censoring BY DAVID

More information

Regression models for multivariate ordered responses via the Plackett distribution

Regression models for multivariate ordered responses via the Plackett distribution Journal of Multivariate Analysis 99 (2008) 2472 2478 www.elsevier.com/locate/jmva Regression models for multivariate ordered responses via the Plackett distribution A. Forcina a,, V. Dardanoni b a Dipartimento

More information

Goodness-of-Fit Tests With Right-Censored Data by Edsel A. Pe~na Department of Statistics University of South Carolina Colloquium Talk August 31, 2 Research supported by an NIH Grant 1 1. Practical Problem

More information

Goodness-Of-Fit for Cox s Regression Model. Extensions of Cox s Regression Model. Survival Analysis Fall 2004, Copenhagen

Goodness-Of-Fit for Cox s Regression Model. Extensions of Cox s Regression Model. Survival Analysis Fall 2004, Copenhagen Outline Cox s proportional hazards model. Goodness-of-fit tools More flexible models R-package timereg Forthcoming book, Martinussen and Scheike. 2/38 University of Copenhagen http://www.biostat.ku.dk

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH

FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH Jian-Jian Ren 1 and Mai Zhou 2 University of Central Florida and University of Kentucky Abstract: For the regression parameter

More information

Typical Survival Data Arising From a Clinical Trial. Censoring. The Survivor Function. Mathematical Definitions Introduction

Typical Survival Data Arising From a Clinical Trial. Censoring. The Survivor Function. Mathematical Definitions Introduction Outline CHL 5225H Advanced Statistical Methods for Clinical Trials: Survival Analysis Prof. Kevin E. Thorpe Defining Survival Data Mathematical Definitions Non-parametric Estimates of Survival Comparing

More information

ST745: Survival Analysis: Nonparametric methods

ST745: Survival Analysis: Nonparametric methods ST745: Survival Analysis: Nonparametric methods Eric B. Laber Department of Statistics, North Carolina State University February 5, 2015 The KM estimator is used ubiquitously in medical studies to estimate

More information

Information and asymptotic efficiency of the case-cohort sampling design in Cox s regression model

Information and asymptotic efficiency of the case-cohort sampling design in Cox s regression model Information and asymptotic efficiency of the case-cohort sampling design in Cox s regression model By: Haimeng Zhang, Larry Goldstein Zhang, H. and Goldstein, L. (2003). Information and asymptotic efficiency

More information

Regularization in Cox Frailty Models

Regularization in Cox Frailty Models Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University

More information

1. Stochastic Processes and filtrations

1. Stochastic Processes and filtrations 1. Stochastic Processes and 1. Stoch. pr., A stochastic process (X t ) t T is a collection of random variables on (Ω, F) with values in a measurable space (S, S), i.e., for all t, In our case X t : Ω S

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection

Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection Biometrical Journal 42 (2000) 1, 59±69 Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection Kung-Jong Lui

More information

Semiparametric Regression

Semiparametric Regression Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

9 Estimating the Underlying Survival Distribution for a

9 Estimating the Underlying Survival Distribution for a 9 Estimating the Underlying Survival Distribution for a Proportional Hazards Model So far the focus has been on the regression parameters in the proportional hazards model. These parameters describe the

More information

Likelihood ratio confidence bands in nonparametric regression with censored data

Likelihood ratio confidence bands in nonparametric regression with censored data Likelihood ratio confidence bands in nonparametric regression with censored data Gang Li University of California at Los Angeles Department of Biostatistics Ingrid Van Keilegom Eindhoven University of

More information

Tests for Two Correlated Proportions in a Matched Case- Control Design

Tests for Two Correlated Proportions in a Matched Case- Control Design Chapter 155 Tests for Two Correlated Proportions in a Matched Case- Control Design Introduction A 2-by-M case-control study investigates a risk factor relevant to the development of a disease. A population

More information

On a connection between the Bradley Terry model and the Cox proportional hazards model

On a connection between the Bradley Terry model and the Cox proportional hazards model Statistics & Probability Letters 76 (2006) 698 702 www.elsevier.com/locate/stapro On a connection between the Bradley Terry model and the Cox proportional hazards model Yuhua Su, Mai Zhou Department of

More information

Factor Analytic Models of Clustered Multivariate Data with Informative Censoring (refer to Dunson and Perreault, 2001, Biometrics 57, )

Factor Analytic Models of Clustered Multivariate Data with Informative Censoring (refer to Dunson and Perreault, 2001, Biometrics 57, ) Factor Analytic Models of Clustered Multivariate Data with Informative Censoring (refer to Dunson and Perreault, 2001, Biometrics 57, 302-308) Consider data in which multiple outcomes are collected for

More information

Semiparametric maximum likelihood estimation in normal transformation models for bivariate survival data

Semiparametric maximum likelihood estimation in normal transformation models for bivariate survival data Biometrika (28), 95, 4,pp. 947 96 C 28 Biometrika Trust Printed in Great Britain doi: 1.193/biomet/asn49 Semiparametric maximum likelihood estimation in normal transformation models for bivariate survival

More information

Universal Optimality of Balanced Uniform Crossover Designs

Universal Optimality of Balanced Uniform Crossover Designs Universal Optimality of Balanced Uniform Crossover Designs A.S. Hedayat and Min Yang Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago Abstract Kunert (1984)

More information

Estimating direct effects in cohort and case-control studies

Estimating direct effects in cohort and case-control studies Estimating direct effects in cohort and case-control studies, Ghent University Direct effects Introduction Motivation The problem of standard approaches Controlled direct effect models In many research

More information

Package Rsurrogate. October 20, 2016

Package Rsurrogate. October 20, 2016 Type Package Package Rsurrogate October 20, 2016 Title Robust Estimation of the Proportion of Treatment Effect Explained by Surrogate Marker Information Version 2.0 Date 2016-10-19 Author Layla Parast

More information

Standardization methods have been used in epidemiology. Marginal Structural Models as a Tool for Standardization ORIGINAL ARTICLE

Standardization methods have been used in epidemiology. Marginal Structural Models as a Tool for Standardization ORIGINAL ARTICLE ORIGINAL ARTICLE Marginal Structural Models as a Tool for Standardization Tosiya Sato and Yutaka Matsuyama Abstract: In this article, we show the general relation between standardization methods and marginal

More information

( t) Cox regression part 2. Outline: Recapitulation. Estimation of cumulative hazards and survival probabilites. Ørnulf Borgan

( t) Cox regression part 2. Outline: Recapitulation. Estimation of cumulative hazards and survival probabilites. Ørnulf Borgan Outline: Cox regression part 2 Ørnulf Borgan Department of Mathematics University of Oslo Recapitulation Estimation of cumulative hazards and survival probabilites Assumptions for Cox regression and check

More information