Composite likelihood and two-stage estimation in family studies


Biostatistics (2004), 5, 1. Printed in Great Britain

ELISABETH WREFORD ANDERSEN
The Danish Epidemiology Science Centre, Statens Serum Institut, Artillerivej 5, 2300 Copenhagen S, Denmark and Department of Biostatistics, University of Copenhagen, Blegdamsvej 3, 2200 Copenhagen N, Denmark

SUMMARY

In this paper register-based family studies provide the motivation for linking a two-stage estimation procedure in copula models for multivariate failure time data with a composite likelihood approach. The asymptotic properties of the estimators in both parametric and semi-parametric models are derived, combining the approaches of Parner (2001) and Andersen (2003). The method is mainly studied when the families consist of groups of exchangeable members (e.g. siblings) or members at different levels (e.g. parents and children). The advantages of the proposed method are especially clear in this last case, where very flexible modelling is possible. The suggested method is also studied in simulations and found to be efficient compared to maximum likelihood. Finally, the suggested method is applied to a family study of deep venous thromboembolism where it is seen that the association between ages at onset is larger for siblings than for parents or for parents and siblings.

Keywords: All possible pairs; Composite likelihood; Copula; Family studies; Optimal weights; Two-stage estimation.

1. INTRODUCTION

In register-based family studies failure times on related individuals are observed, and familial aggregation of a disease can be regarded as correlation of failure times within families. A family may be a group of exchangeable individuals such as siblings, but this is not always the case. The family may, for example, consist of parents and siblings. Both of these types of family studies have been the practical motivation for this work.
There have been two main approaches to modelling correlated data, namely random effects models and marginal models, and this has also been the case for survival data (Lee et al., 1992; Wei et al., 1989; Oakes, 1989; Nielsen et al., 1992). This paper will, however, concentrate on copula models (Genest and MacKay, 1986), which offer a very flexible framework for combining the marginal approach with a model for the dependence within units. The joint survival function is modelled through the marginal survival functions and an association parameter. In family studies the main interest lies in the association between family members, but it is also important to be able to take possible confounders into account. Using the copula approach, the association is estimated while covariates are included in the marginal models. A two-stage estimation procedure suggests itself for these models: the parameters in the marginal models are estimated first and are then regarded as fixed when the association parameter is estimated. This estimation procedure was suggested by Hougaard (1986) and has later been studied by Shih and Louis (1995) and Genest et al. (1995). Glidden (2000) concentrated on the case where the marginal model is a Cox model and the model for the association is based on the gamma model, whereas in Andersen (2003) a general choice of marginal model combined with a copula was studied.

An extension of the copula models to hierarchical data was suggested by Bandeen-Roche and Liang (1996), but as noted by the authors this approach is not so well suited to families of parents and children because some choices of copula lead to unwanted constraints on the parameters. In this paper the copula approach is extended to include families consisting of members at different levels, e.g. parents and children, by combining the two-stage estimation procedure with the composite likelihood approach (Parner, 2001; Heagerty and Lele, 1998). Although the methods are motivated by family studies, they can also be used in other cases of correlated failure time data.

The paper is organized as follows. Copula models are briefly described in Section 2. In Section 3 the composite likelihood approach for groups of siblings and families of parents and children is presented. The statistical properties of the estimators reached by two-stage estimation combined with composite likelihood are derived in Section 4. In Section 5 different choices of weights are discussed. Section 6 concerns the ascertainment of data. In Section 7 the properties of the suggested estimators are studied in simulations. An application to a family study of deep venous thromboembolism is described in Section 8. Section 9 contains a discussion.

Biostatistics 5(1) © Oxford University Press (2004); all rights reserved.

2. COPULA MODELS

Let $(T_1,\ldots,T_K)$ be the failure times from a family of exchangeable members and $S_1,\ldots,S_K$ the marginal survival functions, possibly depending on covariates.
The joint distribution of $(T_1,\ldots,T_K)$ is fully specified by the joint survival function $S(t_1,\ldots,t_K)$. When $S(t_1,\ldots,t_K)$ can be written in the form

$$S(t_1,\ldots,t_K) = C_\theta\{S_1(t_1),\ldots,S_K(t_K)\}, \quad t_1,\ldots,t_K \ge 0, \tag{2.1}$$

where $C_\theta$ is a $K$-dimensional survival function $C_\theta : [0,1]^K \to [0,1]$ with uniform margins and $\theta$ is a parameter or possibly a vector of parameters, then $(T_1,\ldots,T_K)$ is said to come from the $C_\theta$ copula. Different choices of $C_\theta$ give different joint distributions but the marginal models are unaltered.

A special group of copulas is the Archimedean copula model family, where the copulas are of the form

$$C_\theta(u_1,\ldots,u_K) = \phi_\theta\{\phi_\theta^{-1}(u_1) + \cdots + \phi_\theta^{-1}(u_K)\}$$

with $0 \le u_i \le 1$, $i = 1,\ldots,K$, $0 \le \phi_\theta$, $\phi_\theta(0) = 1$, $\phi_\theta' < 0$, $\phi_\theta'' > 0$.

In this paper the main example of an Archimedean copula is Clayton's family. The survival times are $(T_1,\ldots,T_K)$ with marginal survival functions $\{S_1(t_1),\ldots,S_K(t_K)\} = (u_1,\ldots,u_K)$ and joint survival function $C(u_1,\ldots,u_K)$. Clayton's family is then given as

$$C(u_1,\ldots,u_K) = \{u_1^{1-\theta} + \cdots + u_K^{1-\theta} - (K-1)\}^{1/(1-\theta)}, \quad \theta > 1.$$

Here $\phi(u) = (1+u)^{1/(1-\theta)}$ is the Laplace transform of a gamma distribution with mean 1 and variance $\theta - 1$. The failure times $T_i$ and $T_h$ are positively associated when $\theta > 1$ and independent for $\theta \to 1$.

3. THE COMPOSITE LIKELIHOOD APPROACH

In the following it will be shown that the composite likelihood approach offers a very flexible way of analysing clustered failure time data. With the work done by Andersen (2003) it is possible to study groups of exchangeable members, e.g. siblings, where the groups can have any size. However, in this paper the family members do not have to be exchangeable. This situation occurs when the families consist not only of siblings, but also of family members on another level, e.g. parents and children or half-siblings who live in the same home. When the families are groups of siblings a composite likelihood approach is also a possibility: instead of one contribution from each group, each possible pair of siblings gives rise to a contribution. This means that software meant for analysing pairs can be used for groups of any size. This is, however, just a special case of the situation where the members are no longer exchangeable.

3.1 Groups of siblings

The joint distribution for a family is given by the joint survival function, which again is modelled through the marginal survival functions and a copula tying the marginal distributions together. In the two-stage estimation the parameters in the margins, $\beta$, are estimated in the first stage, taking the clustering into account when estimating the variance of the parameters (Section 4). In the second stage the association parameter $\theta$ is estimated using the score equation from the joint likelihood, but with the estimates from stage one regarded as known. If the logarithm of the likelihood is $\log L(\beta,\theta) = \sum_{j=1}^{n} l_j(\beta,\theta)$ then the score equation for $\theta$ is

$$U_\theta(\hat\beta,\theta) = \sum_{j=1}^{n} \frac{\partial}{\partial\theta}\, l_j(\hat\beta,\theta).$$

Instead of using the joint likelihood in the second stage we suggest using score equations based on the bivariate distributions of all possible pairs of siblings. When the joint survival function is given by a copula as in (2.1) then the bivariate survival functions are given by the same copula. For example, the bivariate survival function for siblings 1 and 2 is $S_{12}(t_1,t_2) = C_\theta\{S_1(t_1), S_2(t_2)\}$.
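The statement that pairs inherit the family's copula can be checked numerically. The following is a minimal sketch, assuming Clayton's family in the parametrisation of Section 2; the function name `clayton_survival` and the numeric inputs are illustrative and not part of the paper.

```python
def clayton_survival(u, theta):
    """Clayton copula C(u_1, ..., u_K) in the parametrisation of Section 2.

    u     : list of marginal survival probabilities, each in (0, 1]
    theta : association parameter, must exceed 1
    """
    if theta <= 1:
        raise ValueError("theta must exceed 1")
    k = len(u)
    total = sum(x ** (1.0 - theta) for x in u) - (k - 1)
    return total ** (1.0 / (1.0 - theta))


# Setting the third margin to 1 (i.e. t_3 = 0) reduces the trivariate
# survival function to the bivariate one for members 1 and 2.
joint_three = clayton_survival([0.8, 0.6, 1.0], theta=2.5)
joint_pair = clayton_survival([0.8, 0.6], theta=2.5)

# Close to theta = 1 the copula approaches independence, u_1 * u_2.
near_independent = clayton_survival([0.8, 0.6], theta=1.000001)
```

Here `joint_three` and `joint_pair` agree, which is the property the pairwise approach relies on, and `near_independent` is close to 0.8 × 0.6.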
The composite likelihood proposed in the second stage of the estimation is based on these bivariate distributions. Let $G_j$ be the set of possible pairs for family $j$ and $L_{ih}(\beta,\theta)$ the likelihood for pair $(i,h)$; then the composite likelihood is

$$\log L(\theta,\beta) = \sum_{j=1}^{n}\sum_{(i,h)\in G_j} w_{ih}\log L_{ih}(\theta,\beta) = \sum_{j=1}^{n}\sum_{(i,h)\in G_j} w_{ih}\, l_{ih}(\theta,\beta), \tag{3.2}$$

where $w_{ih}$ are positive weights. Weights are introduced because the composite likelihood, by comparison with the full likelihood, puts more emphasis on the large families, and the weights compensate for this. Parner (2001) showed that the estimates found by maximizing the composite likelihood $L$ are asymptotically Normal under suitable assumptions. In this paper the composite likelihood (3.2) is used to find pseudo score equations for $\theta$ in the second stage of the estimation.

Because there is just one association parameter, the model can be simplified so that each pair in the family enters with the same weight in the composite likelihood (3.2) and the weights depend only on the family size. The choice of weights will be discussed in Section 5. The simplified version of the composite likelihood (3.2) is

$$\log L(\theta,\beta) = \sum_{j=1}^{n} w_j \sum_{(i,h)\in G_j} l_{ih}(\theta,\beta). \tag{3.3}$$

This can be fitted by software for bivariate data by listing separate bivariate observations for the $k(k-1)/2$ pairs coming from a sibling group of size $k$.
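The listing of pairs described above can be sketched as follows. This is only an illustration of the data layout: the function `sibling_pairs`, the identifiers and the weight values are hypothetical, and the weights themselves would come from Section 5.

```python
from itertools import combinations


def sibling_pairs(family_id, members, weight_by_size):
    """Expand one sibling group into one row per pair, as used in (3.3).

    members        : list of member identifiers in the family
    weight_by_size : dict mapping family size k to the common weight w_j
                     (sizes not listed default to weight 1.0)
    """
    k = len(members)
    w = weight_by_size.get(k, 1.0)
    return [(family_id, a, b, w) for a, b in combinations(members, 2)]


# Hypothetical weights, for illustration only.
rows = sibling_pairs("fam1", ["s1", "s2", "s3", "s4"], {2: 1.0, 3: 0.5, 4: 0.3})
```

A group of four siblings yields 4·3/2 = 6 rows, each carrying the weight for family size 4.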

3.2 Families of parents and children

In the case where the families consist of members at different levels, e.g. parents and children, one possibility is to consider the hierarchical model suggested by Bandeen-Roche and Liang (1996). Assume, for instance, that the families consist of parents $(1, 2)$ and children $(3,\ldots,K_j)$ with survival times given by $T = \{(T_1,T_2),(T_3,\ldots,T_{K_j})\}$, where the separate survival functions for parents and children are Archimedean copulas

$$S(t_1,t_2) = \phi_{\theta_1}[\phi_{\theta_1}^{-1}\{S_1(t_1)\} + \phi_{\theta_1}^{-1}\{S_2(t_2)\}]$$
$$S(t_3,\ldots,t_{K_j}) = \phi_{\theta_2}[\phi_{\theta_2}^{-1}\{S_3(t_3)\} + \cdots + \phi_{\theta_2}^{-1}\{S_{K_j}(t_{K_j})\}].$$

To find the simultaneous distribution for the family, Bandeen-Roche and Liang (1996) suggested combining these two survival functions in a joint survival function using an Archimedean copula

$$S(t_1,\ldots,t_{K_j}) = \phi_{\theta_3}[\phi_{\theta_3}^{-1}\{S(t_1,t_2)\} + \phi_{\theta_3}^{-1}\{S(t_3,\ldots,t_{K_j})\}]. \tag{3.4}$$

Bandeen-Roche and Liang (1996) gave conditions to ensure that (3.4) is a valid survival function. These conditions can in some cases lead to unwanted constraints on the association parameters. For instance, if Clayton's family is chosen for each of the three Archimedean copulas, so $\phi_{\theta_1}(s) = (1+s)^{1/(1-\theta_1)}$, $\phi_{\theta_2}(s) = (1+s)^{1/(1-\theta_2)}$ and $\phi_{\theta_3}(s) = (1+s)^{1/(1-\theta_3)}$, then (3.4) is a survival function with the following constraints on the three parameters:

$$1 < \theta_3 < \theta_1 \quad\text{and}\quad 1 < \theta_3 < \theta_2. \tag{3.5}$$

The parameters in Clayton's family can be interpreted in the following manner. One can define an association measure $\gamma$ as

$$\gamma(t_i,t_h) = \frac{\lambda_{T_i\mid T_h}(t_i \mid T_h = t_h)}{\lambda_{T_i\mid T_h}(t_i \mid T_h > t_h)},$$

where $\lambda$ is the intensity of disease. Then $\gamma$ measures the change in risk for person $i$ if person $h$ gets the disease at time $t_h$ compared to when person $h$ is disease-free at time $t_h$. It is only in Clayton's family that $\gamma$ is independent of time (i.e. constant), and moreover $\gamma = \theta$.
In the model with parents and children the three parameters can be thought of as $\theta_1 = \gamma(t_1,t_2)$, which is the association between parents, $\theta_2 = \gamma(t_i,t_h)$, $i,h \ge 3$, the association between two children, and $\theta_3 = \gamma(t_i,t_h)$, $i = 1,2$, $h \ge 3$, the association between parents and children. The constraints (3.5) on the parameters imply that the association between parents is stronger than that between parents and children, which may not be plausible because a parent and a child are genetically more similar than the two parents are to each other.

Instead of postulating a joint model such as (3.4), the suggestion here is to model the bivariate margins and combine the likelihood contributions in a composite likelihood. If $H_j$ is the set of possible pairs combining parents and children in family $j$, $G_j$ the set of possible pairs of children and $l_{ih}$ the logarithm of the likelihood for pair $(i,h)$, then the logarithm of the composite likelihood can be written as

$$\log L(\beta,\theta_1,\theta_2,\theta_3) = \sum_{j=1}^{n}\Big\{ w^{j}_{12}\, l_{12}(\beta,\theta_1) + \sum_{(i,h)\in G_j} w^{j}_{ih}\, l_{ih}(\beta,\theta_2) + \sum_{(l,m)\in H_j} w^{j}_{lm}\, l_{lm}(\beta,\theta_3)\Big\}. \tag{3.6}$$

The composite likelihood in (3.6) is simplified in the same way as (3.2), so pairs of parents and children from the same family get the same weight, and likewise pairs of children from the same family get the same weight. This means that (3.6) can be written as

$$\log L(\beta,\theta_1,\theta_2,\theta_3) = \sum_{j=1}^{n}\Big\{ w_{1j}\, l_{12}(\beta,\theta_1) + w_{2j}\sum_{(i,h)\in G_j} l_{ih}(\beta,\theta_2) + w_{3j}\sum_{(l,m)\in H_j} l_{lm}(\beta,\theta_3)\Big\}. \tag{3.7}$$

This is a composite likelihood of exactly the same type as (3.3).

4. TWO-STAGE ESTIMATION

As in Andersen (2003), two-stage estimation will be used to find the estimates of the parameters $(\beta,\theta)$. The two-stage estimation suits the way the models are constructed using copulas to tie the marginal distributions together with an association parameter. In the first stage the parameters in the marginal model, $\beta$, are estimated, taking the clustering into account when the variance of the estimate is calculated. In the second stage the estimates from the first stage are regarded as fixed in an estimating equation for the association parameter $\theta$. Calculation of the variance of the estimated association parameter then takes into account the estimation uncertainty from the first stage.

4.1 Notation and some assumptions

There are $n$ families indexed $j = 1,\ldots,n$ with $K_j$ members in family $j$, indexed $i = 1,\ldots,K_j$. Let $T_{ij}$ be the failure time for person $(i,j)$, $C_{ij}$ the censoring time and $Z_{ij}$ covariates. Define $T_j = \{T_{ij}, i = 1,\ldots,K_j\}$ and similarly $C_j$ and $Z_j$. Suppose $(T_j, C_j) \mid Z_j$ $(j = 1,\ldots,n)$ are independent identically distributed random variables and $T_j$ is independent of $C_j$ conditional on $Z_j$. We observe $X_{ij} = \min(T_{ij}, C_{ij})$ and $\delta_{ij} = I(T_{ij} \le C_{ij})$.

The composite likelihood used to find an estimate of the association parameter $\theta$ is based on the bivariate distributions, so a model is assumed for all the interesting combinations of pairs in the family. Let $M_j$ be the set of interesting pairs.
The logarithm of the composite likelihood becomes

$$\log L(\theta,\beta) = \sum_{j=1}^{n}\sum_{(i,h)\in M_j} w_{ih}\log L_{ih}(\theta,\beta) = \sum_{j=1}^{n}\sum_{(i,h)\in M_j} w_{ih}\, l_{ih}(\theta,\beta), \tag{4.8}$$

where $w_{ih}$ are positive weights, $L_{ih}$ is the likelihood for pair $(i,h)$ and $l_{ih} = \log L_{ih}$. The composite likelihood (4.8) covers both of the cases in Sections 3.1 and 3.2.

4.2 The asymptotic distribution in the parametric case

First assume that the margins are modelled parametrically depending on a finite number of parameters $\beta$, which may include effects of the covariates $Z$. In the first stage, $\beta$ is estimated by solving

$$U_\beta(\beta) = \sum_{j=1}^{n}\sum_{i=1}^{K_j}\Big\{\delta_{ij}\frac{\partial}{\partial\beta}\log f(x_{ij},\beta) + (1-\delta_{ij})\frac{\partial}{\partial\beta}\log S_{ij}(x_{ij},\beta)\Big\} = \sum_{j=1}^{n} U_{\beta.j}(\beta) = 0.$$

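This is also the score equation for $\beta$ in the case of independence and is unrelated to the composite likelihood (4.8). In the second stage, the estimate $\hat\theta$ of the association parameter $\theta$ is found as the solution to the pseudo score equation for $\theta$ based on the composite likelihood (4.8) with the estimate from stage one ($\hat\beta$) plugged in, hence

$$U_\theta(\hat\beta,\theta) = \frac{\partial}{\partial\theta}\log L = \sum_{j=1}^{n}\sum_{(i,h)\in M_j}\frac{\partial}{\partial\theta}\, w_{ih}\log L_{ih}(\hat\beta,\theta) = \sum_{j=1}^{n} U_{\theta.j}(\hat\beta,\theta) = 0. \tag{4.9}$$

Let $V_\beta = \operatorname{var} U_{\beta.1}(\beta_0)$, $V_\theta = \operatorname{var} U_{\theta.1}(\beta_0,\theta_0)$, $V_{\beta\theta} = \operatorname{cov}\{U_{\beta.1}(\beta_0), U_{\theta.1}(\beta_0,\theta_0)\}$ and $R = I_\beta^{-1}V_\beta I_\beta^{-1}$.

PROPOSITION 4.1 Assume standard regularity conditions for the marginal models, the regularity conditions stated in the appendix and that $-n^{-1}\frac{\partial}{\partial\beta}U_\beta$, $-n^{-1}\frac{\partial}{\partial\beta}U_\theta$ and $-n^{-1}\frac{\partial}{\partial\theta}U_\theta$ converge to $I_\beta$, $I_{\beta\theta}$ and $I_\theta$ at $(\beta_0,\theta_0)$ as $n \to \infty$. Then $n^{1/2}(\hat\beta - \beta_0, \hat\theta - \theta_0)$ converges to a Normal distribution with mean $(0,0)$ and variance-covariance matrix

$$\begin{pmatrix} R & -R\,I_{\beta\theta}^{\top} I_\theta^{-1} + I_\beta^{-1}V_{\beta\theta}I_\theta^{-1} \\ -I_\theta^{-1}I_{\beta\theta}R + I_\theta^{-1}V_{\beta\theta}^{\top}I_\beta^{-1} & V \end{pmatrix},$$

where

$$V = I_\theta^{-1}V_\theta I_\theta^{-1} + I_\theta^{-1}I_{\beta\theta}R\,I_{\beta\theta}^{\top}I_\theta^{-1} - I_\theta^{-1}V_{\beta\theta}^{\top}I_\beta^{-1}I_{\beta\theta}^{\top}I_\theta^{-1} - I_\theta^{-1}I_{\beta\theta}I_\beta^{-1}V_{\beta\theta}I_\theta^{-1}.$$

The proof of Proposition 4.1 is a straightforward generalization of the proof of Theorem 1 in Shih and Louis (1995). The variance is estimated by

$$\begin{pmatrix} V_\beta & V_{\beta\theta} \\ V_{\beta\theta}^{\top} & V_\theta \end{pmatrix} = E\{(U_{\beta.1},U_{\theta.1})^{\top}(U_{\beta.1},U_{\theta.1})\} \approx n^{-1}\sum_{j=1}^{n}(U_{\beta.j},U_{\theta.j})^{\top}(U_{\beta.j},U_{\theta.j}).$$

4.3 The asymptotic distribution in the semi-parametric case

In this section we derive the asymptotic distribution of the parameters in the semi-parametric model using the two-stage method and a composite likelihood for $\theta$. The marginal intensity $\lambda_{ij}(t)$ for person $i$ in family $j$ follows a Cox model

$$\lambda_{ij}(t) = \lambda_0(t)\exp(\beta^{\top}Z_{ij}),$$

where the baseline intensity $\lambda_0(t)$ is an unknown function of $t$ and $Z_{ij}$ is a vector of covariates for person $(i,j)$. It is also possible to have a stratified model or a model without covariates, leaving the marginal model purely non-parametric.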
We denote the counting process as $N_{ij}(t) = I(X_{ij} \le t, \delta_{ij} = 1)$, the indicator of risk $Y_{ij}(t) = I(X_{ij} \ge t)$, the maximum follow-up time $\tau$, and the integrated baseline intensity $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$. The composite log-likelihood is a sum over the possible pairs,

$$\log L = \sum_{j=1}^{n}\sum_{(i,h)\in M_j} l\{\theta,\beta,\Lambda_0(X_{ij}),\Lambda_0(X_{hj})\}.$$

In the first stage of the estimation the marginal models are fitted taking the clustering into account using the method of Lee et al. (1992). The resulting estimates are $\hat\beta$ and $\hat\Lambda_0(t)$. The estimate for $\beta$ is found by solving the marginal score equation

$$U_\beta(\beta) = \sum_{j=1}^{n}\sum_{i=1}^{K_j}\int_0^{\tau}\Big\{Z_{ij} - \frac{S^{(1)}(\beta,u)}{S^{(0)}(\beta,u)}\Big\}\,dN_{ij}(u) = \sum_{j=1}^{n} U_{\beta.j} = 0, \tag{4.10}$$

where $S^{(0)}(\beta,u) = n^{-1}\sum_{j=1}^{n}\sum_{i=1}^{K_j} Y_{ij}(u)\exp(\beta^{\top}Z_{ij})$ and $S^{(1)}(\beta,u) = n^{-1}\sum_{j=1}^{n}\sum_{i=1}^{K_j} Y_{ij}(u)Z_{ij}\exp(\beta^{\top}Z_{ij})$, while the estimator for $\Lambda_0(t)$ is an Aalen-Breslow type estimator

$$\hat\Lambda_0(t,\hat\beta) = \int_0^t \frac{dN_{..}(u)}{n\,S^{(0)}(\hat\beta,u)}.$$

Spiekerman and Lin (1998) have shown that under suitable regularity conditions, $\hat\beta$ is asymptotically Normal around the true value and $n^{1/2}\{\hat\Lambda_0(t,\hat\beta) - \Lambda_0(t)\}$ converges to a zero-mean Gaussian random field.

At the second stage of estimation the estimates from the first stage are plugged into the pseudo score function for $\theta$, which is based on the composite log-likelihood (4.8). This creates the pseudo score function $U_\theta$ for the parameter $\theta$:

$$U_\theta(\theta,\hat\beta,\hat\Lambda_0) = \sum_{j=1}^{n}\sum_{(i,h)\in M_j}\frac{\partial}{\partial\theta}\, w_{ih}\, l\{\theta,\hat\beta,\hat\Lambda_0(\hat\beta,t_i),\hat\Lambda_0(\hat\beta,t_h)\}. \tag{4.11}$$

The estimate $\hat\theta$ is found by solving the equation obtained by setting (4.11) equal to zero. Under the regularity conditions stated in the appendix the estimator of the association parameter has the following asymptotic distribution.

PROPOSITION 4.2 $n^{1/2}(\hat\theta - \theta_0)$ converges to a Normal distribution with mean zero and variance $I_\theta^{-1}V(W)I_\theta^{-1}$.

The precise definition of $V(W)$ is found in the appendix. The proof of Proposition 4.2 follows closely that of Proposition 3.2 in Andersen (2003). The variance from Proposition 4.2 is estimated by inserting the estimates in the formulae in the appendix.

5. CHOICE OF WEIGHTS

In the composite likelihood approach (3.2) and (3.6) the likelihood contributions are weighted together using positive weights, different choices of weights leading to different estimators.
The maximum likelihood estimate using the full likelihood is the most efficient method, but is not always available. Andersen (2003) showed that the two-stage method has good efficiency, and one would expect that the two-stage method using a composite likelihood in the second stage would be less efficient. The question is now whether it is possible to choose optimal weights so that the loss of efficiency becomes as small as possible. This will be investigated informally for the two situations from Sections 3.1 and 3.2.

5.1 One association parameter

First, we concentrate on the simplest case with a parametric model and just one association parameter as in Section 3.1. Even in this case the resulting variance of $\hat\theta$ is quite complicated (Propositions 4.1 and 4.2) and the problem is simplified by assuming that the parameters from the first stage are known. This leaves the variance as $V = I_\theta^{-1}V_\theta I_\theta^{-1}$. Lindsay (1988) has found an expression for optimal weights in the one-dimensional case, and since the problem is here reduced to only one dimension the suggested weights are calculated. From (3.3) one sees that the pseudo score equation for $\theta$ is

$$U_\theta(\hat\beta,\theta) = \frac{\partial}{\partial\theta}\log L = \sum_{j=1}^{n} w_j\sum_{(i,h)\in G_j}\frac{\partial}{\partial\theta}\, l_{ih}(\hat\beta,\theta) = \sum_{j=1}^{n} w_j\sum_{(i,h)\in G_j} S_{(ih)j}(\hat\beta,\theta), \tag{5.12}$$

where $S_{(ih)j}$ is the score contribution from pair $(i,h)$ in family $j$. Let $U$ be the score function for $\theta$ based on the full likelihood with the marginal parameters $\beta$ assumed known, i.e. $U = \frac{\partial}{\partial\theta}\log L(\hat\beta,\theta)$. Lindsay (1988) shows that the optimal weights are

$$w_{\mathrm{opt}} = [\operatorname{var} S]^{-1}E(US), \tag{5.13}$$

where $S$ is the vector of score contributions. In this set-up $E(US) = E(S^2)$, where $S^2$ denotes the vector whose elements are the squared elements of $S$. The variance used for the weights (5.13) is a block matrix since the families are assumed to be independent, and the size of each block depends on the size of the family. It is modelled using three parameters: $\sigma^2$ for the variance in the diagonal, $\omega$ for the covariance between score contributions from pairs with one person in common and $\rho$ for the covariance between score contributions from pairs with nobody in common. If, for instance, the families have 2, 3, 4 or 5 members then the weights $(w_2,w_3,w_4,w_5)$ are found by solving equation (5.13): $w_2 = 1$, $w_3 = \sigma^2/(\sigma^2 + 2\omega)$, $w_4 = \sigma^2/(\sigma^2 + 4\omega + \rho)$ and $w_5 = \sigma^2/(\sigma^2 + 6\omega + 3\rho)$.
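The four weights listed above follow a pattern that can be written for a general family size. The sketch below assumes that pattern continues beyond the sizes given in the text: in a sibling group of size k each pair shares exactly one member with 2(k−2) other pairs and no member with (k−2)(k−3)/2 pairs. The function name and the numeric values of σ², ω and ρ are hypothetical.

```python
def sibling_weight(k, sigma2, omega, rho):
    """Weight for a sibling group of size k under the three-parameter
    variance model (sigma2, omega, rho) of Section 5.1.

    The counts below reproduce the coefficients given in the text for
    k = 2, 3, 4, 5; using them for larger k is an extrapolation.
    """
    if k < 2:
        raise ValueError("a sibling group needs at least two members")
    shared = 2 * (k - 2)               # other pairs with one person in common
    disjoint = (k - 2) * (k - 3) // 2  # other pairs with nobody in common
    return sigma2 / (sigma2 + shared * omega + disjoint * rho)


# Hypothetical parameter values, for illustration only.
weights = {k: sibling_weight(k, sigma2=1.0, omega=0.25, rho=0.1) for k in range(2, 6)}
```

With these values the weight decreases with family size, which offsets the growing number of pairs contributed by larger families.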
The parameters $\sigma^2$, $\omega$ and $\rho$ are estimated by

$$\hat\sigma^2 = \frac{1}{n_1}\sum_{j=1}^{n}\sum_{(ih)\in G_j} S_{(ih)j}^2$$
$$\hat\omega = \frac{1}{n_2}\sum_{j=1}^{n}\ \sum_{\{((ih),(lm))\in G_j^2 \,\mid\, i=l,\,h\ne m \text{ or } i\ne l,\,h=m\}} S_{(ih)j}S_{(lm)j}$$
$$\hat\rho = \frac{1}{n_3}\sum_{j=1}^{n}\ \sum_{\{((ih),(lm))\in G_j^2 \,\mid\, i\ne l,\,h\ne m\}} S_{(ih)j}S_{(lm)j},$$

where $n_1$ is the total number of pairs, $n_2$ the number of elements in the set $\{((ih),(lm))\in G_j^2 \mid i=l, h\ne m \text{ or } i\ne l, h=m\}$, $j = 1,\ldots,n$, and $n_3$ the number of elements in the set $\{((ih),(lm))\in G_j^2 \mid i\ne l, h\ne m\}$, $j = 1,\ldots,n$.

5.2 More than one association parameter

In Section 3.2 families of parents and children were considered. The association parameter $\theta = (\theta_1,\theta_2,\theta_3)$ (for parents, children, and parent-child pairs) is now three-dimensional and the variance-covariance matrix could be defined as optimal if it is smaller than other variance-covariance matrices when ordering the matrices by positive definite differences. However, in Lindsay (1988) it is mentioned that an optimal choice of weights is not usually globally attainable. Since the three likelihood contributions depend on separate parameters, one possible approach is to treat the choice of weights as three separate problems. The pseudo score equations for $\theta$ are found from (3.7):

$$U_{\theta_1}(\hat\beta,\theta_1) = \frac{\partial}{\partial\theta_1}\log L = \sum_{j=1}^{n} w_1\frac{\partial}{\partial\theta_1}\, l_{12}(\hat\beta,\theta_1) \tag{5.14}$$
$$U_{\theta_2}(\hat\beta,\theta_2) = \frac{\partial}{\partial\theta_2}\log L = \sum_{j=1}^{n} w_{2j}\sum_{(i,h)\in G_j}\frac{\partial}{\partial\theta_2}\, l_{ih}(\hat\beta,\theta_2) \tag{5.15}$$
$$U_{\theta_3}(\hat\beta,\theta_3) = \frac{\partial}{\partial\theta_3}\log L = \sum_{j=1}^{n} w_{3j}\sum_{(l,m)\in H_j}\frac{\partial}{\partial\theta_3}\, l_{lm}(\hat\beta,\theta_3). \tag{5.16}$$

Here $w_1$ is independent of family number, since $\theta_1$ is the association parameter for parents and estimation is always based on the pair of parents, so the natural choice is $w_1 = 1$. Let $S_2$ be the vector of score contributions $S^{(2)}_{(ih)j}$ for $\theta_2$ and $S_3$ the vector of score contributions $S^{(3)}_{(lm)j}$ for $\theta_3$. Then one could choose weights as in (5.13):

$$w_2 = [\operatorname{var} S_2]^{-1}E(US_2) \tag{5.17}$$
$$w_3 = [\operatorname{var} S_3]^{-1}E(US_3). \tag{5.18}$$

Again the variances used to calculate the weights are block matrices since the families are assumed to be independent, and the size of each block depends on the size of the family. They are modelled as in Section 5.1 with separate parameters in the two variances. If, for example, the families have 1, 2, or 3 children then the weights for $\theta_2$, $(w_2^{(2)}, w_2^{(3)})$, are found by solving equation (5.17). Similarly, the weights for $\theta_3$, $(w_3^{(1)}, w_3^{(2)}, w_3^{(3)})$, are found by solving equation (5.18). This leads to $w_2^{(2)} = 1$, $w_2^{(3)} = \sigma_2^2/(\sigma_2^2 + 2\omega_2)$, $w_3^{(1)} = \sigma_3^2/(\sigma_3^2 + \omega_3)$, $w_3^{(2)} = \sigma_3^2/(\sigma_3^2 + 2\omega_3 + \rho_3)$ and $w_3^{(3)} = \sigma_3^2/(\sigma_3^2 + 3\omega_3 + 2\rho_3)$.
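The coefficients of ω₃ and ρ₃ in the parent-child weights count, for any one parent-child pair, how many other parent-child pairs in the family share one person with it and how many share nobody. A brute-force sketch of that count is given below, assuming two parents per family and the three-parameter variance model of Section 5.1; all function names and labels are illustrative.

```python
def parent_child_pairs(n_children):
    """All (parent, child) pairs in a family with two parents."""
    parents = ["p1", "p2"]
    children = ["c%d" % i for i in range(1, n_children + 1)]
    return [(p, c) for p in parents for c in children]


def overlap_counts(pairs):
    """For the first pair, count the other pairs that share exactly one
    person with it and those that share nobody. By symmetry the counts
    are the same for every pair in the family."""
    first = set(pairs[0])
    shared = sum(1 for p in pairs[1:] if len(first & set(p)) == 1)
    disjoint = sum(1 for p in pairs[1:] if len(first & set(p)) == 0)
    return shared, disjoint


def parent_child_weight(n_children, sigma2, omega, rho):
    """Weight sigma2 / (sigma2 + shared * omega + disjoint * rho)."""
    shared, disjoint = overlap_counts(parent_child_pairs(n_children))
    return sigma2 / (sigma2 + shared * omega + disjoint * rho)


counts = {c: overlap_counts(parent_child_pairs(c)) for c in (1, 2, 3)}
```

For 1, 2 and 3 children the counts are (1, 0), (2, 1) and (3, 2), so the ρ₃ coefficients are 0, 1 and 2, matching the weights in the text.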
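The parameters $\sigma_2^2$, $\omega_2$, $\sigma_3^2$, $\omega_3$ and $\rho_3$ are estimated in the following way:

$$\hat\sigma_2^2 = \frac{1}{n_1}\sum_{j=1}^{n}\sum_{(ih)\in G_j} S^{(2)}_{(ih)j}S^{(2)}_{(ih)j}$$
$$\hat\omega_2 = \frac{1}{n_2}\sum_{j=1}^{n}\ \sum_{\{((ih),(lm))\in G_j^2 \,\mid\, i=l,\,h\ne m \text{ or } i\ne l,\,h=m\}} S^{(2)}_{(ih)j}S^{(2)}_{(lm)j}$$
$$\hat\sigma_3^2 = \frac{1}{n_3}\sum_{j=1}^{n}\sum_{(ih)\in H_j} S^{(3)}_{(ih)j}S^{(3)}_{(ih)j}$$
$$\hat\omega_3 = \frac{1}{n_4}\sum_{j=1}^{n}\ \sum_{\{((ih),(lm))\in H_j^2 \,\mid\, i=l,\,h\ne m \text{ or } i\ne l,\,h=m\}} S^{(3)}_{(ih)j}S^{(3)}_{(lm)j}$$
$$\hat\rho_3 = \frac{1}{n_5}\sum_{j=1}^{n}\ \sum_{\{((ih),(lm))\in H_j^2 \,\mid\, i\ne l,\,h\ne m\}} S^{(3)}_{(ih)j}S^{(3)}_{(lm)j},$$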

where $n_1$ is the total number of pairs in $G_j$, $j = 1,\ldots,n$, $n_2$ the number of elements in the set $\{((ih),(lm))\in G_j^2 \mid i=l, h\ne m \text{ or } i\ne l, h=m\}$, $j = 1,\ldots,n$, $n_3$ the total number of pairs in $H_j$, $j = 1,\ldots,n$, $n_4$ the number of elements in the set $\{((ih),(lm))\in H_j^2 \mid i=l, h\ne m \text{ or } i\ne l, h=m\}$, $j = 1,\ldots,n$, and $n_5$ the number of elements in the set $\{((ih),(lm))\in H_j^2 \mid i\ne l, h\ne m\}$, $j = 1,\ldots,n$. The weights calculated in this way are used in the second stage of estimation when $\theta = (\theta_1,\theta_2,\theta_3)$ is estimated by setting (5.14)-(5.16) equal to zero simultaneously.

6. A SAMPLED DATASET

Until now it has been assumed that the dataset is a random sample of families. For rare diseases this design may be inefficient, and different sampling schemes may be considered. One possible strategy could be to sample all families with at least one case and a random sample of families without a case. Adapting the results from Binder (1992), this sampling scheme has been considered in Andersen (2003), who suggests weighting the estimating equations by the inverse sampling probabilities. Let $\pi_j$ be the sampling probability for family $j$, $\xi_j = 1$ if family $j$ is chosen and 0 otherwise, and $n$ the total number of families in the population. Taking the parametric case as an example, the estimating equation for $\beta$ in the first stage becomes

$$\tilde U_\beta = \sum_{j=1}^{n}\sum_{i=1}^{K_j}\frac{\xi_j}{\pi_j}U_\beta^{ij}.$$

The estimating equation for $\theta$, (4.9) or (4.11), is weighted with the inverse sampling probabilities in the same way, which means that in the second stage of estimation there are two sets of weights, one to take the sampling into account and one for the pairwise comparisons, leading to

$$\tilde U_\theta = \sum_{j=1}^{n}\frac{\xi_j}{\pi_j}U_\theta^{j}.$$

The estimates derived from the weighted analysis are still asymptotically Normal with a distribution derived in exactly the same way as in Andersen (2003).
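The moment estimators of Section 5 average products of score contributions over three kinds of pair combinations. A minimal sketch follows, assuming the per-family score contributions have already been computed and are supplied as dictionaries keyed by pair. The function name and input layout are hypothetical. Two pairs are classified by the number of persons they have in common, and each unordered combination is counted once, which gives the same averages as counting ordered combinations.

```python
from itertools import combinations


def moment_estimates(families):
    """Estimate (sigma2, omega, rho) from pairwise score contributions.

    families : list of dicts, one per family, mapping a pair (i, h) to its
               score contribution S_(ih)j.
    Returns a tuple; an entry is None when no combination of that kind exists.
    """
    sum_sq = sum_shared = sum_disjoint = 0.0
    n1 = n2 = n3 = 0
    for scores in families:
        for s in scores.values():
            sum_sq += s * s
            n1 += 1
        for (p, s_p), (q, s_q) in combinations(scores.items(), 2):
            common = len(set(p) & set(q))
            if common == 1:      # one person in common
                sum_shared += s_p * s_q
                n2 += 1
            elif common == 0:    # nobody in common
                sum_disjoint += s_p * s_q
                n3 += 1
    sigma2 = sum_sq / n1 if n1 else None
    omega = sum_shared / n2 if n2 else None
    rho = sum_disjoint / n3 if n3 else None
    return sigma2, omega, rho


# Made-up score contributions, for illustration only.
example = [
    {(1, 2): 1.0, (1, 3): 2.0, (2, 3): 3.0},
    {(1, 2): 2.0, (3, 4): 5.0},
]
sigma2_hat, omega_hat, rho_hat = moment_estimates(example)
```

The resulting estimates can then be plugged into the weight formulae of Sections 5.1 and 5.2.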
It is important to have a good approximation to the true sampling probability $\pi_j$ for a family $j$, as simulations have shown that misspecified weights give biased results. For the suggested sampling scheme the probability is known to be 1 when there is at least one case in the family. The sampling probability $\pi_j$ is

$$\pi_j = \begin{cases} 1 & \text{if there is at least one case in family } j, \\ m_k/N_k & \text{if family } j \text{ is of size } k \text{ and there is no case in family } j, \end{cases}$$

where $m_k$ is the number of families of size $k$ in the sample and $N_k$ is the number of families of size $k$ in the population. In practice it can be a problem to determine $N_k$, but an approximation can be found when the families are constructed from a random sample of individuals. Let $X$ be the number of families constructed on the basis of the random sample of persons drawn from a population of $N$ individuals. Then the number of families is $X = \sum_{k\ge 1} m_k$ and the number of individuals is $N = \sum_{k\ge 1} kN_k$. An ad hoc approximation to $N_k$ is $\tilde N_k = (m_k N)/(kX)$, which preserves the correct number of individuals, $N = \sum_{k\ge 1} k\tilde N_k$.
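The ad hoc approximation and the resulting sampling probabilities can be sketched as follows. The function names and the counts used in the example are hypothetical, chosen only to show that the approximation reproduces the population size N.

```python
def approx_population_counts(m, n_individuals):
    """Approximate N_k by (m_k * N) / (k * X), with X the total number
    of sampled families.

    m             : dict mapping family size k to m_k, the number of
                    sampled families of that size
    n_individuals : N, the number of individuals in the population
    """
    x = sum(m.values())
    return {k: m_k * n_individuals / (k * x) for k, m_k in m.items()}


def sampling_probability(size, has_case, m, n_tilde):
    """pi_j: 1 for families with at least one case, m_k / N_k otherwise."""
    return 1.0 if has_case else m[size] / n_tilde[size]


# Made-up counts, for illustration only.
m = {1: 50, 2: 30, 3: 20}
n_tilde = approx_population_counts(m, n_individuals=10000)
total_individuals = sum(k * n_k for k, n_k in n_tilde.items())
pi_no_case = sampling_probability(2, False, m, n_tilde)
```

Here `total_individuals` recovers N = 10000, and a case-free family of size 2 gets probability 30/1500 = 0.02.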

7. SIMULATION STUDIES

Some simulation studies were conducted to assess the statistical properties of the proposed method, specifically for the sib groups and families of parents and children. A set of simulations was carried out comparing full maximum likelihood, the two-stage estimation using the full family as suggested in Andersen (2003) and the composite likelihood approach suggested here with different choices of weights for different distributions of sib group size. As expected, the maximum likelihood method is the most efficient, followed by the two-stage method where the full sib group is used in the second stage. For the two-stage method with the composite likelihood in the second stage different sets of weights were chosen, with the optimal weights doing slightly better than the others. The loss of information is largest in the case with a big proportion of large families (25% of the families have five members), which does not seem surprising, but with the optimal weights the efficiency is still 94% compared to maximum likelihood. In most practical applications in Denmark the average size of sib groups will be closer to two than five. Simulation studies were also performed for datasets consisting of parents and children, and again the method using a composite likelihood in the second stage of estimation had an efficiency of more than 90% compared to maximum likelihood, and showed little bias. A detailed description of the simulation results can be found at oupjournals.org.

8. A FAMILY STUDY OF DEEP VENOUS THROMBOEMBOLISM

Several studies have shown that there are genetic and acquired risk factors for developing deep venous thrombosis and pulmonary embolism (in the following, thromboembolism) (Rosendaal, 1999; Seligsohn and Lubetsky, 2001). The analysis presented in this section is an exploratory analysis of the different amounts of familial aggregation in pairs of parents, parents and children, and sibling pairs.
The study makes use of the nation-wide Danish registers based on the Civil Registration System and the Danish National Registry of Patients. Everybody in Denmark is registered in the Civil Registration System with a personal identification number, which is used in all registers. The Civil Registration System also includes a link to parents, making it possible to identify families. The Danish National Registry of Patients started in 1977 and includes information on all admissions to Danish hospitals. The study base was constructed by taking all patients from the Danish National Registry of Patients who were born in the period and with a given set of diagnoses (5329 patients). A random sample of persons, born in the same period and alive on 1 January 1977, was drawn from the population using the Civil Registration System. The parents and siblings of these sampled persons were identified using the link in the Civil Registration System. Together with the original sample of persons, these now constitute the study base. The events were identified in the Danish National Registry of Patients in the period 1 January 1977 until 31 December All in all there were families where the children had both parents in common. Within the study period, 3339 first-time events of thromboembolism were identified. When studying familial aggregation the families with several events contain most information. In our data, 2871 families had one person with an event, 208 had two, 16 had three and one family had four persons with a diagnosis of thromboembolism.

Fig. 1. Construction of the study base using the Danish National Registry of Patients (DNRP) and the Civil Registration System (CRS). [Flow diagram: 5329 patients from DNRP; random sample from CRS; parents and sibs identified in CRS.]

Clayton's distributional family was chosen for each of the three types of pairs. In the case where the family consists of two parents $(T_1,T_2)$ and $k-2$ children $(T_3,\ldots,T_k)$ this means that

$$S(t_1,t_2) = \{S(t_1)^{1-\theta_1} + S(t_2)^{1-\theta_1} - 1\}^{1/(1-\theta_1)}$$
$$S(t_i,t_j) = \{S(t_i)^{1-\theta_2} + S(t_j)^{1-\theta_2} - 1\}^{1/(1-\theta_2)}, \quad i,j = 3,\ldots,k \tag{8.19}$$
$$S(t_i,t_j) = \{S(t_i)^{1-\theta_3} + S(t_j)^{1-\theta_3} - 1\}^{1/(1-\theta_3)}, \quad i = 1,2,\ j = 3,\ldots,k.$$

Here $\theta_1$ is the association between parents, $\theta_2$ the association between two siblings and $\theta_3$ the association between a parent and a child. There is delayed entry because the Danish National Registry of Patients started in 1977. This is taken into account by using the conditional survival function. If $v_i, v_j$ denote the ages at entry, then the conditional survival function is

$$P(T_i > t_i, T_j > t_j \mid T_i > v_i, T_j > v_j) = \frac{S(t_i,t_j)}{S(v_i,v_j)}. \tag{8.20}$$

Time-dependent covariates are now difficult to handle correctly since the conditional survival function (8.20) still depends on the time from birth until the person entered the study. In this example calendar period is part of the model and it is assumed that the risk of thromboembolism was the same before the register started as in the first period from The data are sampled as described in Section 6, and this is taken into account in the analyses.

For the first stage of the analysis population rates have been used and assumed known. This means that no extra variation from the first stage is taken into account in the second stage, which may lead to underestimation of the variance. However, since the dataset is so large the estimates from the marginal analysis will be close to the population rates. In the second stage the model (8.19) was fitted to the data taking the sampling and delayed entry into account.
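The delayed-entry adjustment (8.20) combined with the pairwise Clayton model (8.19) can be sketched as below. The exponential marginal survival function and all numeric values are assumptions made purely for illustration; the application itself uses population rates for the margins.

```python
import math


def clayton_pair(s_i, s_j, theta):
    """Bivariate Clayton survival function of (8.19), theta > 1,
    evaluated at marginal survival probabilities s_i, s_j in (0, 1]."""
    return (s_i ** (1.0 - theta) + s_j ** (1.0 - theta) - 1.0) ** (1.0 / (1.0 - theta))


def conditional_survival(t_i, t_j, v_i, v_j, marginal, theta):
    """P(T_i > t_i, T_j > t_j | T_i > v_i, T_j > v_j) as in (8.20).

    marginal : function mapping an age to a marginal survival probability
    v_i, v_j : ages at entry, with v_i <= t_i and v_j <= t_j
    """
    numerator = clayton_pair(marginal(t_i), marginal(t_j), theta)
    denominator = clayton_pair(marginal(v_i), marginal(v_j), theta)
    return numerator / denominator


def exp_margin(age):
    """Hypothetical marginal survival function with constant hazard 0.01."""
    return math.exp(-0.01 * age)


at_entry = conditional_survival(40, 45, 40, 45, exp_margin, theta=4.0)
later = conditional_survival(60, 70, 40, 45, exp_margin, theta=4.0)
no_delay = conditional_survival(60, 70, 0, 0, exp_margin, theta=4.0)
```

At the entry ages the conditional survival equals 1, and with entry at age 0 for both members it reduces to the unconditional pairwise survival function.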
Two different sets of weights were chosen to account for the pairs: all weights set to 1 or the optimal weights from Section 5.2. The results are shown in Table 1.

Table 1. Estimates of the association between parents ($\theta_1$), children ($\theta_2$) and parent/child ($\theta_3$) in the application to deep venous thromboembolism

Pairwise weights   Parameter         Estimate   Std error   $\theta$ (95% conf. int.)   Kendall's $\tau$
(1,1,1)            $\log\theta_1$                           (1.07; 5.84)
                   $\log\theta_2$                           (6.31; 15.89)
                   $\log\theta_3$                           (3.51; 5.03)
optimal            $\log\theta_1$                           (0.98; 6.35)
                   $\log\theta_2$                           (6.51; 15.37)
                   $\log\theta_3$                           (3.39; 4.99)

Table 1 shows that the association is largest for siblings with $\hat\theta_2 = 10.0$, which is significantly larger than 1. The association for parents is smallest with an estimate of $\hat\theta_1 = 2.5$ and a confidence interval including 1 when the optimal weights are chosen. This means that the constraints from (3.5) do not hold and it would not be possible to fit a joint model of the type (3.4) using Clayton's family for each copula. Since the association for parents is smaller than the association for the other types of pairs, this could indicate a genetic factor in the familial aggregation. A simple genetic model would imply that $\theta_2 = \theta_3$, as a parent and a child share 50% of their genes on average, as do two siblings. Testing this hypothesis, using a Wald-type test in the setting with optimal weights, results in a test statistic of and a test probability of , hence the hypothesis is rejected. The optimal choice of weights improves the standard error for the association among siblings and the relative improvement is larger than the loss of precision for the parent-child pairs, suggesting that the optimal weights are a good choice. All calculations were carried out using SAS version

DISCUSSION

In this paper a two-stage procedure combined with a composite likelihood in the second stage has been studied, with particular reference to the two cases of sib groups and families consisting of parents and children. The sib groups can already be studied using a two-stage method as in Andersen (2003), but with the method presented here it is only necessary to study all possible pairs. This means that software designed for pairs can be used. The loss of efficiency compared to the methods using the full likelihood depends on the amount of information outside the pairs and the choice of weights. Simulations indicate that choosing the optimal weights from Lindsay (1988) is sensible. When 25% of sibling groups have five members the efficiency is still above 90%. For the families of parents and children the composite likelihood approach gives a flexible framework in which to model this type of data.
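To illustrate why only pairs need to be handled, the following Python sketch builds a weighted composite log-likelihood for one family by summing Clayton pair contributions over all possible pairs. It is an illustration under stated assumptions, not the author's implementation: the marginal survival probabilities are treated as known from the first stage, delayed entry and the sampling scheme are ignored, one weight is used per pair type, and all names (`clayton_pair_loglik`, `pair_type`, `composite_loglik`, the role labels) are hypothetical. The pair contribution uses the standard censored-data form for a copula model: the copula itself when both members are censored, its first partial derivative when one has an event, and its mixed second derivative when both have events.

```python
import math
from itertools import combinations


def clayton_pair_loglik(u, v, du, dv, theta):
    """Theta-dependent part of the log-likelihood for one pair.

    u, v   : marginal survival probabilities at the observed times, in (0, 1]
    du, dv : event indicators (1 = event, 0 = censored)
    theta  : Clayton association parameter, theta > 1
    """
    a = theta - 1.0
    if a <= 0.0:
        raise ValueError("this sketch only covers theta > 1")
    c = (u ** -a + v ** -a - 1.0) ** (-1.0 / a)  # joint survival C(u, v)
    # C, dC/du = C^(1+a) u^-(a+1), d2C/dudv = (1+a) C^(1+2a) (uv)^-(a+1)
    ll = (1.0 + a * (du + dv)) * math.log(c)
    ll -= (a + 1.0) * (du * math.log(u) + dv * math.log(v))
    if du and dv:
        ll += math.log(1.0 + a)
    return ll


def pair_type(role_a, role_b):
    """Classify a pair as 'parents', 'siblings' or 'parent_child'."""
    if role_a == role_b:
        return "parents" if role_a == "parent" else "siblings"
    return "parent_child"


def composite_loglik(family, thetas, weights):
    """Weighted sum of pair log-likelihoods over all possible pairs in a family.

    family  : list of (role, survival_probability, event_indicator) tuples,
              with role either "parent" or "child"
    thetas  : dict mapping pair type to its association parameter
    weights : dict mapping pair type to its weight
    """
    total = 0.0
    for (ra, ua, da), (rb, ub, db) in combinations(family, 2):
        kind = pair_type(ra, rb)
        total += weights[kind] * clayton_pair_loglik(ua, ub, da, db, thetas[kind])
    return total
```

Summing `composite_loglik` over families and maximizing over the three association parameters would give a second-stage estimate under these simplifications. Setting all weights to 1 corresponds to the unweighted choice described in the application.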
Other approaches have been suggested. The hierarchical models in Bandeen-Roche and Liang (1996) are also based on copulas but they can impose unwanted constraints on the parameters. Additive and multiplicative frailty models have also been suggested (e.g. Petersen, 1998; Yashin et al., 1995). In these models the bivariate margins are not generally shared frailty models. Li and Zhong (2002) suggested an additive genetic gamma frailty model, which can be used in family studies where genetic information is available. The composite likelihood approach has also been studied by Parner (2001), but in this paper the composite likelihood approach is linked to the two-stage estimation. The weights suggested in Section 5 are not optimal in a mathematically precise sense, but they are optimal in some simple situations and also seem to perform well in more complicated cases. The situation where the families consist of members at the same level is very close to the one-dimensional case studied by Lindsay (1988), where the weights are truly optimal. They are not difficult to calculate, and simulations have shown that they perform well. In the case of family members at different levels the weights are more complicated to calculate and the advantage is not as clear. In such cases, one might consider a simpler choice of weights.

In summary, the composite likelihood approach combined with the two-stage estimation promises to be a useful tool in the analysis of sibling groups. It also presents new possibilities when studying families with members at different levels.

ACKNOWLEDGEMENTS

Work for this paper was started while the author was visiting the MRC Biostatistics Unit, Cambridge, UK. The author would like to thank Per Kragh Andersen and David Clayton for their valuable suggestions and comments and Henrik Toft Sørensen and Jørn Olsen for making the data on deep venous thromboembolism available. The activities of the Danish Epidemiology Science Centre are supported by a grant from the Danish National Research Foundation.

APPENDIX

We first give some notation. Let $M$ be the number of possible pairs, $X_1 = (X_{11}, \dots, X_{1M}), \dots, X_n = (X_{n1}, \dots, X_{nM})$ be $n$ independent identically distributed replications of $M$ pairs $Y = (Y_1, \dots, Y_M)$, $L_{ik}(\beta, \theta_k)$ the likelihood function for $X_{ik}$, and $U_{ik}(\beta, \theta_k)$ the score function,

$$
U_{ik}(\beta, \theta_k) = \frac{\partial}{\partial \theta_k} \log L_{ik}.
$$

Define the Fisher information matrix for the $k$th pair and the observed information matrix

$$
i_k(\theta_k) = -E_0\left\{ \frac{\partial^2}{\partial \theta_k \, \partial \theta_k^{\top}} \log L_{ik}(\beta, \theta_k) \right\},
\qquad
j_k(\theta_k) = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial^2}{\partial \theta_k \, \partial \theta_k^{\top}} \log L_{ik}(\beta, \theta_k).
$$

The following assumptions (A.1) concerning the bivariate models are adapted from Parner (2001) and assumed for Proposition 4.1.

ASSUMPTION A.1

1. The functions $\frac{\partial^2}{\partial \theta_k \, \partial \theta_k^{\top}} L_{ik}(\theta_k)$ and $\frac{\partial^2}{\partial \theta_k \, \partial \theta_k^{\top}} \log L_{ik}(\theta_k)$ are locally, uniformly in $\theta_k$, dominated by integrable functions.
2. If the parameter space $\Theta_k$ is unbounded then for any sequence $\{\theta_{kn}\}_n$ in $\Theta_k$ such that $\|\theta_{kn}\| \to \infty$, $\sum_{i=1}^{n} \log L_{ik}(\theta_{kn}) \to -\infty$, $P$-a.s.
3. For any sequence $\{\theta_{kn}\}_n$ in $\Theta_k$ such that $\theta_{kn} \to \theta_k$ then $\frac{1}{n} \sum_{i=1}^{n} \log L_{ik}(\theta_{kn}) \to E_0\{\log L_{1k}(\theta_k)\}$, $P$-a.s.

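4. For any sequence $\{\theta_{kn}\}_n$ in $\Theta_k$ where $\theta_{kn} \to \theta_k$ then $j_{kn}(\theta_{kn}) \to i_k(\theta_k)$.
5. The parameter $\theta_k$ can be identified from the distribution of $Y_k$.
6. The Fisher information matrix $i_k(\theta_k)$ is positive definite.
7. The expectations $E_0\{U_{jl}(\theta_{0j}) U_{kh}(\theta_{0k})\}^2 < \infty$ for $j, k = 1, \dots, K$, $l = 1, \dots, \dim(\theta_j)$ and $h = 1, \dots, \dim(\theta_k)$.

DEFINITION A.1

Let the quantities in Proposition 4.2 be

$$
\begin{aligned}
W_\theta(\theta, u, v_1, \dots, v_{K_j}) &= \sum_{(i,h) \in M_j} w_{ih} \frac{\partial}{\partial \theta} l(\theta, u, v_i, v_h), \\
V_\theta(\theta, u, v_1, \dots, v_{K_j}) &= \sum_{(i,h) \in M_j} w_{ih} \frac{\partial^2}{\partial \theta^2} l(\theta, u, v_i, v_h), \\
V_i(\theta, u, v_1, \dots, v_{K_j}) &= \sum_{i = l \text{ or } i = h,\ (l,h) \in M_j} w_{lh} \frac{\partial^2}{\partial \theta \, \partial v_i} l(\theta, u, v_l, v_h), \\
I_\theta &= E\{-V_\theta(\theta_0, \beta_0, \Lambda_0)\}, \\
M_{ij}(t) &= N_{ij}(t) - \int_0^t \exp(\beta^{\top} Z_{ij}) Y_{ij}(s) \lambda_0(s) \, ds.
\end{aligned}
$$

The following assumptions are used in the proof of Proposition 4.2.

ASSUMPTION A.2

Assume the regularity conditions from Spiekerman and Lin (1998), Assumption A.1 and that $\frac{\partial}{\partial \theta} l(\theta, u, v_i, v_h)$ and $\frac{\partial^2}{\partial \theta^2} l(\theta, u, v_i, v_h)$ are continuous and bounded functions of $u$, $v_i$ and $v_h$.

The variance from Proposition 4.2 is consistently estimated by inserting the estimates in the formulae in the following way:

$$
\begin{aligned}
\hat W_j &= W_\theta\{\hat\theta, \hat\beta, \hat\Lambda_0(\hat\beta, X_{1j}), \dots, \hat\Lambda_0(\hat\beta, X_{K_j j})\}, \\
\hat\Xi_j &= \hat I_{\theta\beta} \hat I_\beta^{-1} \hat U_{\beta \cdot j} + \sum_{i=1}^{K_j} \int_0^\tau \hat I_i(t_i) \, d\hat\Psi_j(t_i), \\
\hat I_\theta &= -n^{-1} \sum_{j=1}^{n} \frac{\partial}{\partial \theta} W_\theta\{\hat\theta, \hat\beta, \hat\Lambda_0(\hat\beta, X_{1j}), \dots, \hat\Lambda_0(\hat\beta, X_{K_j j})\}, \\
\hat I_{\theta\beta} &= n^{-1} \sum_{j=1}^{n} \frac{\partial}{\partial \beta} W_\theta\{\hat\theta, \hat\beta, \hat\Lambda_0(\hat\beta, X_{1j}), \dots, \hat\Lambda_0(\hat\beta, X_{K_j j})\}, \\
\hat I_i(t_i) &= n^{-1} \sum_{j=1}^{n} Y_{ij}(t_i) V_i\{\hat\theta, \hat\beta, \hat\Lambda_0(\hat\beta, X_{1j}), \dots, \hat\Lambda_0(\hat\beta, X_{K_j j})\}, \\
\hat\Psi_j(t) &= \int_0^t \frac{d\hat M_{\cdot j}(u)}{S^{(0)}(\hat\beta)} - \left\{ \int_0^t E(\hat\beta, u) \, d\hat\Lambda_0(u, \hat\beta) \right\} \hat I_\beta^{-1} \hat U_{\beta \cdot j}.
\end{aligned}
$$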

REFERENCES

ANDERSEN, E. W. (2003). Two-stage estimation in copula models used in family studies. Lifetime Data Analysis (accepted).
BANDEEN-ROCHE, K. J. AND LIANG, K.-Y. (1996). Modelling failure-time associations in data with multiple levels of clustering. Biometrika 83.
BINDER, D. A. (1992). Fitting Cox's proportional hazards models from survey data. Biometrika 79.
GENEST, C., GHOUDI, K. AND RIVEST, L. (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika 82.
GENEST, C. AND MACKAY, J. (1986). The joy of copulas. The American Statistician 40.
GLIDDEN, D. V. (2000). A two-stage estimator of the dependence parameter for the Clayton-Oakes model. Lifetime Data Analysis 6.
HEAGERTY, P. J. AND LELE, S. R. (1998). A composite likelihood approach to binary spatial data. Journal of the American Statistical Association 93.
HOUGAARD, P. (1986). A class of multivariate failure time distributions. Biometrika 73.
LEE, E. W., WEI, L. J. AND AMATO, D. A. (1992). Cox-type regression analysis for large numbers of small groups of correlated failure time observations. In Klein, J. and Goel, P. (eds), Survival Analysis: State of the Art, Dordrecht: Kluwer.
LI, H. AND ZHONG, X. (2002). Multivariate survival models induced by genetic frailties, with application to linkage analysis. Biostatistics 3.
LINDSAY, B. G. (1988). Composite likelihood methods. Contemporary Mathematics 80.
NIELSEN, G. G., GILL, R. D., ANDERSEN, P. K. AND SØRENSEN, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics 19.
OAKES, D. (1989). Bivariate survival models induced by frailties. Journal of the American Statistical Association 84.
PARNER, E. (2001). A composite likelihood approach to multivariate survival data. Scandinavian Journal of Statistics 28.
PETERSEN, J. H. (1998). An additive frailty model for correlated life times. Biometrics 54.
ROSENDAAL, F. (1999). Venous thrombosis: a multicausal disease. The Lancet 353.
SELIGSOHN, U. AND LUBETSKY, A. (2001). Genetic susceptibility to venous thrombosis. New England Journal of Medicine 344.
SHIH, J. H. AND LOUIS, T. A. (1995). Inferences on association parameter in copula models for bivariate survival data. Biometrics 51.
SPIEKERMAN, C. F. AND LIN, D. Y. (1998). Marginal regression models for multivariate failure time data. Journal of the American Statistical Association 93.
WEI, L. J., LIN, D. Y. AND WEISSFELD, L. (1989). Regression analysis of multivariate incomplete failure time data by modelling marginal distributions. Journal of the American Statistical Association 84.
YASHIN, A., VAUPEL, J. AND IACHINE, I. (1995). Correlated individual frailty: an advantageous approach to survival analysis of bivariate data. Mathematical Population Studies 5.

[Received June 10, 2002; first revision March 17, 2003; second revision April 14, 2003; accepted for publication May 7, 2003]


More information

A class of latent marginal models for capture-recapture data with continuous covariates

A class of latent marginal models for capture-recapture data with continuous covariates A class of latent marginal models for capture-recapture data with continuous covariates F Bartolucci A Forcina Università di Urbino Università di Perugia FrancescoBartolucci@uniurbit forcina@statunipgit

More information

Nonparametric predictive inference with parametric copulas for combining bivariate diagnostic tests

Nonparametric predictive inference with parametric copulas for combining bivariate diagnostic tests Nonparametric predictive inference with parametric copulas for combining bivariate diagnostic tests Noryanti Muhammad, Universiti Malaysia Pahang, Malaysia, noryanti@ump.edu.my Tahani Coolen-Maturi, Durham

More information

CTDL-Positive Stable Frailty Model

CTDL-Positive Stable Frailty Model CTDL-Positive Stable Frailty Model M. Blagojevic 1, G. MacKenzie 2 1 Department of Mathematics, Keele University, Staffordshire ST5 5BG,UK and 2 Centre of Biostatistics, University of Limerick, Ireland

More information

Repeated ordinal measurements: a generalised estimating equation approach

Repeated ordinal measurements: a generalised estimating equation approach Repeated ordinal measurements: a generalised estimating equation approach David Clayton MRC Biostatistics Unit 5, Shaftesbury Road Cambridge CB2 2BW April 7, 1992 Abstract Cumulative logit and related

More information

SEMIPARAMETRIC REGRESSION WITH TIME-DEPENDENT COEFFICIENTS FOR FAILURE TIME DATA ANALYSIS

SEMIPARAMETRIC REGRESSION WITH TIME-DEPENDENT COEFFICIENTS FOR FAILURE TIME DATA ANALYSIS Statistica Sinica 2 (21), 853-869 SEMIPARAMETRIC REGRESSION WITH TIME-DEPENDENT COEFFICIENTS FOR FAILURE TIME DATA ANALYSIS Zhangsheng Yu and Xihong Lin Indiana University and Harvard School of Public

More information

Logistic regression model for survival time analysis using time-varying coefficients

Logistic regression model for survival time analysis using time-varying coefficients Logistic regression model for survival time analysis using time-varying coefficients Accepted in American Journal of Mathematical and Management Sciences, 2016 Kenichi SATOH ksatoh@hiroshima-u.ac.jp Research

More information

Philosophy and Features of the mstate package

Philosophy and Features of the mstate package Introduction Mathematical theory Practice Discussion Philosophy and Features of the mstate package Liesbeth de Wreede, Hein Putter Department of Medical Statistics and Bioinformatics Leiden University

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

Modelling geoadditive survival data

Modelling geoadditive survival data Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model

More information

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment

More information

Survival Analysis I (CHL5209H)

Survival Analysis I (CHL5209H) Survival Analysis Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca January 7, 2015 31-1 Literature Clayton D & Hills M (1993): Statistical Models in Epidemiology. Not really

More information

Imputation Algorithm Using Copulas

Imputation Algorithm Using Copulas Metodološki zvezki, Vol. 3, No. 1, 2006, 109-120 Imputation Algorithm Using Copulas Ene Käärik 1 Abstract In this paper the author demonstrates how the copulas approach can be used to find algorithms for

More information

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data Outline Frailty modelling of Multivariate Survival Data Thomas Scheike ts@biostat.ku.dk Department of Biostatistics University of Copenhagen Marginal versus Frailty models. Two-stage frailty models: copula

More information

A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA

A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA Statistica Sinica 23 (2013), 383-408 doi:http://dx.doi.org/10.5705/ss.2011.151 A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA Chi-Chung Wen and Yi-Hau Chen

More information

FAILURE-TIME WITH DELAYED ONSET

FAILURE-TIME WITH DELAYED ONSET REVSTAT Statistical Journal Volume 13 Number 3 November 2015 227 231 FAILURE-TIME WITH DELAYED ONSET Authors: Man Yu Wong Department of Mathematics Hong Kong University of Science and Technology Hong Kong

More information

Lecture 12. Multivariate Survival Data Statistics Survival Analysis. Presented March 8, 2016

Lecture 12. Multivariate Survival Data Statistics Survival Analysis. Presented March 8, 2016 Statistics 255 - Survival Analysis Presented March 8, 2016 Dan Gillen Department of Statistics University of California, Irvine 12.1 Examples Clustered or correlated survival times Disease onset in family

More information

A Regression Model for the Copula Graphic Estimator

A Regression Model for the Copula Graphic Estimator Discussion Papers in Economics Discussion Paper No. 11/04 A Regression Model for the Copula Graphic Estimator S.M.S. Lo and R.A. Wilke April 2011 2011 DP 11/04 A Regression Model for the Copula Graphic

More information

A multi-state model for the prognosis of non-mild acute pancreatitis

A multi-state model for the prognosis of non-mild acute pancreatitis A multi-state model for the prognosis of non-mild acute pancreatitis Lore Zumeta Olaskoaga 1, Felix Zubia Olaskoaga 2, Guadalupe Gómez Melis 1 1 Universitat Politècnica de Catalunya 2 Intensive Care Unit,

More information

Multistate Modeling and Applications

Multistate Modeling and Applications Multistate Modeling and Applications Yang Yang Department of Statistics University of Michigan, Ann Arbor IBM Research Graduate Student Workshop: Statistics for a Smarter Planet Yang Yang (UM, Ann Arbor)

More information

Efficiency Comparison Between Mean and Log-rank Tests for. Recurrent Event Time Data

Efficiency Comparison Between Mean and Log-rank Tests for. Recurrent Event Time Data Efficiency Comparison Between Mean and Log-rank Tests for Recurrent Event Time Data Wenbin Lu Department of Statistics, North Carolina State University, Raleigh, NC 27695 Email: lu@stat.ncsu.edu Summary.

More information

Time-varying proportional odds model for mega-analysis of clustered event times

Time-varying proportional odds model for mega-analysis of clustered event times Biostatistics (2017) 00, 00, pp. 1 18 doi:10.1093/biostatistics/kxx065 Time-varying proportional odds model for mega-analysis of clustered event times TANYA P. GARCIA Texas A&M University, Department of

More information

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown. Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)

More information

Exercises. (a) Prove that m(t) =

Exercises. (a) Prove that m(t) = Exercises 1. Lack of memory. Verify that the exponential distribution has the lack of memory property, that is, if T is exponentially distributed with parameter λ > then so is T t given that T > t for

More information

Statistical Inference and Methods

Statistical Inference and Methods Department of Mathematics Imperial College London d.stephens@imperial.ac.uk http://stats.ma.ic.ac.uk/ das01/ 31st January 2006 Part VI Session 6: Filtering and Time to Event Data Session 6: Filtering and

More information

Building a Prognostic Biomarker

Building a Prognostic Biomarker Building a Prognostic Biomarker Noah Simon and Richard Simon July 2016 1 / 44 Prognostic Biomarker for a Continuous Measure On each of n patients measure y i - single continuous outcome (eg. blood pressure,

More information

Two-level lognormal frailty model and competing risks model with missing cause of failure

Two-level lognormal frailty model and competing risks model with missing cause of failure University of Iowa Iowa Research Online Theses and Dissertations Spring 2012 Two-level lognormal frailty model and competing risks model with missing cause of failure Xiongwen Tang University of Iowa Copyright

More information

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P. Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

More information

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models John M. Neuhaus Charles E. McCulloch Division of Biostatistics University of California, San

More information

Lecture 7 Time-dependent Covariates in Cox Regression

Lecture 7 Time-dependent Covariates in Cox Regression Lecture 7 Time-dependent Covariates in Cox Regression So far, we ve been considering the following Cox PH model: λ(t Z) = λ 0 (t) exp(β Z) = λ 0 (t) exp( β j Z j ) where β j is the parameter for the the

More information

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What? You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?) I m not goin stop (What?) I m goin work harder (What?) Sir David

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

More information

BIOS 312: Precision of Statistical Inference

BIOS 312: Precision of Statistical Inference and Power/Sample Size and Standard Errors BIOS 312: of Statistical Inference Chris Slaughter Department of Biostatistics, Vanderbilt University School of Medicine January 3, 2013 Outline Overview and Power/Sample

More information

Web-based Supplementary Material for A Two-Part Joint. Model for the Analysis of Survival and Longitudinal Binary. Data with excess Zeros

Web-based Supplementary Material for A Two-Part Joint. Model for the Analysis of Survival and Longitudinal Binary. Data with excess Zeros Web-based Supplementary Material for A Two-Part Joint Model for the Analysis of Survival and Longitudinal Binary Data with excess Zeros Dimitris Rizopoulos, 1 Geert Verbeke, 1 Emmanuel Lesaffre 1 and Yves

More information

A Generalized Global Rank Test for Multiple, Possibly Censored, Outcomes

A Generalized Global Rank Test for Multiple, Possibly Censored, Outcomes A Generalized Global Rank Test for Multiple, Possibly Censored, Outcomes Ritesh Ramchandani Harvard School of Public Health August 5, 2014 Ritesh Ramchandani (HSPH) Global Rank Test for Multiple Outcomes

More information