Analysis of recurrent event data under the case-crossover design with applications to elderly falls


STATISTICS IN MEDICINE
Statist. Med. 2007; 00:1-22

Xianghua Luo 1,* and Gary S. Sorock 2

1 Division of Biostatistics, School of Public Health, University of Minnesota, 420 Delaware Street SE, Minneapolis, MN 55455, U.S.A.
2 Geriatric Research Services, 312 Central Ave. Box 280, Glyndon, MD 21071, U.S.A.

* Correspondence to: Division of Biostatistics, School of Public Health, University of Minnesota, 420 Delaware Street SE, Minneapolis, MN 55455, U.S.A. luox0054@umn.edu. Telephone: (612) Fax: (612)

Copyright © 2007 John Wiley & Sons, Ltd.

SUMMARY

The case-crossover design is useful for studying the effects of transient exposures on the short-term risk of diseases or injuries when only data on cases are available. The crossover nature of this design allows each subject to serve as his or her own control. While the original design was proposed for univariate event data, recurrent events are encountered in many applications (e.g., elderly falls, gout attacks, and sexually transmitted infections). In such situations, the within-subject dependence among recurrent events needs to be taken into account in the analysis. We review three existing conditional logistic regression-based approaches for analyzing recurrent event data under the case-crossover design. A simple approach is to use only one (e.g. the first) event for each subject, so that no assumption on the correlation among multiple events is needed, at the cost of an expected loss of efficiency in estimation. The validity of the other two reviewed approaches relies on independence assumptions for the recurrent

events, conditionally on a subject-level latent variable and a set of observed time-varying covariates. In this paper, we propose to adjust the conditional logistic regression using either a within-subject pairwise resampling technique or a weighted estimating equation. Neither method requires a specific dependency structure among the recurrent events. We also propose a weighted Mantel-Haenszel estimator for situations with a binary exposure. Simulation studies are conducted to evaluate the performance of the discussed methods. We present the analysis of a study of the effect of medication changes on falls among the elderly.

KEY WORDS: case-crossover; conditional logistic regression; Mantel-Haenszel; recurrent events; weighted estimating equation; within-cluster resampling

1. INTRODUCTION

The case-crossover design was introduced by Maclure [1] as an analogue to the matched case-control design for studying the effects of transient exposures on the risk of acute-onset diseases or injuries. Only data on people who have the disease or injury are required (cases only). The association between disease onset and risk factors is estimated by comparing exposure to the risk factors prior to the disease onset (case time) with that in a reference time (control time) or multiple reference times [2]. The case and control times have a prespecified relation to the onset time of the disease. For example, the association between fall risk and medication change can be estimated by comparing the medication change record during 1-2 days before (case time) versus 8-9 days before (control time) each fall date. By using each person as his or her own matched control, potential between-subject confounding variables are readily controlled. As a consequence, the effects of subject-level covariates are not estimable. A case-crossover study is

intended to identify the short-term, transient triggers of disease, rather than to identify who is at highest risk of disease. As for a matched case-control study, standard methods such as the Mantel-Haenszel (MH) method and conditional logistic regression (CLR) [3] can be used for analyzing case-crossover data, as suggested by Maclure [1] and Mittleman et al. [2]. Greenland [4] and Navidi [5] pointed out that the case-crossover design is subject to bias from time trends in exposures. Vines and Farrington [6] showed that a sufficient condition for avoiding this bias is the global exchangeability of the exposure distribution. If M control times are matched to each case time, this condition requires that the distribution of exposures in the M + 1 consecutive time periods be exchangeable.

Recurrent events are frequently encountered in case-crossover studies; examples include studies of falls [7], gout attacks [8], and sexually transmitted infections [9]. Maclure [1] was aware that inappropriate statistical analysis of repeated events threatens the validity of the design, but suggested no specific methods. The present paper concerns how recurrent events in case-crossover studies can be analyzed while accounting for the within-subject correlation among them. A simple approach is to use only one (e.g. the first) event for each person, so that no assumption on the correlation among multiple events is needed. Another simple approach is to assume that the within-subject correlation is completely accounted for by subject-specific variables (observed or unobserved), so that the recurrent events are independent conditionally on these subject-specific variables and other observed time-varying covariates. Under this assumption, the usual conditional likelihood method can be applied to the pooled event data without modification.
It should be noted that the unit of analysis in the conditional likelihood of this approach is the

event rather than the person. Navidi [5] proposed the full-stratum design, which deviates from the original form of the case-crossover design by requiring that control information be collected for all possible time intervals. In that sense, the sampling mechanism of the full-stratum design is closer to that of a cohort study than to that of a case-control study. Navidi's conditional logistic regression method for recurrent events requires the same conditional independence assumption, but treats the recurrent events within each person as a cluster and constructs the conditional likelihood on a person/cluster level. Though this method was proposed for the full-stratum design, it can be used for analyzing recurrent events under the case-crossover design. This is because the same form of conditional likelihood is obtained whether the data are regarded as coming from a cohort or a case-control study (Reference [3], p248).

In Section 2 of this paper, we describe a case-crossover study of the transient effects of medication changes on recurrent falls conducted in three nursing homes. Next, in Section 3, we introduce notation and summarize the above existing methods. Then, we propose two other approaches for analyzing recurrent events. The first adopts the within-cluster resampling technique proposed by Hoffman, Sen, and Weinberg [10] and Follmann, Proschan, and Leifer [11], except that our resampling unit is the matched pair or set (of observations in case and control periods) rather than the individual observation. As pointed out in Reference [10], within-cluster resampling-based methods are robust in situations where the cluster size is nonignorable, for example, when the number of recurrent events per subject is informative about the effect of exposures on the risk of disease.
The second proposed method is based on a weighted estimating equation, which was introduced by Williamson, Datta, and Satten [12]; a similar idea has been widely used for different types of correlated data. A connection between the within-cluster resampling and weighted estimating equation approaches was also made in Reference [12]. Pros

and cons of the above methods are discussed in Section 3 and then compared in a simulation study in Section 4. We present the applications of the above methods to the elderly falls data in Section 5. Some discussion is given in Section 6.

2. ELDERLY FALLS DATA

Falls in the elderly often occur recurrently. When the research interest is to determine the risk factors for falling rather than for being a faller, the case-crossover design can be useful. In this paper, we consider a data set with a total of 158 nursing home residents who were at least 65 years old and fell at least once at one of three study sites (the Johns Hopkins Bayview Care Center in Baltimore, MD and the V.A. nursing homes in Tampa and Orlando, FL) during the study period. The selected residents were on average 81 years old, mainly white (68%) and male (66%), and about half had a diagnosis of dementia (51%). A total of 419 falls, or 2.7 falls per person, were observed among the 158 residents; the number of falls per resident ranged from 1 to 37 (median 1). Data were collected from medical records on medication changes, including medication name, date of change, and type of change (i.e. new start, dose change, an as-needed dose given, or discontinuation), over a nine-day period prior to each fall. These data were originally presented in Sorock et al. (submitted manuscript).

The main research question in this study was to estimate the association between fall risk and medication change by comparing the medication change record during 1-2 days before versus 8-9 days before each fall date. A one-week lapse between case and control periods was chosen to minimize any changes in potentially unobserved time-varying confounding variables, e.g. health status.
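The case and control windows described above can be made concrete with a short sketch. It is only an illustration: the function names `window_exposure` and `matched_pair`, the argument names, and the example dates are hypothetical and do not come from the study. The only facts taken from the text are the window definitions (1-2 days and 8-9 days before each fall).

```python
from datetime import date, timedelta

# Window definitions taken from the text, in days before the fall date.
CASE_DAYS = (1, 2)      # case period: 1-2 days before the fall
CONTROL_DAYS = (8, 9)   # control period: 8-9 days before the fall


def window_exposure(fall_date, change_dates, days_before):
    """Return 1 if any medication change date lies in the window, else 0."""
    window = {fall_date - timedelta(days=d) for d in days_before}
    return int(any(d in window for d in change_dates))


def matched_pair(fall_date, change_dates):
    """Build (X1, X0): exposure in the case and control periods of one fall."""
    x1 = window_exposure(fall_date, change_dates, CASE_DAYS)
    x0 = window_exposure(fall_date, change_dates, CONTROL_DAYS)
    return x1, x0


# Hypothetical resident: one fall and two recorded medication changes.
fall = date(2000, 1, 20)
changes = [date(2000, 1, 19), date(2000, 1, 5)]
pair = matched_pair(fall, changes)
print(pair)  # (1, 0): a change in the case period, none in the control period
```

Each fall then contributes one pair of binary exposures, which is the form of data the methods in the following sections operate on.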

3. METHODOLOGY

3.1. Notation

Consider a sample of subjects, $i = 1, \ldots, n$, who have at least one event during a fixed time interval, where time is assumed to be discrete, i.e. $t \in \{\tau_1, \tau_2, \ldots, \tau_T\}$. For ease of presentation, we consider the case-crossover design with one control time matched to each event time, analogous to one-to-one matched case-control studies. However, the discussed methods can be easily extended to settings with more than one control (1:M matching) and/or more than one event in each matched set. For the $i$th subject, $m_i$ ($m_i \ge 1$) events occur at $t_{i1}, \ldots, t_{im_i}$, and for each event time $t_{ij}$ a control time $s_{ij}$ is chosen, $j = 1, \ldots, m_i$, $i = 1, \ldots, n$. Let $X_{1ij}$ denote the covariate of subject $i$ at the event time $t_{ij}$ and $X_{0ij}$ be this subject's covariate at the corresponding control time $s_{ij}$. Let $O_{ij} = \{t_{ij}, X_{1ij}, s_{ij}, X_{0ij}\}$ denote the matched set for the $j$th event of the $i$th subject. We assume that the log odds of an event for subject $i$ at time $t$ are given by

$$\log \frac{p_{it}}{1 - p_{it}} = \lambda_i + \beta X_{it}, \tag{1}$$

where $X_{it}$ represents the time-varying covariate of subject $i$ at time $t$ and $\beta$ is the difference in the log odds of an event associated with a one-unit increase in $X$. The subject-specific $\lambda_i$ is time-invariant and could be a linear combination of a set of baseline covariates, including observed or unobserved variables.

3.2. Single event analysis approach

To avoid modelling the dependence among multiple matched sets from the same subject, one simple approach is to use only one event (e.g. the first event) and its matched control per subject, as

mentioned in Reference [7]. Hence, a sample of $n$ independent matched sets, $\{O_{i1}; i = 1, \ldots, n\}$, can be constructed, and conditional logistic regression can then be used to estimate $\beta$. The conditional likelihood (Reference [13], p190) of the exposure status at the case time, given that it is one of the two observed exposures in the matched set, is

$$\prod_{i=1}^{n} \frac{\exp(\beta X_{1i1})}{\exp(\beta X_{1i1}) + \exp(\beta X_{0i1})}, \tag{2}$$

and the conditional likelihood score function is

$$S_1(\beta) = \sum_{i=1}^{n} \left\{ X_{1i1} - \frac{\sum_{l=0}^{1} X_{li1} \exp(\beta X_{li1})}{\sum_{l=0}^{1} \exp(\beta X_{li1})} \right\}. \tag{3}$$

A sufficient condition for the validity of this conditional likelihood is that the distribution of $X$ in any two successive time periods is exchangeable within a subject [6]. The conditional maximum likelihood estimate (CMLE), $\hat\beta$, solves $S_1(\beta) = 0$ and is asymptotically normal with mean equal to the true parameter and variance (or variance-covariance matrix) equal to the negative inverse of $\partial S_1(\beta)/\partial \beta$ [14]. The variance can be estimated by replacing the true parameter with the estimated parameter. Although the estimate of $\beta$ from this approach is valid, we would expect a loss of efficiency, since this method ignores the recurrent nature of the data and discards all second and later events from the analysis.

3.3. Analysis of pooled recurrent events

Another frequently used CLR-based approach is to assume that the occurrences of events at distinct times are independent given the subject-specific effect $\lambda_i$ and the time-varying covariate $X_{it}$, so that the pooled matched sets from different subjects, $\{O_{ij}; i = 1, \ldots, n, j = 1, \ldots, m_i\}$, are conditionally independent and identically distributed (iid). The conditional

likelihood is, hence,

$$\prod_{i=1}^{n} \prod_{j=1}^{m_i} \frac{\exp(\beta X_{1ij})}{\exp(\beta X_{1ij}) + \exp(\beta X_{0ij})}, \tag{4}$$

and the conditional likelihood score function is

$$\sum_{i=1}^{n} \sum_{j=1}^{m_i} \left\{ X_{1ij} - \frac{\sum_{l=0}^{1} X_{lij} \exp(\beta X_{lij})}{\sum_{l=0}^{1} \exp(\beta X_{lij})} \right\}. \tag{5}$$

Applications of this approach can be found in Reference [7], where falls rather than persons are the unit of analysis.

3.4. Analysis of clustered data

Navidi [5] proposed the full-stratum design, which requires that control information be collected for all possible time intervals. As pointed out by Whitaker et al. [15], the full-stratum design is based on cohort sampling, while the original case-crossover design is based on case-control sampling. It was discussed in Reference [5] that conditional logistic regression can be used not only for clusters with a single case, but also for clusters with multiple cases. The method proposed in that paper for analyzing recurrent events treats the recurrent events within each person as a cluster and constructs the conditional likelihood on a person/cluster level. Though this method was proposed for the full-stratum design, it can be used for analyzing recurrent events under the case-crossover design, because the same form of conditional likelihood is obtained whether the data are regarded as coming from a cohort or a case-control study (Reference [3]). For subject $i$, let the set of covariates observed at the $m_i$ event times and the $m_i$ control times be $E_i = \{X_{1ij}, X_{0ij}; j = 1, \ldots, m_i\}$. The set $E_i$ can be treated as a cluster. The conditional probability that the covariate values at the event times are precisely $\{X_{1ij} = x_{1ij}; j = 1, \ldots, m_i\}$, given that they lie in $E_i$, is $\exp\left(\beta \sum_{j=1}^{m_i} X_{1ij}\right) / \sum_{S_i} \exp\left(\beta \sum_{l \in S_i} X_l\right)$, where $S_i$ is a subset of $E_i$ of size

$m_i$ and $\sum_{S_i}$ denotes summation over all such subsets. A sufficient condition for the validity of this probability is that the joint distribution of the covariates in any $2m_i$ successive time periods is exchangeable within subject $i$ [6]. Note that the exchangeability condition for the clustered data analysis method is stronger than that for the single event analysis or the pooled data analysis methods, because $2m_i$ can exceed 2. The conditional likelihood is

$$\prod_{i=1}^{n} \frac{\exp\left(\beta \sum_{j=1}^{m_i} X_{1ij}\right)}{\sum_{S_i} \exp\left(\beta \sum_{l \in S_i} X_l\right)}, \tag{6}$$

and the conditional likelihood score function is

$$\sum_{i=1}^{n} \left\{ \sum_{j=1}^{m_i} X_{1ij} - \frac{\sum_{S_i} \left(\sum_{l \in S_i} X_l\right) \exp\left(\beta \sum_{l \in S_i} X_l\right)}{\sum_{S_i} \exp\left(\beta \sum_{l \in S_i} X_l\right)} \right\}. \tag{7}$$

This method requires the same conditional iid assumption as the pooled data analysis method described in Section 3.3. The difference is that the conditional likelihood in the pooled analysis is based on events rather than persons, while the method for clustered data treats the recurrent events within a person as a cluster and constructs the conditional likelihood on a person/cluster level. An advantage of this construction is that events that cannot be matched with reference time(s) by design need not be excluded from the clustered data analysis. However, as the number of events per subject grows large, this method can become computationally infeasible. Another disadvantage is that the natural bond between the case and control times within each matched set is broken, so the method can be vulnerable to biases caused by potentially unobserved time-varying confounding variables.

Within-subject pairwise resampling

Hoffman, Sen, and Weinberg [10] proposed a within-cluster resampling (WCR) procedure for clustered data, where cluster size could be nonignorable. Within the framework of WCR,

an observation is randomly sampled from each cluster with replacement. The resulting (sub)sample consists of independent data and can then be analyzed using existing univariate methods. By repeating the resampling a large number of times, the parameter can be estimated by averaging the estimates from each univariate analysis. The variance estimate and weak consistency are provided in that paper. Follmann, Proschan, and Leifer [11] discussed how to choose the number of resamples and proposed an estimator, referred to as exhaustive multiple outputation or EMO, obtained by finding all possible subsamples and averaging the estimates from all of them. Rieger and Weinberg [16] adopted the within-cluster resampling method in the framework of CLR for clustered binary outcome data. In their method, a resampled data set is constructed by randomly selecting one case and one control from each cluster, so that the resulting data set consists of pairs of observations. The essence of this method is to transform clustered data into multiple 1:1 matched data sets and then conduct the matched-pair case-control data analysis repeatedly. The advantage is that the within-cluster conditional independence assumption can be dropped.

In the case-crossover data described in Section 3.1, the case period and control period are matched pairwise by design, so that the resampling can be based on matched sets rather than on individual observations as in Reference [16]. Resampling based on matched sets should be more robust in situations with unmeasured time-varying confounding variables. The rest of the estimation procedure is the same as in other typical WCR methods. The whole procedure is outlined as follows.

Step 1. Sample one matched pair, $O_{ij}$, randomly from each subject. The resulting data set consists of $n$ independent pairs of case and control periods.
Step 2. Conduct conditional logistic regression on the resampled data set and record the

parameter estimate and its variance (or variance-covariance matrix) estimate as $\hat\beta(q)$ and $\hat\Sigma(q)$ (the estimation method is the same as the one described in Section 3.2). The same exchangeability assumption on the exposure distribution as for the single event analysis method is needed.
Step 3. Repeat steps 1 and 2 a large number of times, say $Q$.
Step 4. The formulas for the parameter estimate and its variance estimate were provided in Reference [10] and are repeated here:

$$\hat\beta_{WCR} = \frac{1}{Q} \sum_{q=1}^{Q} \hat\beta(q), \tag{8}$$

$$\hat\Sigma_{WCR} = \frac{1}{Q} \sum_{q=1}^{Q} \hat\Sigma(q) - \frac{1}{Q} \sum_{q=1}^{Q} \left(\hat\beta(q) - \hat\beta_{WCR}\right)\left(\hat\beta(q) - \hat\beta_{WCR}\right)^{T}. \tag{9}$$

For the EMO estimate, $Q = \prod_{i=1}^{n} m_i$, the total number of all possible resamples.

The proposed within-subject pairwise resampling (WSPR) method inherits the advantages of a general WCR method, namely that (1) the correlation structure among multiple matched sets within each subject is left unspecified, and (2) the number of matched sets or events per subject can be nonignorable [10].

Weighted estimating equation

Considering the computational intensity of the WCR method, Williamson, Datta, and Satten [12] proposed a weighted estimating equation (WEE) method for clustered data. In that paper, it was observed that in each round of resampling in the WCR procedure, $\hat\beta(q)$ solves the score equation $S^{(q)}(\beta) = 0$, where

$$S^{(q)}(\beta) = \sum_{i=1}^{n} \sum_{j=1}^{m_i} U_{ij}(\beta)\, I[(i, j) \in r_q], \tag{10}$$

with $r_q$ being the set of indices $(i, j)$ that are sampled in the $q$th resampling. For our case-crossover data, $U_{ij}(\beta) = X_{1ij} - \sum_{l=0}^{1} X_{lij} \exp(\beta X_{lij}) / \sum_{l=0}^{1} \exp(\beta X_{lij})$. They further argued that the resampling distribution is a discrete uniform distribution with a probability mass of $1/m_i$ on each observation within subject $i$; hence, the indicator $I[(i, j) \in r_q]$ in (10) can be replaced by $1/m_i$. The resulting WEE for our case-crossover data is then

$$S_{WEE}(\beta) = \sum_{i=1}^{n} \frac{1}{m_i} \sum_{j=1}^{m_i} \left\{ X_{1ij} - \frac{\sum_{l=0}^{1} X_{lij} \exp(\beta X_{lij})}{\sum_{l=0}^{1} \exp(\beta X_{lij})} \right\}. \tag{11}$$

It was also proved in Reference [12] that the EMO estimator and the WEE estimator are asymptotically equivalent, and that the WEE estimator converges weakly, namely that $\sqrt{n}(\hat\beta - \beta)$ converges to a normal distribution with mean 0 and a variance that can be estimated by the sandwich form $\hat\nu = \hat H^{-1} \hat V \hat H^{-1}$, where

$$\hat H = n^{-1} \sum_{i=1}^{n} \frac{1}{m_i} \sum_{j=1}^{m_i} \left. \frac{\partial U_{ij}(\beta)}{\partial \beta} \right|_{\beta = \hat\beta}$$

and

$$\hat V = n^{-1} \sum_{i=1}^{n} \left\{ \frac{1}{m_i} \sum_{j=1}^{m_i} U_{ij}(\hat\beta) \right\} \left\{ \frac{1}{m_i} \sum_{j=1}^{m_i} U_{ij}(\hat\beta) \right\}^{T}.$$

Applications of the WEE method can also be found in the analysis of clustered survival data [17].

3.5 Mantel-Haenszel estimator revisited

It is worth noting that in applications where only one dichotomous variable, e.g. exposure or non-exposure status, is of interest, the Mantel-Haenszel method provides an easy-to-implement alternative to the conditional likelihood-based methods. Under the conditional

iid assumption, the Mantel-Haenszel estimator of the odds ratio for 1:M matched data is

$$\frac{\sum_{m=1}^{M} (M - m + 1)\, n_{1,m-1}}{\sum_{m=1}^{M} m\, n_{0,m}}, \tag{12}$$

where $n_{1,m}$ and $n_{0,m}$ are the total numbers of matched sets with, respectively, the case exposed and the case not exposed, and exactly $m$ controls also exposed. When $M = 1$ (1:1 matching), it reduces to the ratio of the discordant pair counts, $n_{10}/n_{01}$, of a two-by-two table. We note that the Mantel-Haenszel estimator is a special case of the pooled data analysis method described in Section 3.3 when only one binary covariate is present in the model. Analogous to the WEE method, we can also propose a weighted Mantel-Haenszel (WMH) estimator for the odds ratio, namely

$$\frac{\sum_{m=1}^{M} (M - m + 1)\, \tilde n_{1,m-1}}{\sum_{m=1}^{M} m\, \tilde n_{0,m}}, \tag{13}$$

where $\tilde n_{1,m}$ and $\tilde n_{0,m}$ are defined like $n_{1,m}$ and $n_{0,m}$, but with each matched set $O_{ij}$ counted as $1/m_i$ instead of 1. When $M = 1$, the WMH estimator also has a simple form, $\tilde n_{10}/\tilde n_{01}$. It can be proved that the WMH estimator is the solution of the WEE when only one binary covariate is present in the model.

4. A SIMULATION STUDY

We conduct a simulation study to assess the performance of the existing and proposed methods. The simulated data have a structure similar to that of the elderly falls data described in Section 2. For each simulation study, 1000 replicate data sets were generated, with a sample size of $n = 200$. For the WSPR method, each data set is resampled $Q = 1000$ times. A fixed time period with discrete time points $1, 2, \ldots, T$ ($T = 100$) is assumed for each subject. We generate the data under the model $\mathrm{logit}(p_{it}) = \lambda_i + \beta X_{it}$, where the subject-specific baseline risk $\lambda_i$

is the logarithm of a Beta($a = 1$, $b = 100$) distributed variable, and $X_{it}$ is generated from a Bernoulli distribution with probability 1/20. Table I displays the mean of the estimated regression coefficient, the sample standard deviation over the 1000 simulations, and the mean of the estimated standard error, for five different methods. The regression coefficient is set to $\beta = 1$, 2, or 3; the corresponding average number of events per subject ($m_i$) is 2.027, 2.211, and 2.570, respectively.

The results show that all methods yield reasonably unbiased estimates of the regression coefficient. The estimate from the method that uses only the first event has the largest variation, while that from the method treating all events and controls from the same subject as a cluster (the clustered data analysis method) has the smallest. However, as the number of events per cluster increases, the clustered data analysis method requires longer computing time. The second smallest variation comes from the pooled data analysis method, which is comparable to the clustered data analysis method. The WSPR and WEE methods produce similar variation, about midway among all methods. However, as discussed earlier, these two methods can accommodate a flexible correlation structure among recurrent events. The WEE method is more efficient than the WSPR method in terms of computing time, especially when the number of resamples is large.

In summary, when the conditional iid assumption holds, the pooled data analysis approach is the most efficient method, taking into account both the efficiency of the estimator and the computing time. We also recommend using the WEE method when the dependence structure of the recurrent events is unclear.
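The simulation design and three of the estimators can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' simulation code. The text does not say how control times were chosen in the simulation, so a fixed lag of 7 time points is assumed here and events without a valid control time are dropped. Because the exposure is binary with 1:1 matching, the pooled CLR estimate and the WEE solution reduce to the closed forms $\log(n_{10}/n_{01})$ and $\log(\tilde n_{10}/\tilde n_{01})$ noted in the Mantel-Haenszel section, so no general conditional logistic solver is used. The function names, the seeds, Q = 200, and the larger sample size (2000 instead of 200, chosen so that every resample contains both kinds of discordant pairs) are all illustrative choices.

```python
import math
import random


def simulate_pairs(n_subjects, beta, T=100, lag=7, seed=0):
    """Simulate logit(p_it) = lambda_i + beta * X_it and build 1:1 matched pairs.

    Assumption (not from the paper): the control time is `lag` points before
    each event, and events at the first `lag` time points are dropped.
    Returns one list of (x_case, x_control) pairs per subject with >= 1 pair.
    """
    rng = random.Random(seed)
    subjects = []
    while len(subjects) < n_subjects:
        base_odds = rng.betavariate(1, 100)        # exp(lambda_i), lambda_i = log Beta(1, 100)
        x = [int(rng.random() < 0.05) for _ in range(T)]   # X_it ~ Bernoulli(1/20)
        pairs = []
        for t in range(lag, T):
            odds = base_odds * math.exp(beta * x[t])
            if rng.random() < odds / (1.0 + odds):  # an event occurs at time t
                pairs.append((x[t], x[t - lag]))
        if pairs:                                   # cases only
            subjects.append(pairs)
    return subjects


def log_or(subjects, weighted):
    """log(n10 / n01) over discordant pairs.

    Each pair counts 1 (pooled analysis) or 1/m_i (weighted analysis).
    Raises an error if either discordant count is zero.
    """
    n10 = n01 = 0.0
    for pairs in subjects:
        w = 1.0 / len(pairs) if weighted else 1.0
        for x1, x0 in pairs:
            if x1 == 1 and x0 == 0:
                n10 += w
            elif x1 == 0 and x0 == 1:
                n01 += w
    return math.log(n10 / n01)


def wspr(subjects, Q=200, seed=1):
    """Within-subject pairwise resampling: mean of Q one-pair-per-subject fits."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(Q):
        resample = [[rng.choice(pairs)] for pairs in subjects]
        estimates.append(log_or(resample, weighted=False))
    return sum(estimates) / Q


data = simulate_pairs(n_subjects=2000, beta=1.0)
pooled = log_or(data, weighted=False)
wee = log_or(data, weighted=True)
resampled = wspr(data)
print(round(pooled, 2), round(wee, 2), round(resampled, 2))
```

Under these assumptions all three estimates should land near the true $\beta = 1$, with the resampling average close to the weighted estimate, in line with the asymptotic equivalence cited from Reference [12]. Standard errors are omitted to keep the sketch short.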

5. APPLICATION TO THE ELDERLY FALLS DATA

We illustrate the application of the existing and proposed methods using the elderly falls data discussed in Section 2. All the previously discussed methods, except for the method treating the data as clustered data, require at least one control period matched to each case period prior to a fall. For the elderly falls study, a nine-day window before a fall is used to define the case (1-2 days before the fall date) and the control (8-9 days before the fall date) periods. Therefore, some falls have to be excluded because of the difficulties in defining controls described below. First, falls that occur within 9 days of the date of admission are excluded from the analysis for lack of information on exposure status. Second, if consecutive falls are separated by less than 9 days, only the first one (called the primary fall) is included in the analysis, because the control period of later falls (called secondary falls) could overlap with the preceding fall's case period. As a consequence, 311 out of 419 falls are left for analysis, and no statistically significant difference between the remaining 148 residents and the 10 excluded residents is observed in terms of age, gender, race, and dementia status.

If we use the method that treats the data as clustered data, as discussed in Section 3.4, it is not necessary to find a control period for each individual fall. Hence, falls that could not be matched to controls can still be included in the analysis. However, we still need a two-day window before the fall date to define the case period. Therefore, we exclude the falls that occur within 2 days of the date of admission and the secondary falls that are separated by less than 2 days from preceding falls. This leaves us 376 falls (150 residents) for analysis.
Again, no significant difference between the included and excluded residents is found in terms of patient characteristics.
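The two exclusion rules can be written as one small function with the window length as a parameter. This is a hedged sketch: the function name `eligible_falls` and the example dates are hypothetical. Two details the text leaves open are treated as assumptions: "within 9 days" is read as strictly fewer than 9 days, and each fall is compared with the fall immediately before it, whether or not that earlier fall was itself retained.

```python
from datetime import date


def eligible_falls(admission, fall_dates, window_days):
    """Return the falls kept for analysis for one resident.

    window_days=9 mimics the rule for methods that need a control period;
    window_days=2 mimics the rule for the clustered data analysis method.
    Strict '<' at the boundaries is an assumption, not stated in the text.
    """
    kept = []
    previous = None
    for fall in sorted(fall_dates):
        too_early = (fall - admission).days < window_days
        secondary = previous is not None and (fall - previous).days < window_days
        if not too_early and not secondary:
            kept.append(fall)
        previous = fall  # compare each fall with the fall just before it
    return kept


# Hypothetical resident admitted on 1 January with four recorded falls.
admission = date(2000, 1, 1)
falls = [date(2000, 1, 5), date(2000, 1, 20), date(2000, 1, 25), date(2000, 2, 10)]

matched_analysis = eligible_falls(admission, falls, window_days=9)
clustered_analysis = eligible_falls(admission, falls, window_days=2)
print(len(matched_analysis), len(clustered_analysis))  # 2 4
```

In this example the 5 January fall is too close to admission and the 25 January fall is a secondary fall under the nine-day rule, while the two-day rule retains all four falls. This mirrors how the clustered data analysis keeps more falls (376) than the matched analyses (311).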

Two types of exposure are of interest: changes in Central Nervous System (CNS) medication use and changes in non-CNS medication use. We summarize both the estimated odds ratios and the 95% confidence intervals in Table II. All methods find that significantly elevated fall risk follows CNS medication changes, and no significant effect of non-CNS medication changes on fall risk is observed. The method that uses only the first fall for each resident gives a higher odds ratio estimate for CNS changes than the methods that use recurrent falls, with a loss of efficiency, i.e. a wider 95% confidence interval. The relatively large odds ratio estimate could be due to the larger variation of the estimate.

The limitation of applying the clustered data analysis method to this data set is that the computing time is prohibitive, because a number of residents fell more than 10 times (up to 31) during the study period and the computing time is exponential in the number of falls per resident. We therefore use Breslow's [18] and Efron's [19] methods to approximate the exact conditional likelihood in estimation. Both methods yield odds ratio estimates for CNS changes that are substantially smaller in magnitude than those from the other methods. Because approximation methods are used in estimation, the estimates from this analysis are not considered reliable.

The pooled data analysis method and the proposed WSPR and WEE methods produce consistent results in terms of both point and interval estimates for both types of medication changes. Based on the estimates from these three methods, a change in CNS medication use resulted in a three- to fourfold increase in fall risk among the studied nursing home residents. A change in non-CNS medication use did not significantly change their fall risk.

6. DISCUSSION

In this article, we consider different methods for estimating the effect of exposures on the risk of disease for recurrent event data under the case-crossover design. Commonly used existing methods are reviewed and two new methods are proposed (the WSPR and WEE methods), which enrich the available tools for analyzing recurrent event data under the case-crossover design. The two proposed methods are more flexible than the existing methods in situations with unknown correlation structures among recurrent events. The Mantel-Haenszel estimator is revisited and a weighted Mantel-Haenszel (WMH) estimator is proposed. Both estimators are easy to compute and implement in applications.

In all the previously discussed conditional likelihood-based approaches, subject-level covariates, observed or unobserved, are readily controlled. However, none of these methods can deal with subject-specific slopes in the model, for example,

$$\log \frac{p_{it}}{1 - p_{it}} = \lambda_i + (\beta + \alpha_i) X_{it},$$

where $\alpha_i$ is the subject-specific effect of exposures on event risks (or slope). Future research can proceed in this direction. In applications, we can still use the previously discussed models, but include interactions of subject-level covariates (e.g. race, gender, and different disease diagnoses) with the time-varying exposure status to capture the subject-specific exposure effect as much as possible.
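As a sketch of the interaction approach suggested above, suppose $Z_i$ is a single observed, binary subject-level covariate (a hypothetical symbol introduced here for illustration, e.g. an indicator of a dementia diagnosis) and $\gamma$ is its interaction coefficient (also hypothetical notation). One way to write such a model is

$$\log \frac{p_{it}}{1 - p_{it}} = \lambda_i + \beta X_{it} + \gamma Z_i X_{it}.$$

Under this form the exposure effect is $\beta$ for subjects with $Z_i = 0$ and $\beta + \gamma$ for subjects with $Z_i = 1$. The main effect of $Z_i$ does not vary over time, so it is absorbed into $\lambda_i$ and remains non-estimable, while the product $Z_i X_{it}$ is time-varying and can be entered as an ordinary covariate in any of the conditional likelihood-based methods discussed above.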

ACKNOWLEDGEMENTS

Dr. Gary Sorock's research is supported by the Centers for Disease Control, National Center for Injury Prevention and Control, grant # H. This material is also the result of work supported with resources and the use of facilities at the James A. Haley Veterans Hospital. The authors thank Dr. Chiung-Yu Huang for her careful reading of the manuscript and for her insightful comments and valuable suggestions.

REFERENCES

1. Maclure M. The case-crossover design: a method for studying transient effects on the risk of acute events. American Journal of Epidemiology 1991; 133:
2. Mittleman MA, Maclure M, Robins JM. Control sampling strategies for case-crossover studies: an assessment of relative efficiency. American Journal of Epidemiology 1995; 142:
3. Breslow NE, Day NE. Statistical Methods in Cancer Research. Volume 1. The Analysis of Case-Control Studies. International Agency for Research on Cancer (IARC scientific publications no. 32): Lyon,
4. Greenland S. Confounding and exposure trends in case-crossover and case-time-control designs. Epidemiology 1996; 7:
5. Navidi W. Bidirectional case-crossover designs for exposures with time trends. Biometrics 1998; 54:
6. Vines SK, Farrington CP. Within-subject exposure dependency in case-crossover studies. Statistics in Medicine 2001; 20:
7. Neutel CI, Perry S, Maxwell C. Medication use and risk of falls. Pharmacoepidemiology and Drug Safety 2002; 11:
8. Zhang Y, Woods R, Chaisson CE, Neogi T, Niu J, McAlindon TE, Hunter D. Alcohol consumption as a trigger of recurrent gout attacks. The American Journal of Medicine 2006; 119:800.e
9. Warner L, Macaluso M, Austin HD, Kleinbaum DK, Artz L, Fleenor ME, Brill I, Newman DR, Hook EW III. Application of the case-crossover design to reduce unmeasured confounding in studies of condom effectiveness. American Journal of Epidemiology 2005; 161:
10. Hoffman EB, Sen PK, Weinberg CR. Within-cluster resampling. Biometrika 2001; 88:
11. Follmann D, Proschan M, Leifer E. Multiple outputation: inference for complex clustered data by averaging analyses from independent data. Biometrics 2003; 59:
12. Williamson JM, Datta S, Satten GA. Marginal analyses of clustered data when cluster size is informative. Biometrics 2003; 59:
13. Hosmer DW, Lemeshow S. Applied Logistic Regression. Wiley: New York,
14. Agresti A. Categorical Data Analysis (2nd edn). John Wiley & Sons: New York,
15. Whitaker HJ, Hocine MN, Farrington CP. On case-crossover methods for environmental time series data. Environmetrics 2007; 18:
16. Rieger RH, Weinberg CR. Analysis of clustered binary outcomes using within-cluster paired resampling. Biometrics 2002; 58:

20 20 X. LUO AND G. S. SOROCK 17. Lu W. Marginal regression of multivariate event times based on linear transformation models. Lifetime Data Analysis 2005; 11: Breslow NE. Covariance analysis of censored survival data. Biometrics 1974; 30: Efron B. The efficiency of Cox s likelihood function for censored data. Journal of American Statistical Association 1977; 76:

Table I. Mean of estimated regression coefficients, sample standard deviation of estimated regression coefficients, and mean of estimated standard errors, for five different methods.

                           β = 1            β = 2            β = 3
Method                     Est  SD  SE      Est  SD  SE      Est  SD  SE
First event method
Pooled data method
Clustered data method
WSPR method
WEE method

Est = mean of the estimated regression coefficients. SD = sample standard deviation of the estimated regression coefficients. SE = mean of the estimated standard errors.

Table II. The estimated odds ratio and 95% confidence interval for the elderly falls data based on different methods.

Method                        CNS medication change    Non-CNS medication change
First event method            5.50 ( )                 1.00 ( )
Pooled data method            4.00 ( )                 1.19 ( )
Clustered data method
  Breslow's approximation     1.86 ( )                 1.13 ( )
  Efron's approximation       2.35 ( )                 1.21 ( )
WSPR method                   3.61 ( )                 1.00 ( )
WEE method                    3.51 ( )                 1.00 ( )

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

On case-crossover methods for environmental time series data

On case-crossover methods for environmental time series data On case-crossover methods for environmental time series data Heather J. Whitaker 1, Mounia N. Hocine 1,2 and C. Paddy Farrington 1 * 1 Department of Statistics, The Open University, Milton Keynes, UK.

More information

Ignoring the matching variables in cohort studies - when is it valid, and why?

Ignoring the matching variables in cohort studies - when is it valid, and why? Ignoring the matching variables in cohort studies - when is it valid, and why? Arvid Sjölander Abstract In observational studies of the effect of an exposure on an outcome, the exposure-outcome association

More information

Estimating direct effects in cohort and case-control studies

Estimating direct effects in cohort and case-control studies Estimating direct effects in cohort and case-control studies, Ghent University Direct effects Introduction Motivation The problem of standard approaches Controlled direct effect models In many research

More information

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC Mantel-Haenszel Test Statistics for Correlated Binary Data by Jie Zhang and Dennis D. Boos Department of Statistics, North Carolina State University Raleigh, NC 27695-8203 tel: (919) 515-1918 fax: (919)

More information

Asymptotic equivalence of paired Hotelling test and conditional logistic regression

Asymptotic equivalence of paired Hotelling test and conditional logistic regression Asymptotic equivalence of paired Hotelling test and conditional logistic regression Félix Balazard 1,2 arxiv:1610.06774v1 [math.st] 21 Oct 2016 Abstract 1 Sorbonne Universités, UPMC Univ Paris 06, CNRS

More information

Describing Stratified Multiple Responses for Sparse Data

Describing Stratified Multiple Responses for Sparse Data Describing Stratified Multiple Responses for Sparse Data Ivy Liu School of Mathematical and Computing Sciences Victoria University Wellington, New Zealand June 28, 2004 SUMMARY Surveys often contain qualitative

More information

More Statistics tutorial at Logistic Regression and the new:

More Statistics tutorial at  Logistic Regression and the new: Logistic Regression and the new: Residual Logistic Regression 1 Outline 1. Logistic Regression 2. Confounding Variables 3. Controlling for Confounding Variables 4. Residual Linear Regression 5. Residual

More information

Analysis of recurrent gap time data using the weighted risk-set. method and the modified within-cluster resampling method

Analysis of recurrent gap time data using the weighted risk-set. method and the modified within-cluster resampling method STATISTICS IN MEDICINE Statist. Med. 29; :1 27 [Version: 22/9/18 v1.11] Analysis of recurrent gap time data using the weighted risk-set method and the modified within-cluster resampling method Xianghua

More information

Estimating the Marginal Odds Ratio in Observational Studies

Estimating the Marginal Odds Ratio in Observational Studies Estimating the Marginal Odds Ratio in Observational Studies Travis Loux Christiana Drake Department of Statistics University of California, Davis June 20, 2011 Outline The Counterfactual Model Odds Ratios

More information

Quantile Regression for Recurrent Gap Time Data

Quantile Regression for Recurrent Gap Time Data Biometrics 000, 1 21 DOI: 000 000 0000 Quantile Regression for Recurrent Gap Time Data Xianghua Luo 1,, Chiung-Yu Huang 2, and Lan Wang 3 1 Division of Biostatistics, School of Public Health, University

More information

Lecture 5 Models and methods for recurrent event data

Lecture 5 Models and methods for recurrent event data Lecture 5 Models and methods for recurrent event data Recurrent and multiple events are commonly encountered in longitudinal studies. In this chapter we consider ordered recurrent and multiple events.

More information

1 Introduction A common problem in categorical data analysis is to determine the effect of explanatory variables V on a binary outcome D of interest.

1 Introduction A common problem in categorical data analysis is to determine the effect of explanatory variables V on a binary outcome D of interest. Conditional and Unconditional Categorical Regression Models with Missing Covariates Glen A. Satten and Raymond J. Carroll Λ December 4, 1999 Abstract We consider methods for analyzing categorical regression

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

One-stage dose-response meta-analysis

One-stage dose-response meta-analysis One-stage dose-response meta-analysis Nicola Orsini, Alessio Crippa Biostatistics Team Department of Public Health Sciences Karolinska Institutet http://ki.se/en/phs/biostatistics-team 2017 Nordic and

More information

Missing covariate data in matched case-control studies: Do the usual paradigms apply?

Missing covariate data in matched case-control studies: Do the usual paradigms apply? Missing covariate data in matched case-control studies: Do the usual paradigms apply? Bryan Langholz USC Department of Preventive Medicine Joint work with Mulugeta Gebregziabher Larry Goldstein Mark Huberman

More information

Unbiased estimation of exposure odds ratios in complete records logistic regression

Unbiased estimation of exposure odds ratios in complete records logistic regression Unbiased estimation of exposure odds ratios in complete records logistic regression Jonathan Bartlett London School of Hygiene and Tropical Medicine www.missingdata.org.uk Centre for Statistical Methodology

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

Journal of Biostatistics and Epidemiology

Journal of Biostatistics and Epidemiology Journal of Biostatistics and Epidemiology Methodology Marginal versus conditional causal effects Kazem Mohammad 1, Seyed Saeed Hashemi-Nazari 2, Nasrin Mansournia 3, Mohammad Ali Mansournia 1* 1 Department

More information

Truncated logistic regression for matched case-control studies using data from vision screening for school children.

Truncated logistic regression for matched case-control studies using data from vision screening for school children. Biomedical Research 2017; 28 (15): 6808-6812 ISSN 0970-938X www.biomedres.info Truncated logistic regression for matched case-control studies using data from vision screening for school children. Ertugrul

More information

multilevel modeling: concepts, applications and interpretations

multilevel modeling: concepts, applications and interpretations multilevel modeling: concepts, applications and interpretations lynne c. messer 27 october 2010 warning social and reproductive / perinatal epidemiologist concepts why context matters multilevel models

More information

Tests for the Odds Ratio in a Matched Case-Control Design with a Quantitative X

Tests for the Odds Ratio in a Matched Case-Control Design with a Quantitative X Chapter 157 Tests for the Odds Ratio in a Matched Case-Control Design with a Quantitative X Introduction This procedure calculates the power and sample size necessary in a matched case-control study designed

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs STAT 5500/6500 Conditional Logistic Regression for Matched Pairs The data for the tutorial came from support.sas.com, The LOGISTIC Procedure: Conditional Logistic Regression for Matched Pairs Data :: SAS/STAT(R)

More information

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs STAT 5500/6500 Conditional Logistic Regression for Matched Pairs Motivating Example: The data we will be using comes from a subset of data taken from the Los Angeles Study of the Endometrial Cancer Data

More information

Propensity Score Weighting with Multilevel Data

Propensity Score Weighting with Multilevel Data Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative

More information

ABSTRACT INTRODUCTION. SESUG Paper

ABSTRACT INTRODUCTION. SESUG Paper SESUG Paper 140-2017 Backward Variable Selection for Logistic Regression Based on Percentage Change in Odds Ratio Evan Kwiatkowski, University of North Carolina at Chapel Hill; Hannah Crooke, PAREXEL International

More information

Cox s proportional hazards model and Cox s partial likelihood

Cox s proportional hazards model and Cox s partial likelihood Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.

More information

Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility

Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility Stephen Burgess Department of Public Health & Primary Care, University of Cambridge September 6, 014 Short title:

More information

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Previous lecture P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Interaction Outline: Definition of interaction Additive versus multiplicative

More information

REVISED PAGE PROOFS. Logistic Regression. Basic Ideas. Fundamental Data Analysis. bsa350

REVISED PAGE PROOFS. Logistic Regression. Basic Ideas. Fundamental Data Analysis. bsa350 bsa347 Logistic Regression Logistic regression is a method for predicting the outcomes of either-or trials. Either-or trials occur frequently in research. A person responds appropriately to a drug or does

More information

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation Biometrika Advance Access published October 24, 202 Biometrika (202), pp. 8 C 202 Biometrika rust Printed in Great Britain doi: 0.093/biomet/ass056 Nuisance parameter elimination for proportional likelihood

More information

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Lecture 2: Poisson and logistic regression

Lecture 2: Poisson and logistic regression Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 11-12 December 2014 introduction to Poisson regression application to the BELCAP study introduction

More information

Integrated approaches for analysis of cluster randomised trials

Integrated approaches for analysis of cluster randomised trials Integrated approaches for analysis of cluster randomised trials Invited Session 4.1 - Recent developments in CRTs Joint work with L. Turner, F. Li, J. Gallis and D. Murray Mélanie PRAGUE - SCT 2017 - Liverpool

More information

Fitting stratified proportional odds models by amalgamating conditional likelihoods

Fitting stratified proportional odds models by amalgamating conditional likelihoods STATISTICS IN MEDICINE Statist. Med. 2008; 27:4950 4971 Published online 10 July 2008 in Wiley InterScience (www.interscience.wiley.com).3325 Fitting stratified proportional odds models by amalgamating

More information

DATA-ADAPTIVE VARIABLE SELECTION FOR

DATA-ADAPTIVE VARIABLE SELECTION FOR DATA-ADAPTIVE VARIABLE SELECTION FOR CAUSAL INFERENCE Group Health Research Institute Department of Biostatistics, University of Washington shortreed.s@ghc.org joint work with Ashkan Ertefaie Department

More information

APPENDIX B Sample-Size Calculation Methods: Classical Design

APPENDIX B Sample-Size Calculation Methods: Classical Design APPENDIX B Sample-Size Calculation Methods: Classical Design One/Paired - Sample Hypothesis Test for the Mean Sign test for median difference for a paired sample Wilcoxon signed - rank test for one or

More information

Tests of independence for censored bivariate failure time data

Tests of independence for censored bivariate failure time data Tests of independence for censored bivariate failure time data Abstract Bivariate failure time data is widely used in survival analysis, for example, in twins study. This article presents a class of χ

More information

The STS Surgeon Composite Technical Appendix

The STS Surgeon Composite Technical Appendix The STS Surgeon Composite Technical Appendix Overview Surgeon-specific risk-adjusted operative operative mortality and major complication rates were estimated using a bivariate random-effects logistic

More information

Effect Modification and Interaction

Effect Modification and Interaction By Sander Greenland Keywords: antagonism, causal coaction, effect-measure modification, effect modification, heterogeneity of effect, interaction, synergism Abstract: This article discusses definitions

More information

Robust Bayesian Variable Selection for Modeling Mean Medical Costs

Robust Bayesian Variable Selection for Modeling Mean Medical Costs Robust Bayesian Variable Selection for Modeling Mean Medical Costs Grace Yoon 1,, Wenxin Jiang 2, Lei Liu 3 and Ya-Chen T. Shih 4 1 Department of Statistics, Texas A&M University 2 Department of Statistics,

More information

Confounding, mediation and colliding

Confounding, mediation and colliding Confounding, mediation and colliding What types of shared covariates does the sibling comparison design control for? Arvid Sjölander and Johan Zetterqvist Causal effects and confounding A common aim of

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2008 Paper 241 A Note on Risk Prediction for Case-Control Studies Sherri Rose Mark J. van der Laan Division

More information

Introduction to mtm: An R Package for Marginalized Transition Models

Introduction to mtm: An R Package for Marginalized Transition Models Introduction to mtm: An R Package for Marginalized Transition Models Bryan A. Comstock and Patrick J. Heagerty Department of Biostatistics University of Washington 1 Introduction Marginalized transition

More information

PROD. TYPE: COM. Simple improved condence intervals for comparing matched proportions. Alan Agresti ; and Yongyi Min UNCORRECTED PROOF

PROD. TYPE: COM. Simple improved condence intervals for comparing matched proportions. Alan Agresti ; and Yongyi Min UNCORRECTED PROOF pp: --2 (col.fig.: Nil) STATISTICS IN MEDICINE Statist. Med. 2004; 2:000 000 (DOI: 0.002/sim.8) PROD. TYPE: COM ED: Chandra PAGN: Vidya -- SCAN: Nil Simple improved condence intervals for comparing matched

More information

Generalized logit models for nominal multinomial responses. Local odds ratios

Generalized logit models for nominal multinomial responses. Local odds ratios Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

UNIVERSITY OF CALIFORNIA, SAN DIEGO

UNIVERSITY OF CALIFORNIA, SAN DIEGO UNIVERSITY OF CALIFORNIA, SAN DIEGO Estimation of the primary hazard ratio in the presence of a secondary covariate with non-proportional hazards An undergraduate honors thesis submitted to the Department

More information

Marginal Screening and Post-Selection Inference

Marginal Screening and Post-Selection Inference Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2

More information

On the Breslow estimator

On the Breslow estimator Lifetime Data Anal (27) 13:471 48 DOI 1.17/s1985-7-948-y On the Breslow estimator D. Y. Lin Received: 5 April 27 / Accepted: 16 July 27 / Published online: 2 September 27 Springer Science+Business Media,

More information

A Poisson Process Approach for Recurrent Event Data with Environmental Covariates NRCSE. T e c h n i c a l R e p o r t S e r i e s. NRCSE-TRS No.

A Poisson Process Approach for Recurrent Event Data with Environmental Covariates NRCSE. T e c h n i c a l R e p o r t S e r i e s. NRCSE-TRS No. A Poisson Process Approach for Recurrent Event Data with Environmental Covariates Anup Dewanji Suresh H. Moolgavkar NRCSE T e c h n i c a l R e p o r t S e r i e s NRCSE-TRS No. 028 July 28, 1999 A POISSON

More information

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial

More information

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification Todd MacKenzie, PhD Collaborators A. James O Malley Tor Tosteson Therese Stukel 2 Overview 1. Instrumental variable

More information

Spatio-Temporal Threshold Models for Relating UV Exposures and Skin Cancer in the Central United States

Spatio-Temporal Threshold Models for Relating UV Exposures and Skin Cancer in the Central United States Spatio-Temporal Threshold Models for Relating UV Exposures and Skin Cancer in the Central United States Laura A. Hatfield and Bradley P. Carlin Division of Biostatistics School of Public Health University

More information

Standardization methods have been used in epidemiology. Marginal Structural Models as a Tool for Standardization ORIGINAL ARTICLE

Standardization methods have been used in epidemiology. Marginal Structural Models as a Tool for Standardization ORIGINAL ARTICLE ORIGINAL ARTICLE Marginal Structural Models as a Tool for Standardization Tosiya Sato and Yutaka Matsuyama Abstract: In this article, we show the general relation between standardization methods and marginal

More information

Bias in the Case--Crossover Design: Implications for Studies of Air Pollution NRCSE. T e c h n i c a l R e p o r t S e r i e s. NRCSE-TRS No.

Bias in the Case--Crossover Design: Implications for Studies of Air Pollution NRCSE. T e c h n i c a l R e p o r t S e r i e s. NRCSE-TRS No. Bias in the Case--Crossover Design: Implications for Studies of Air Pollution Thomas Lumley Drew Levy NRCSE T e c h n i c a l R e p o r t S e r i e s NRCSE-TRS No. 031 Bias in the case{crossover design:

More information

Multi-state Models: An Overview

Multi-state Models: An Overview Multi-state Models: An Overview Andrew Titman Lancaster University 14 April 2016 Overview Introduction to multi-state modelling Examples of applications Continuously observed processes Intermittently observed

More information

Continuous Time Survival in Latent Variable Models

Continuous Time Survival in Latent Variable Models Continuous Time Survival in Latent Variable Models Tihomir Asparouhov 1, Katherine Masyn 2, Bengt Muthen 3 Muthen & Muthen 1 University of California, Davis 2 University of California, Los Angeles 3 Abstract

More information

Stratified Randomized Experiments

Stratified Randomized Experiments Stratified Randomized Experiments Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Stratified Randomized Experiments Stat186/Gov2002 Fall 2018 1 / 13 Blocking

More information

Lecture 5: Poisson and logistic regression

Lecture 5: Poisson and logistic regression Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 3-5 March 2014 introduction to Poisson regression application to the BELCAP study introduction

More information

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models John M. Neuhaus Charles E. McCulloch Division of Biostatistics University of California, San

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Overview In observational and experimental studies, the goal may be to estimate the effect

More information

Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations)

Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations) Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations) Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology

More information

Factor Analytic Models of Clustered Multivariate Data with Informative Censoring (refer to Dunson and Perreault, 2001, Biometrics 57, )

Factor Analytic Models of Clustered Multivariate Data with Informative Censoring (refer to Dunson and Perreault, 2001, Biometrics 57, ) Factor Analytic Models of Clustered Multivariate Data with Informative Censoring (refer to Dunson and Perreault, 2001, Biometrics 57, 302-308) Consider data in which multiple outcomes are collected for

More information

Introduction to lnmle: An R Package for Marginally Specified Logistic-Normal Models for Longitudinal Binary Data

Introduction to lnmle: An R Package for Marginally Specified Logistic-Normal Models for Longitudinal Binary Data Introduction to lnmle: An R Package for Marginally Specified Logistic-Normal Models for Longitudinal Binary Data Bryan A. Comstock and Patrick J. Heagerty Department of Biostatistics University of Washington

More information

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke BIOL 51A - Biostatistics 1 1 Lecture 1: Intro to Biostatistics Smoking: hazardous? FEV (l) 1 2 3 4 5 No Yes Smoke BIOL 51A - Biostatistics 1 2 Box Plot a.k.a box-and-whisker diagram or candlestick chart

More information

Statistical Methods for Alzheimer s Disease Studies

Statistical Methods for Alzheimer s Disease Studies Statistical Methods for Alzheimer s Disease Studies Rebecca A. Betensky, Ph.D. Department of Biostatistics, Harvard T.H. Chan School of Public Health July 19, 2016 1/37 OUTLINE 1 Statistical collaborations

More information

Analysis of Time-to-Event Data: Chapter 6 - Regression diagnostics

Analysis of Time-to-Event Data: Chapter 6 - Regression diagnostics Analysis of Time-to-Event Data: Chapter 6 - Regression diagnostics Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/25 Residuals for the

More information

Approaches to parameter and variance estimation in generalized linear models

Approaches to parameter and variance estimation in generalized linear models Approaches to parameter and variance estimation in generalized linear models Eugenio Andraca Carrera A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial
