TIME-DEPENDENT COVARIATES IN THE COX PROPORTIONAL-HAZARDS REGRESSION MODEL

Size: px
Start display at page:

Download "TIME-DEPENDENT COVARIATES IN THE COX PROPORTIONAL-HAZARDS REGRESSION MODEL"

Transcription

1 Annu. Rev. Public Health : Copyright c 1999 by Annual Reviews. All rights reserved TIME-DEPENDENT COVARIATES IN THE COX PROPORTIONAL-HAZARDS REGRESSION MODEL Lloyd D. Fisher and D. Y. Lin Department of Biostatistics, University of Washington, Seattle, Washington ; lfisher@biostat.washington.edu; danyu@biostat.washington.edu KEY WORDS: survival analysis, longitudinal analysis, censored data, model checking ABSTRACT The Cox proportional-hazards regression model has achieved widespread use in the analysis of time-to-event data with censoring and covariates. The covariates may change their values over time. This article discusses the use of such timedependent covariates, which offer additional opportunities but must be used with caution. The interrelationships between the outcome and variable over time can lead to bias unless the relationships are well understood. The form of a timedependent covariate is much more complex than in Cox models with fixed (non time-dependent covariates. It involves constructing a function of time. Further, the model does not have some of the properties of the fixed-covariate model; it cannot usually be used to predict the survival (time-to-event curve over time. The estimated probability of an event over time is not related to the hazard function in the usual fashion. An appendix summarizes the mathematics of time-dependent covariates. INTRODUCTION One of the areas of great methodologic advance in biostatistics has been the ability to handle censored time-to-event data. Censored means that some units of observation are observed for variable lengths of time but do not experience the event (or endpoint under study. Such data were first studied and analyzed by actuaries. Kaplan & Meier presented the product limit or Kaplan-Meier curve to efficiently use all of the data to estimate the time-to-event curve ( /99/ $

2 146 FISHER & LIN Comparison of groups based on this nonparametric estimate is given by the logrank test. Sir David Cox considered the introduction of predictor/explanatory variables or covariates into such models. The hazard may be thought of as proportional to the instantaneous probability of an event at a particular time. Cox (2 proposed a model in which the effect of the covariates is to multiply the hazard function by a function of the explanatory covariates. This means that two units of observation have a ratio of their hazards that is constant and depends on their covariate values. This model is usually called either the Cox regression model or the proportional-hazards regression model. It is important that covariates in this model may also be used in models in which the underlying survival curve has a fully parametric form, such as the Weibull distribution. At this time the Cox model is probably the most widely used model and is discussed in this paper, but the same issues, problems, and opportunities hold for the other parametric time-to-event models. Because one of the most common uses of the model is for death as an endpoint and also given the historical development in actuarial science, the field is sometimes referred to as survival analysis. In an industrial setting the events are often failure of devices, machines, etc, and the field is also referred to as failure time analysis. The model is often used to examine the predictive value of, for example, survival, in terms of subject (often patients in some medical setting covariates such as treatment, age, gender, height, weight, relative weight, smoking status, ethnicity categories, diastolic and systolic blood pressures, education, and income, to predict survival. The exponential of the coefficients from the Cox model gives the instantaneous relative risk for an increase of one unit for the covariate in question. In many instances covariate data are collected longitudinally. For example, blood pressure, CD4 count, relative weight, disease history, and hospitalization data may be collected at selected periodic time points. As another example, treatment or other exposure may change over time. It seems natural and appropriate to use the covariate information that varies over time in an appropriate statistical model. One method of doing this is the time-dependent Cox or proportional-hazards model. This article discusses the use of such models. The article is written primarily for those who have a working familiarity with the usual fixed-covariate proportional-hazards model. The emphasis is on differences that arise when time-dependent covariates are used instead of fixed covariates. This review avoids mathematical details, which are relegated to an appendix for readers with more mathematical background. We see below that the use of time-dependent covariates offers exciting opportunities for exploring associations and potentially causal mechanisms. Unfortunately we also see that the use of time-dependent covariates is technically

3 TIME-DEPENDENT COVARIATES 147 difficult in the choice of covariate form, has great potential for bias, and does not lead to prediction for the individual survival experience as does the usual Cox model with fixed covariate values. Selecting the Form of a Time-Dependent Covariate When a predictor or independent covariate is allowed to vary over time, one needs to determine its form over time. This introduces a number of subtleties and difficulties. Although this is easy to state, the application is more difficult. Several illustrations introduce some of the issues involved. ILLUSTRATION 1: SMOKING AND SURVIVAL As an example of a time-dependent covariate, consider the effect of smoking on survival. Suppose that we have a cohort of individuals who are contacted at yearly intervals. As part of the interview process, data are collected on both current smoking status (defined as smoking any cigarettes during the prior month and the estimated total number of cigarettes smoked over the past year. The hypothesis to be investigated is that current cigarette smoking increases the risk of death. Perhaps the most immediate approach is to use a step-function that equals one if the individual was smoking at the last follow-up and zero if the person was not smoking at the last follow-up. For example an individual who was alive after 4 years of follow-up and who was smoking at baseline, at years 1 and 2, but not at years 3 and 4, would have a time-dependent covariate that equals 1 up to the beginning of year 3 and then drops down to zero. Note the step when the smoking status changed. A step function is a function that takes on constant values on intervals. Cavender et al (1 present an analysis by using time-dependent covariates. The data are from the Coronary Artery Surgery Study (CASS, in which individuals with mild angina (i.e. chest pain caused by coronary artery disease were randomized to early coronary artery bypass graft surgery or early medical therapy. Data on smoking status were collected every 6 months, and, for the first analysis, a step function of the type described above (but with 6-months intervals was used. Very surprisingly, the estimated effect of current cigarette smoking on survival was positive although not statistically significant. This led to an examination of the individual patient smoking histories. It turned out that most of those who had died were smokers, but many stopped smoking at the last follow-up before their death. In a number of instances this was apparently explainable by hospitalization for a myocardial infarction or congestive heart failure. One conjectures that other patients may have had prodromal symptoms prompting smoking cessation. The investigators addressed the problem in two ways. One approach was to use a time-lagged covariate. At the first and second 6-month intervals,

4 148 FISHER & LIN the baseline smoking status was used. For subsequent intervals, the next to last follow-up value was used rather than the last assessment. This resulted in a statistically significant increased risk from cigarette smoking. A second approach (and that presented in this paper used the percentage of the follow-up period that the subject smoked. This also resulted in a statistically significant detrimental association with smoking. The point here is that the choice of a time-dependent covariate involves the choice of a functional form for the time-dependence of the covariate. This choice is usually not self-evident but may be suggested by biological understanding or biological hypothesis. Other examples illustrate this point. ILLUSTRATION 2: CHOLESTEROL-LOWERING DRUGS Suppose that we have an observational database on many individuals with high lipid values who have been placed on cholesterol-lowering drugs. Suppose also that a number of risk factors are in the database and that we want to examine the effect of treatment with a cholesterol-lowering drug. First note that the usual fixed covariates can be used in a model along with time-dependent covariates. Thus baseline lipid values could be used (as well as a time-dependent cholesterol covariate to allow adjustment for this and other baseline patient characteristics. (Formally one can think of a fixed covariate as a time-dependent covariate that happens to have a constant value for all time points. Although this is mathematically correct, computer software will run much faster if the covariates that do not change over time are entered as fixed covariates. How might one model the introduction of a cholesterol-lowering drug at some time during follow-up? If the drug effect is only through the plasma lipids and if the lipid values are periodically collected during follow-up, then one might enter the lipid value from the last follow-up as the value at each time point. Suppose the primary manner in which the lipid values contribute to mortality is through the atherosclerotic process. Also assume that abnormally high levels lead to a lipid build-up in the vascular lesions and, by contrast, that very low levels lead to some leaching out of the lipids. What if we assume that both the build-up and leaching out take place over extended time periods? Then we might want to use some complex function over time to model the effect. For example risk might be modeled as a moving weighted average of the values over some long time interval. The weights might be highest for the current levels and then drop off as the values become more remote. If the lipid values are not measured routinely but we know when lipidlowering drug therapy is implemented and we assume that the effect will become more pronounced over time because of the mechanisms involved, then we might want a function that is zero until the therapy is introduced and then increases (e.g. linearly over a fixed time interval (a few years? to a value of one. Thus

5 TIME-DEPENDENT COVARIATES 149 rather than a step function that jumps up, the benefit is assumed to increase continually over a time interval before reaching the maximum effect. ILLUSTRATION 3: CIRCADIAN VARIABILITY IN PLASMA LEVELS OF A PROTECTIVE DRUG Consider an antiplatelet drug to reduce complications of vascular disease, e.g. vascular death, myocardial infarction, and ischemic stroke, as a combination endpoint. The endpoint is reached at the first occurrence of any of the three components. The endpoints are assumed to be caused by the thrombotic process with platelets involved in producing the thrombosis. The protection is assumed to be directly related to the plasma level of the drug or biologic material being studied. Let us also assume that prior work with the compound given twice daily at a fixed dose has been reasonably used to model the plasma levels over time as a function of the time of dosing and selected patient covariate values, including weight, height, age, gender, and ethnicity, as well as measures from kidney and liver function tests. For each individual in the current study we can estimate the 24-hour circadian pattern of the plasma concentration (assuming the drug is taken at the times recommended. The model would be greatly enhanced if, at baseline, selected plasma values were taken for each patient to help in this modeling effort. The time of most of the events (except for death during sleep is assumed to be recorded. How might we model the relationship with the drug (as mediated through the modeled plasma levels of the drug? In addition to fixed patient covariates that increase the risk of an event (e.g. prior myocardial infarction, ischemic stroke, transient ischemic attack or peripheral arterial disease, or age, a time-dependent covariate that is the estimated plasma level at each time point for the individual patient could be used. Note that the production of this time-dependent covariate is quite complex. In the model there is a fixed parameter to be estimated that is the multiplier of the time-dependent covariate. If this is statistically significant, one might argue that there is a drug effect. However, it is well known that there is a circadian pattern to the time of occurrence of myocardial infarctions. The very early morning is a period of increased risk. To adjust for this, one could argue for circadian terms (for example the first five terms of the Fourier series to adjust for the circadian pattern of events without appealing to a drug effect. In this case, for the drug to be effective it would need to add predictive power to the circadian terms standing alone. On further reflection, the situation is even more complex. Why should the effect on the relative risk of an event be directly related to the plasma level of a drug based on the form of the Cox proportional-hazards model? Perhaps there is almost no drug effect until the plasma level reaches some value, and then the effect increases rapidly to a plateau at some other level. It may be that there is an intermediary measurement (e.g. in this case the percent reduction in platelet

6 150 FISHER & LIN aggregation as a function of plasma level that would be more closely related and that could be modeled. Even when receptors are saturated, the drug effect is usually not perfect. Thus perhaps the model should begin at a plasma level of zero and increase to an asymptote at some finite multiple of the relative risk. All of these ideas can lead to different models. Further one could try a number of models; in this case the multiple-comparison issue could become severe (if there is no independent cohort to examine the performance of a final model. Although the chance to model is a strength of the time-dependent covariate approach, for exploratory data analyses it also is a weakness because trying too many models can lead to great overfitting of the data. ILLUSTRATION 4: BONE MARROW TRANSPLANT GRAFTING Selected leukemias and other blood diseases are treated with bone marrow transplantation. In a bone marrow transplant (BMT, there is a conditioning regime to destroy the cancerous cells; after this conditioning, a transplant of bone marrow from the patient (autologous bone marrow, another person who has the same human leukocyte antigen haplotypes, or a matched unrelated donor is infused to the patient. Suppose that our endpoint of interest is survival. There is a substantial early mortality from the conditioning regimen, because there is a trade-off, with a stronger conditioning regimen increasing the risk of early death but reducing the risk of a relapse of the cancer. During approximately the first year, there is an increased risk of death from relapse of the original cancer. Further, both acutely and chronically there is a risk of the transplanted bone marrow attacking the recipient as foreign matter (graft versus host disease and causing death. In the long term there are also the usual risks of death as well as an increased risk of other cancers from the treatment for the cancer at hand. How might we try to model this situation with time-dependent covariates? If there are only bone transplant patients (who by definition live until the transplant is attempted, then the study cannot evaluate the effect of the BMT. Thus suppose we have a cohort of individuals diagnosed with a particular cancer that may or may not be treated with BMT at some stage. There may well be timedependent covariates for other chemotherapeutic and radiologic treatments, as well as for the BMT. Here we discuss possible time-dependent covariates for the BMT. The early risk of a death from the conditioning regime and the stress of the bone marrow transplant might be modeled by a step function that is one during the period of hospitalization and zero at other times. The risk of relapse over the first year after transplant could be modeled by a step function that is one over the first year after transplant and zero at other times. The transplant-related risk after year one could be modeled by a step function that equaled one from the end of the first year onward.

7 TIME-DEPENDENT COVARIATES 151 Prediction and Time-Dependent Covariates One of the benefits of the proportional-hazards model with all fixed covariates is the ability to give individualized predictions of the estimated time to the event of interest. Estimated curves under different treatment modalities may be used for counseling patients. Estimated survival curves under smoking and nonsmoking scenarios can illustrate the added risk caused by smoking. Curves with different covariate values can illustrate the interaction between multiple risk factors. With time-dependent covariates the ability to predict is usually lost. One reason is that, because the model depends on the value of a changing quantity (the time-dependent covariate, at a future time the future values are usually unknown until they are actually observed. Second, if we know the future value of some covariates (e.g. blood pressure, the existence of the values implies that the subject has not reached a death endpoint. This fact implies that one cannot estimate the survival curve from the observed and future values of a quantity such as blood pressure. The existence of a positive value would imply that the subject was still alive. Knowing the covariate would imply knowledge at each stage of the vital status. (The appendix discusses these issues more, while discussing internal and external covariates. Interpretation and Time-Dependent Covariates The time sequence of causal relationships is often difficult and tricky to untangle. For example, if a disease can cause wasting, then using weight as a time-dependent covariate to predict diagnosis may be somewhat of a selffulfilling prophecy and, rather than predicting the disease, a lower weight may be an early sign of as yet undiagnosed disease. Care must also be exercised when one assesses treatment efficacy with a regression model (1, which includes not only the treatment assignment but also a covariate (e.g. CD4 count and blood pressure measured subsequent to the treatment assignment. If the effect of treatment on survival is predominantly mediated through the covariate, such an analysis will show little or no treatment effect on survival. This, however, can give useful information about the mechanism by which the treatment operates. In fact, this type of analysis has recently been used to assess the role of biological markers (such as CD4 count and viral load as surrogate endpoints for clinical events (such as opportunistic infection and death in HIV/AIDS trials (8. Certain covariates can be useful in reducing the bias of involving time-dependent exposure, as is discussed below. In general if time-dependent covariates can change in relation to health or some other general concept related to the endpoint in the model, then interpretation is difficult and prone to be misleading. Great caution is advisable.

8 152 FISHER & LIN Discussion The Cox proportional-hazards regression model for time-to-event data may be used with covariates, independent variables, or predictor variables that vary over time. These are called time-dependent covariates. Their use is much more complicated in practice than the fixed (time-independent covariates. Further, the potential for erroneous inference and modeling is greatly increased. Still time-dependent covariates may be a powerful tool for exploring predictive relationships by using quantities that vary over time. The major points to remember include the following: (a the modeling of a time-dependent covariate involves the choice of a function over time; this functional form may be far from obvious and may require deep biologic insight; (b the choice of a complex functional form raises the possibility of too much modeling and great overfitting of a data set; (c many time-dependent covariates are closely associated with the unit under study, are usually generated by that unit (e.g. blood pressure, weight, HIV status, and remove the usual relationship between the hazard function and the survival function; (d time-dependent covariate models, except under certain circumstances, do not allow individual predictive time-to-event curves, which is different from the Cox model, with only fixed covariate values; (e extreme caution must be exercised when modeling time-dependent exposure or treatment, especially if the change in exposure or treatment is related to the subject s health status; ( f most comprehensive statistical-software packages now provide proportional-hazards regression modeling with both fixed and timedependent covariates allowed; (g computer software exists for the estimation of coefficients and for examining the goodness-of-fit of the model. The opportunities inherent in time-dependent modeling (including the explicit relationship of longitudinal values and the occurrence of an event must be understood in light of the potential biases, strong assumptions needed in terms of the lack of other possible explanations, and the need to choose more complex functional forms for the modeling effort. APPENDIX: MATHEMATICAL DISCUSSION Time-Dependent Covariates and Proportional Hazards Regression Let T be the failure time of interest, and let Z be a set of possibly timedependent covariates. We use Z(t to denote the value of Z at time t, and Z(t ={Z(s:0 s t}to denote the history of the covariates up to time t. It is convenient to formulate the effects of covariates on the failure time through the hazard function. The conditional-hazard function of T given Z is λ(t Z = Pr (T [t, t + dt T t, Z(t,

9 TIME-DEPENDENT COVARIATES 153 where (t, t + dt is a small interval from t to t + dt. The Cox (2 proportionalhazards model specifies that λ(t Z = λ 0 (te β Z(t, 1. where β is a set of unknown regression parameters and λ 0 (t is an unspecified baseline hazard function. Kalbfleisch & Prentice (5 distinguished between external and internal timedependent covariates. This classification is helpful in interpreting the regression models and results for time-dependent covariates. An external covariate is one that is not directly related to the failure mechanism. One example would be the age of an individual in a long-term follow-up study. Another example would be the level of air pollution as a risk factor for asthma attacks. A third example would be times, for example, time of day or day of the year. On the other hand, an internal covariate is a value over time generated by the individual under study. Examples would include the Karnofsky score, blood pressures, procedural history, and CD4 counts measured over the course of the study. A major difference between external and internal covariates lies in the relationship between the conditional hazard function λ(t Z and the conditional survival function. The conditional survival function for a given covariate history is defined in general by S(t Z = Pr (T > t Z(t. For external covariates, this is also given by ( t S(t Z = exp λ(s Z ds, which becomes ( S(t Z = exp 0 t 0 λ 0 (se β Z(s ds under Equation 1. By contrast, the conditional-hazard function bears no relationship to the conditional-survival function for internal covariates. In fact, the internal covariate requires the survival of the individual for its existence. For an internal covariate Z such as a Karnofsky score, S(t Z = 1 provided that Z(t does not indicate that the individual has died. For the internal covariate of blood pressure, a measurable value indicates that the individual is still alive. We now describe how to estimate the regression parameters of Equation 1. Suppose that we have n individuals in the study, such that the data consist of {X i,δ i, Z i (X i }i = 1,...,n, where X i is the observation time (i.e. the last contact date for the ith individual, and δ i indicates, by the values 1 versus 0,

10 154 FISHER & LIN whether the ith subject fails or is censored at X i, and Z i (X i is the covariate history of the ith individual up to the observation time X i. The estimation of β is based on the partial likelihood score function U(β = n j R δ i (Z i (X i i e β Z j (X i Z j (X i j R i e β Z j (X i, 2. i=1 where R i is the set of individuals who are at risk at X i, that is, whose observation times are X i. The maximum partial likelihood estimator ˆβ is the solution to U(β = 0. It is well-known that ˆβ is consistent and asymptotically normal with covariance matrix I 1 ( ˆβ, where ( n j R I (β = δ i e β Z j (X i Z j (X i Z j (X i i i=1 j R i e β Z j (X i { j R i e β Z j (X i Z j (X i }{ j R i e β Z j (X i Z j (X i } j R i e β Z j (X i. 3. It is apparent from Equations 2 and 3 that the statistical inference about β requires, at each uncensored X i, the values of the covariates for all the subjects who are at risk at X i.ifzvaries its values continuously over time and is measured only at certain time intervals, then Z j (X j may not be available. In such situations, some interpolation between repeated measurements is necessary; see Lin et al (9 for a description of various interpolation schemes. Epidemiologic cohort studies and disease prevention trials often involve follow-up on several thousand subjects for a number of years. The assembly of covariate histories, which requires biochemical analysis of blood samples or other specimens or the hand coding of individual diet records, can be prohibitively expensive if it is done on all cohort members. Under the case-cohort design (11, the covariate histories need only be assembled for all the cases (i.e. deaths plus a random subset of the entire cohort. The aforementioned partiallikelihood method can be modified to analyze data from case-cohort studies (11, 12. Lin & Ying (10 proposed a general method for handling incomplete measurements of time-dependent covariates, including the case-cohort design as a special case. Reducing Bias One needs to be extremely cautious when interpreting results involving timedependent exposure or treatment. A fundamental assumption in using models

11 TIME-DEPENDENT COVARIATES 155 like Equation 1 to formulate the effect of a time-dependent exposure or treatment on survival is that the change in exposure or treatment occurs in a random fashion. In many applications, however, individuals change exposure level or treatment for health-related reasons. Then the results based on a simple timedependent exposure (treatment indicator can be very misleading. Consider for example the simple scenario in which individuals will receive the treatment only when a certain intermediate event occurs. Suppose that the treatment is completely ineffective, but the occurrence of the intermediate event doubles the hazard of death. If the value of Z(t is 1 when the individual is on the treatment at time t and is 0 otherwise, then the hazard ratio parameter e β in Equation 1 is equal to 2. This is the effect of the intermediate event, not the real effect of the treatment. Needless to say, such an analysis would be completely misleading. It is possible to reduce the bias by properly adjusting for the factors that trigger the change in exposure level or treatment. Consider again the above scenario in which the treatment assignment is caused by the intermediate event, but now assume that not everyone who experiences the intermediate event will receive the treatment or that there are individuals who do not experience the intermediate event but will take the treatment. Then it is possible to fit models like λ(t Z = λ 0 (te β 1 Z 1 (t+β 2 Z 2 (t, where Z 1 (t indicates whether the individual is on the treatment at time t, and Z 2 (t indicates whether the individual has experienced the intermediate event by time t. In this model, β 1 indeed pertains to the actual effect of the treatment (assuming that no other similar biases are in the model. In practice, the reasons for changing exposure level or treatment may not be recorded or may be hard to quantify adequately. Then it is difficult to adjust for the factors that trigger the change through modeling. In such situations, modeling-time dependent exposure or treatment should proceed with extreme caution and awareness of the potential biases involved. Model Checking Equation 1 assumes that covariates have proportionate effects on the hazard function over time. For instance, if Z(t is a single time-dependent indicator covariate, then Equation 1 implies that the hazard function is λ 0 (t when Z(t = 0 and is λ 0 (te β when Z(t = 1. It is important to assess this proportionalhazards assumption. There are a number of methods available in the literature. Here we discuss two methods that are most useful for time-dependent covariates.

12 156 FISHER & LIN Both methods are related to the score function U(β given in Equation 2. Lin et al (9 proposed the following class of weighted score functions: U w (β = n j R W (X i δ i (Z i (X i i e β Z j (X i Z j (X i j R i e β Z j (X i, i=1 where W (t is a weight function dependent on t. Let ˆβ w be the solution to U w (β = 0. Suppose that W (t is a decreasing function of t and that the effects of Z are not proportional on the hazard function, but rather diminishing over time. In this case, ˆβ estimates an average of covariate effects over time, whereas ˆβ w estimates a weighted average of covariate effects over time with more weights placed on the earlier part of the survival distribution, where the covariate effects are stronger. Therefore, ˆβ w will tend to be larger than ˆβ. This fact motivates us to test the proportional hazards assumption by comparing ˆβ and ˆβ w. Lin (7 derived a formal goodness-of-fit test based on this comparison. The above test is sensitive to the choice of the weight function. More omnibus tests can be obtained by considering the following process: U( ˆβ; t = j R δ i (Z i (X i i e β Z j (X i Z j (X i. 4. i:x i t j R i e ˆβ Z j (X i Note that U( ˆβ; t is the partial likelihood score function based on the events that occur before time t. If the effects of covariates are indeed proportionate, then the solution to the equation U(β; t = 0 should be similar to ˆβ, regardless of the choice of t. In other words, U( ˆβ; t should be close to 0 for all t. On the other hand, if the effects of covariates are not proportionate, then U( ˆβ; t will tend to deviate from 0. Thus, it is reasonable to construct goodness-of-fit tests based on U( ˆβ; t. If Z is a single covariate, then the critical values for the supremum statistic Q = max i I 1/2 ( ˆβ U( ˆβ; X i are 1.628, 1.358, and for the significance levels of 0.01, 0.05, and 0.10, respectively (13. If Z consists of multiple covariates, then the proportional-hazards assumption for the jth component of Z can be tested by the supremum statistic Q j = max i {I 1 ( ˆβ jj } 1/2 U(ˆβ;X i, where U j is the jth component of U and I 1 ( ˆβ jj is the jth diagonal element of I 1 ( ˆβ. If the covariates are uncorrelated, then the critical values for Q j are the same as those of Q described above (4. If the covariates are correlated, then the simulation technique of Lin et al (9 can be used to calculate the p values. Visit the Annual Reviews home page at

13 TIME-DEPENDENT COVARIATES 157 Literature Cited 1. Cavender JB, Rogers WJ, Fisher LD, Gersh BJ, Coggin JC, Myers WO Effects of smoking on survival and morbidity in patients randomized to medical or surgical therapy in the Coronary Artery Surgery Study (CASS: 10-year follow-up. J. Am. Coll. Cardiol. 20: Cox DR Regression models and life tables. J. Royal Stat. Soc. Ser. B 34: Dickson ER, Grambsch PM, Fleming TR, Fisher LD, Langworthy A Prognosis in primary biliary cirrhosis: model for decision making. Hepatology 10: Fleming TR, Harrington DP Counting Processes and Survival Analysis. New York: Wiley. 448 pp. 5. Kalbfleisch JD, Prentice RL The Statistical Analysis of Failure Time Data. New York: Wiley. 321 pp. 6. Kaplan EL, Meier P Nonparametric estimation for incomplete observations. J. Am. Stat. Assoc. 53: Lin DY Goodness-of-fit analysis for the Cox regression model based on a class of parameter estimators. J. Am. Stat. Assoc. 86: Lin DY, Fischl MA, Schoenfeld DA Evaluating the role of CD4-lymphocyte counts as surrogate endpoints in human immunodeficiency virus clinical trials. Stat. Med. 12: Lin DY, Wei LJ, Ying Z Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika 80: Lin DY, Ying Z Cox regression with incomplete covariate measurements. J. Am. Stat. Assoc. 86: Prentice RL A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73: Self SG, Prentice RL Asymptotic distribution theory and efficiency results for case-cohort studies. Ann. Stat. 16: Wei LJ Testing goodness of fit for proportional hazards model with censored observations. J. Am. Stat. Assoc. 79:649 52

14 Annual Review of Public Health Volume 20, 1999 CONTENTS PUBLIC HEALTH IN THE TWENTIETH CENTURY: Advances and Challenges, Jonathan E. Fielding xiii Personal Reflections on Occupational Health in the Twentieth Century: Spiraling to the Future, Mark R. Cullen 1 A Future for Epidemiology?, S. Schwartz, E. Susser, M. Susser 15 Lessons in Environmental Health in the Twentieth Century, Michael Gochfeld, Bernard D. Goldstein 35 US Health Care: a Look Ahead to 2025, Eli Ginzberg 55 What Have We Learned About Public Health in the Twentieth Century: A Glimpse Through Health Promotion's Rearview Mirror, L. W. Green 67 Understanding Changing Risk Factor Associations with Increasing Age in Adults, G. A. Kaplan, M. N. Haan, R. B. Wallace 89 Advances in Clinical Trials in the Twentieth Century, Lloyd D. Fisher 109 Methods for Analyzing Health Care Utilization and Costs, P. Diehr, D. 125 Yanez, A. Ash, M. Hornbrook, D. Y. Lin Time-Dependent Covariates in the Cox Proportional Hazards Regression 145 Model, Lloyd D. Fisher, D. Y. Lin Lessons from 12 Years of Comparative Risk Projects, Ken Jones, Heidi 159 Klein Unexplained Increases in Cancer Incidence in the United States from 1975 to 1994: Possible Sentinel Health Indicators?, Gregg E. Dinse, 173 David M. Umbach, Annie J. Sasco, David G. Hoel, Devra L. Davis Eradication of Vaccine-Preventable Diseases, A. Hinman 211 Immunization Registries in the United States: Implications for the Practice of Public Health in a Changing Health Care System, David Wood, Kristin 231 N. Saarlas, Moira Inkelas, Bela T. Matyas Teen Pregnancy Prevention: Do Any Programs Work?, Josefina J. Card 257 The Social Environment and Health: A Discussion of the Epidemiologic Literature, I. H. Yen, S. L. Syme Health Status Assessment Methods for Adults: Past Accomplishments and Future Challenges, Colleen A. McHorney Patient Outcomes Research Teams: Contribution to Outcomes and Effectiveness Research, Deborah Freund, Judith Lave, Carolyn Clancy, Gillian Hawker, Victor Hasselblad, Robert Keller, Ellen Schneiter, James Wright Pharmacy Benefit Management Companies: Dimensions of Performance, Helene L. Lipton, David H. Kreling, Ted Collins, Karen C. Hertz The Key to the Door: Medicaid's Role in Improving Health Care for Women and Children, Diane Rowland, Alina Salganicoff, Patricia Seliger Keenan

Lecture 7 Time-dependent Covariates in Cox Regression

Lecture 7 Time-dependent Covariates in Cox Regression Lecture 7 Time-dependent Covariates in Cox Regression So far, we ve been considering the following Cox PH model: λ(t Z) = λ 0 (t) exp(β Z) = λ 0 (t) exp( β j Z j ) where β j is the parameter for the the

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

UNIVERSITY OF CALIFORNIA, SAN DIEGO

UNIVERSITY OF CALIFORNIA, SAN DIEGO UNIVERSITY OF CALIFORNIA, SAN DIEGO Estimation of the primary hazard ratio in the presence of a secondary covariate with non-proportional hazards An undergraduate honors thesis submitted to the Department

More information

A TWO-STAGE LINEAR MIXED-EFFECTS/COX MODEL FOR LONGITUDINAL DATA WITH MEASUREMENT ERROR AND SURVIVAL

A TWO-STAGE LINEAR MIXED-EFFECTS/COX MODEL FOR LONGITUDINAL DATA WITH MEASUREMENT ERROR AND SURVIVAL A TWO-STAGE LINEAR MIXED-EFFECTS/COX MODEL FOR LONGITUDINAL DATA WITH MEASUREMENT ERROR AND SURVIVAL Christopher H. Morrell, Loyola College in Maryland, and Larry J. Brant, NIA Christopher H. Morrell,

More information

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 Department of Statistics North Carolina State University Presented by: Butch Tsiatis, Department of Statistics, NCSU

More information

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington Analysis of Longitudinal Data Patrick J Heagerty PhD Department of Biostatistics University of Washington Auckland 8 Session One Outline Examples of longitudinal data Scientific motivation Opportunities

More information

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What? You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?) I m not goin stop (What?) I m goin work harder (What?) Sir David

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

Multi-state models: prediction

Multi-state models: prediction Department of Medical Statistics and Bioinformatics Leiden University Medical Center Course on advanced survival analysis, Copenhagen Outline Prediction Theory Aalen-Johansen Computational aspects Applications

More information

Lecture 9. Statistics Survival Analysis. Presented February 23, Dan Gillen Department of Statistics University of California, Irvine

Lecture 9. Statistics Survival Analysis. Presented February 23, Dan Gillen Department of Statistics University of California, Irvine Statistics 255 - Survival Analysis Presented February 23, 2016 Dan Gillen Department of Statistics University of California, Irvine 9.1 Survival analysis involves subjects moving through time Hazard may

More information

Lecture 22 Survival Analysis: An Introduction

Lecture 22 Survival Analysis: An Introduction University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 22 Survival Analysis: An Introduction There is considerable interest among economists in models of durations, which

More information

Multi-state Models: An Overview

Multi-state Models: An Overview Multi-state Models: An Overview Andrew Titman Lancaster University 14 April 2016 Overview Introduction to multi-state modelling Examples of applications Continuously observed processes Intermittently observed

More information

Definitions and examples Simple estimation and testing Regression models Goodness of fit for the Cox model. Recap of Part 1. Per Kragh Andersen

Definitions and examples Simple estimation and testing Regression models Goodness of fit for the Cox model. Recap of Part 1. Per Kragh Andersen Recap of Part 1 Per Kragh Andersen Section of Biostatistics, University of Copenhagen DSBS Course Survival Analysis in Clinical Trials January 2018 1 / 65 Overview Definitions and examples Simple estimation

More information

Statistics 262: Intermediate Biostatistics Regression & Survival Analysis

Statistics 262: Intermediate Biostatistics Regression & Survival Analysis Statistics 262: Intermediate Biostatistics Regression & Survival Analysis Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Introduction This course is an applied course,

More information

Semiparametric Regression

Semiparametric Regression Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under

More information

Ph.D. course: Regression models. Introduction. 19 April 2012

Ph.D. course: Regression models. Introduction. 19 April 2012 Ph.D. course: Regression models Introduction PKA & LTS Sect. 1.1, 1.2, 1.4 19 April 2012 www.biostat.ku.dk/~pka/regrmodels12 Per Kragh Andersen 1 Regression models The distribution of one outcome variable

More information

On the Breslow estimator

On the Breslow estimator Lifetime Data Anal (27) 13:471 48 DOI 1.17/s1985-7-948-y On the Breslow estimator D. Y. Lin Received: 5 April 27 / Accepted: 16 July 27 / Published online: 2 September 27 Springer Science+Business Media,

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke BIOL 51A - Biostatistics 1 1 Lecture 1: Intro to Biostatistics Smoking: hazardous? FEV (l) 1 2 3 4 5 No Yes Smoke BIOL 51A - Biostatistics 1 2 Box Plot a.k.a box-and-whisker diagram or candlestick chart

More information

Ph.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status

Ph.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status Ph.D. course: Regression models Introduction PKA & LTS Sect. 1.1, 1.2, 1.4 25 April 2013 www.biostat.ku.dk/~pka/regrmodels13 Per Kragh Andersen Regression models The distribution of one outcome variable

More information

Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation

Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation Patrick J. Heagerty PhD Department of Biostatistics University of Washington 166 ISCB 2010 Session Four Outline Examples

More information

Survival Analysis I (CHL5209H)

Survival Analysis I (CHL5209H) Survival Analysis Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca January 7, 2015 31-1 Literature Clayton D & Hills M (1993): Statistical Models in Epidemiology. Not really

More information

A multi-state model for the prognosis of non-mild acute pancreatitis

A multi-state model for the prognosis of non-mild acute pancreatitis A multi-state model for the prognosis of non-mild acute pancreatitis Lore Zumeta Olaskoaga 1, Felix Zubia Olaskoaga 2, Guadalupe Gómez Melis 1 1 Universitat Politècnica de Catalunya 2 Intensive Care Unit,

More information

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model Other Survival Models (1) Non-PH models We briefly discussed the non-proportional hazards (non-ph) model λ(t Z) = λ 0 (t) exp{β(t) Z}, where β(t) can be estimated by: piecewise constants (recall how);

More information

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment

More information

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH The First Step: SAMPLE SIZE DETERMINATION THE ULTIMATE GOAL The most important, ultimate step of any of clinical research is to do draw inferences;

More information

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES Cox s regression analysis Time dependent explanatory variables Henrik Ravn Bandim Health Project, Statens Serum Institut 4 November 2011 1 / 53

More information

MAS3301 / MAS8311 Biostatistics Part II: Survival

MAS3301 / MAS8311 Biostatistics Part II: Survival MAS3301 / MAS8311 Biostatistics Part II: Survival M. Farrow School of Mathematics and Statistics Newcastle University Semester 2, 2009-10 1 13 The Cox proportional hazards model 13.1 Introduction In the

More information

Lecture 8 Stat D. Gillen

Lecture 8 Stat D. Gillen Statistics 255 - Survival Analysis Presented February 23, 2016 Dan Gillen Department of Statistics University of California, Irvine 8.1 Example of two ways to stratify Suppose a confounder C has 3 levels

More information

Survival Distributions, Hazard Functions, Cumulative Hazards

Survival Distributions, Hazard Functions, Cumulative Hazards BIO 244: Unit 1 Survival Distributions, Hazard Functions, Cumulative Hazards 1.1 Definitions: The goals of this unit are to introduce notation, discuss ways of probabilistically describing the distribution

More information

Comparative effectiveness of dynamic treatment regimes

Comparative effectiveness of dynamic treatment regimes Comparative effectiveness of dynamic treatment regimes An application of the parametric g- formula Miguel Hernán Departments of Epidemiology and Biostatistics Harvard School of Public Health www.hsph.harvard.edu/causal

More information

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

More information

Time-dependent covariates

Time-dependent covariates Time-dependent covariates Rasmus Waagepetersen November 5, 2018 1 / 10 Time-dependent covariates Our excursion into the realm of counting process and martingales showed that it poses no problems to introduce

More information

Meei Pyng Ng 1 and Ray Watson 1

Meei Pyng Ng 1 and Ray Watson 1 Aust N Z J Stat 444), 2002, 467 478 DEALING WITH TIES IN FAILURE TIME DATA Meei Pyng Ng 1 and Ray Watson 1 University of Melbourne Summary In dealing with ties in failure time data the mechanism by which

More information

Instrumental variables estimation in the Cox Proportional Hazard regression model

Instrumental variables estimation in the Cox Proportional Hazard regression model Instrumental variables estimation in the Cox Proportional Hazard regression model James O Malley, Ph.D. Department of Biomedical Data Science The Dartmouth Institute for Health Policy and Clinical Practice

More information

1 Introduction. 2 Residuals in PH model

1 Introduction. 2 Residuals in PH model Supplementary Material for Diagnostic Plotting Methods for Proportional Hazards Models With Time-dependent Covariates or Time-varying Regression Coefficients BY QIQING YU, JUNYI DONG Department of Mathematical

More information

Estimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis.

Estimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis. Estimating the Mean Response of Treatment Duration Regimes in an Observational Study Anastasios A. Tsiatis http://www.stat.ncsu.edu/ tsiatis/ Introduction to Dynamic Treatment Regimes 1 Outline Description

More information

Part III Measures of Classification Accuracy for the Prediction of Survival Times

Part III Measures of Classification Accuracy for the Prediction of Survival Times Part III Measures of Classification Accuracy for the Prediction of Survival Times Patrick J Heagerty PhD Department of Biostatistics University of Washington 102 ISCB 2010 Session Three Outline Examples

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Multistate models and recurrent event models

Multistate models and recurrent event models and recurrent event models Patrick Breheny December 6 Patrick Breheny University of Iowa Survival Data Analysis (BIOS:7210) 1 / 22 Introduction In this final lecture, we will briefly look at two other

More information

Cox s proportional hazards model and Cox s partial likelihood

Cox s proportional hazards model and Cox s partial likelihood Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.

More information

A SMOOTHED VERSION OF THE KAPLAN-MEIER ESTIMATOR. Agnieszka Rossa

A SMOOTHED VERSION OF THE KAPLAN-MEIER ESTIMATOR. Agnieszka Rossa A SMOOTHED VERSION OF THE KAPLAN-MEIER ESTIMATOR Agnieszka Rossa Dept. of Stat. Methods, University of Lódź, Poland Rewolucji 1905, 41, Lódź e-mail: agrossa@krysia.uni.lodz.pl and Ryszard Zieliński Inst.

More information

DISCRETE PROBABILITY DISTRIBUTIONS

DISCRETE PROBABILITY DISTRIBUTIONS DISCRETE PROBABILITY DISTRIBUTIONS REVIEW OF KEY CONCEPTS SECTION 41 Random Variable A random variable X is a numerically valued quantity that takes on specific values with different probabilities The

More information

Residuals and model diagnostics

Residuals and model diagnostics Residuals and model diagnostics Patrick Breheny November 10 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/42 Introduction Residuals Many assumptions go into regression models, and the Cox proportional

More information

Probability and Probability Distributions. Dr. Mohammed Alahmed

Probability and Probability Distributions. Dr. Mohammed Alahmed Probability and Probability Distributions 1 Probability and Probability Distributions Usually we want to do more with data than just describing them! We might want to test certain specific inferences about

More information

A note on R 2 measures for Poisson and logistic regression models when both models are applicable

A note on R 2 measures for Poisson and logistic regression models when both models are applicable Journal of Clinical Epidemiology 54 (001) 99 103 A note on R measures for oisson and logistic regression models when both models are applicable Martina Mittlböck, Harald Heinzl* Department of Medical Computer

More information

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Xuelin Huang Department of Biostatistics M. D. Anderson Cancer Center The University of Texas Joint Work with Jing Ning, Sangbum

More information

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/25 Right censored

More information

Multistate models and recurrent event models

Multistate models and recurrent event models Multistate models Multistate models and recurrent event models Patrick Breheny December 10 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/22 Introduction Multistate models In this final lecture,

More information

Survival Analysis. 732G34 Statistisk analys av komplexa data. Krzysztof Bartoszek

Survival Analysis. 732G34 Statistisk analys av komplexa data. Krzysztof Bartoszek Survival Analysis 732G34 Statistisk analys av komplexa data Krzysztof Bartoszek (krzysztof.bartoszek@liu.se) 10, 11 I 2018 Department of Computer and Information Science Linköping University Survival analysis

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2008 Paper 241 A Note on Risk Prediction for Case-Control Studies Sherri Rose Mark J. van der Laan Division

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

Multistate models in survival and event history analysis

Multistate models in survival and event history analysis Multistate models in survival and event history analysis Dorota M. Dabrowska UCLA November 8, 2011 Research supported by the grant R01 AI067943 from NIAID. The content is solely the responsibility of the

More information

Survival Regression Models

Survival Regression Models Survival Regression Models David M. Rocke May 18, 2017 David M. Rocke Survival Regression Models May 18, 2017 1 / 32 Background on the Proportional Hazards Model The exponential distribution has constant

More information

Robustifying Trial-Derived Treatment Rules to a Target Population

Robustifying Trial-Derived Treatment Rules to a Target Population 1/ 39 Robustifying Trial-Derived Treatment Rules to a Target Population Yingqi Zhao Public Health Sciences Division Fred Hutchinson Cancer Research Center Workshop on Perspectives and Analysis for Personalized

More information

Chapter 11. Correlation and Regression

Chapter 11. Correlation and Regression Chapter 11. Correlation and Regression The word correlation is used in everyday life to denote some form of association. We might say that we have noticed a correlation between foggy days and attacks of

More information

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data Malaysian Journal of Mathematical Sciences 11(3): 33 315 (217) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Approximation of Survival Function by Taylor

More information

1 The problem of survival analysis

1 The problem of survival analysis 1 The problem of survival analysis Survival analysis concerns analyzing the time to the occurrence of an event. For instance, we have a dataset in which the times are 1, 5, 9, 20, and 22. Perhaps those

More information

Textbook: Survivial Analysis Techniques for Censored and Truncated Data 2nd edition, by Klein and Moeschberger

Textbook: Survivial Analysis Techniques for Censored and Truncated Data 2nd edition, by Klein and Moeschberger Lecturer: James Degnan Office: SMLC 342 Office hours: MW 12:00 1:00 or by appointment E-mail: jamdeg@unm.edu Please include STAT474 or STAT574 in the subject line of the email to make sure I don t overlook

More information

4. Comparison of Two (K) Samples

4. Comparison of Two (K) Samples 4. Comparison of Two (K) Samples K=2 Problem: compare the survival distributions between two groups. E: comparing treatments on patients with a particular disease. Z: Treatment indicator, i.e. Z = 1 for

More information

Chapter 7 Fall Chapter 7 Hypothesis testing Hypotheses of interest: (A) 1-sample

Chapter 7 Fall Chapter 7 Hypothesis testing Hypotheses of interest: (A) 1-sample Bios 323: Applied Survival Analysis Qingxia (Cindy) Chen Chapter 7 Fall 2012 Chapter 7 Hypothesis testing Hypotheses of interest: (A) 1-sample H 0 : S(t) = S 0 (t), where S 0 ( ) is known survival function,

More information

Part [1.0] Measures of Classification Accuracy for the Prediction of Survival Times

Part [1.0] Measures of Classification Accuracy for the Prediction of Survival Times Part [1.0] Measures of Classification Accuracy for the Prediction of Survival Times Patrick J. Heagerty PhD Department of Biostatistics University of Washington 1 Biomarkers Review: Cox Regression Model

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

Log-linearity for Cox s regression model. Thesis for the Degree Master of Science

Log-linearity for Cox s regression model. Thesis for the Degree Master of Science Log-linearity for Cox s regression model Thesis for the Degree Master of Science Zaki Amini Master s Thesis, Spring 2015 i Abstract Cox s regression model is one of the most applied methods in medical

More information

Robust estimates of state occupancy and transition probabilities for Non-Markov multi-state models

Robust estimates of state occupancy and transition probabilities for Non-Markov multi-state models Robust estimates of state occupancy and transition probabilities for Non-Markov multi-state models 26 March 2014 Overview Continuously observed data Three-state illness-death General robust estimator Interval

More information

Philosophy and Features of the mstate package

Philosophy and Features of the mstate package Introduction Mathematical theory Practice Discussion Philosophy and Features of the mstate package Liesbeth de Wreede, Hein Putter Department of Medical Statistics and Bioinformatics Leiden University

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

2011/04 LEUKAEMIA IN WALES Welsh Cancer Intelligence and Surveillance Unit

2011/04 LEUKAEMIA IN WALES Welsh Cancer Intelligence and Surveillance Unit 2011/04 LEUKAEMIA IN WALES 1994-2008 Welsh Cancer Intelligence and Surveillance Unit Table of Contents 1 Definitions and Statistical Methods... 2 2 Results 7 2.1 Leukaemia....... 7 2.2 Acute Lymphoblastic

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

More information

Lecture 5 Models and methods for recurrent event data

Lecture 5 Models and methods for recurrent event data Lecture 5 Models and methods for recurrent event data Recurrent and multiple events are commonly encountered in longitudinal studies. In this chapter we consider ordered recurrent and multiple events.

More information

Core Courses for Students Who Enrolled Prior to Fall 2018

Core Courses for Students Who Enrolled Prior to Fall 2018 Biostatistics and Applied Data Analysis Students must take one of the following two sequences: Sequence 1 Biostatistics and Data Analysis I (PHP 2507) This course, the first in a year long, two-course

More information

Missing covariate data in matched case-control studies: Do the usual paradigms apply?

Missing covariate data in matched case-control studies: Do the usual paradigms apply? Missing covariate data in matched case-control studies: Do the usual paradigms apply? Bryan Langholz USC Department of Preventive Medicine Joint work with Mulugeta Gebregziabher Larry Goldstein Mark Huberman

More information

Longitudinal + Reliability = Joint Modeling

Longitudinal + Reliability = Joint Modeling Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,

More information

Lecture 11. Interval Censored and. Discrete-Time Data. Statistics Survival Analysis. Presented March 3, 2016

Lecture 11. Interval Censored and. Discrete-Time Data. Statistics Survival Analysis. Presented March 3, 2016 Statistics 255 - Survival Analysis Presented March 3, 2016 Motivating Dan Gillen Department of Statistics University of California, Irvine 11.1 First question: Are the data truly discrete? : Number of

More information

7.1 The Hazard and Survival Functions

7.1 The Hazard and Survival Functions Chapter 7 Survival Models Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence

More information

Survival Analysis. Lu Tian and Richard Olshen Stanford University

Survival Analysis. Lu Tian and Richard Olshen Stanford University 1 Survival Analysis Lu Tian and Richard Olshen Stanford University 2 Survival Time/ Failure Time/Event Time We will introduce various statistical methods for analyzing survival outcomes What is the survival

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

TECHNICAL APPENDIX WITH ADDITIONAL INFORMATION ON METHODS AND APPENDIX EXHIBITS. Ten health risks in this and the previous study were

TECHNICAL APPENDIX WITH ADDITIONAL INFORMATION ON METHODS AND APPENDIX EXHIBITS. Ten health risks in this and the previous study were Goetzel RZ, Pei X, Tabrizi MJ, Henke RM, Kowlessar N, Nelson CF, Metz RD. Ten modifiable health risk factors are linked to more than one-fifth of employer-employee health care spending. Health Aff (Millwood).

More information

Bias in Markov Models of Disease

Bias in Markov Models of Disease Bias in Markov Models of Disease Daniel M. Faissol, Paul M. Griffin, and Julie L. Swann Stewart School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, GA 30332-0205 Phone:

More information

Varieties of Count Data

Varieties of Count Data CHAPTER 1 Varieties of Count Data SOME POINTS OF DISCUSSION What are counts? What are count data? What is a linear statistical model? What is the relationship between a probability distribution function

More information

Distributed analysis in multi-center studies

Distributed analysis in multi-center studies Distributed analysis in multi-center studies Sharing of individual-level data across health plans or healthcare delivery systems continues to be challenging due to concerns about loss of patient privacy,

More information

2 Chapter 2: Conditional Probability

2 Chapter 2: Conditional Probability STAT 421 Lecture Notes 18 2 Chapter 2: Conditional Probability Consider a sample space S and two events A and B. For example, suppose that the equally likely sample space is S = {0, 1, 2,..., 99} and A

More information

Tests of independence for censored bivariate failure time data

Tests of independence for censored bivariate failure time data Tests of independence for censored bivariate failure time data Abstract Bivariate failure time data is widely used in survival analysis, for example, in twins study. This article presents a class of χ

More information

3003 Cure. F. P. Treasure

3003 Cure. F. P. Treasure 3003 Cure F. P. reasure November 8, 2000 Peter reasure / November 8, 2000/ Cure / 3003 1 Cure A Simple Cure Model he Concept of Cure A cure model is a survival model where a fraction of the population

More information

USING MARTINGALE RESIDUALS TO ASSESS GOODNESS-OF-FIT FOR SAMPLED RISK SET DATA

USING MARTINGALE RESIDUALS TO ASSESS GOODNESS-OF-FIT FOR SAMPLED RISK SET DATA USING MARTINGALE RESIDUALS TO ASSESS GOODNESS-OF-FIT FOR SAMPLED RISK SET DATA Ørnulf Borgan Bryan Langholz Abstract Standard use of Cox s regression model and other relative risk regression models for

More information

A Poisson Process Approach for Recurrent Event Data with Environmental Covariates NRCSE. T e c h n i c a l R e p o r t S e r i e s. NRCSE-TRS No.

A Poisson Process Approach for Recurrent Event Data with Environmental Covariates NRCSE. T e c h n i c a l R e p o r t S e r i e s. NRCSE-TRS No. A Poisson Process Approach for Recurrent Event Data with Environmental Covariates Anup Dewanji Suresh H. Moolgavkar NRCSE T e c h n i c a l R e p o r t S e r i e s NRCSE-TRS No. 028 July 28, 1999 A POISSON

More information

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Nicholas C. Henderson Thomas A. Louis Gary Rosner Ravi Varadhan Johns Hopkins University July 31, 2018

More information

Survival Analysis. Stat 526. April 13, 2018

Survival Analysis. Stat 526. April 13, 2018 Survival Analysis Stat 526 April 13, 2018 1 Functions of Survival Time Let T be the survival time for a subject Then P [T < 0] = 0 and T is a continuous random variable The Survival function is defined

More information

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial

More information

Effects of multiple interventions

Effects of multiple interventions Chapter 28 Effects of multiple interventions James Robins, Miguel Hernan and Uwe Siebert 1. Introduction The purpose of this chapter is (i) to describe some currently available analytical methods for using

More information

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3 University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.

More information

Package threg. August 10, 2015

Package threg. August 10, 2015 Package threg August 10, 2015 Title Threshold Regression Version 1.0.3 Date 2015-08-10 Author Tao Xiao Maintainer Tao Xiao Depends R (>= 2.10), survival, Formula Fit a threshold regression

More information

Cluster Analysis using SaTScan

Cluster Analysis using SaTScan Cluster Analysis using SaTScan Summary 1. Statistical methods for spatial epidemiology 2. Cluster Detection What is a cluster? Few issues 3. Spatial and spatio-temporal Scan Statistic Methods Probability

More information

Causal Modeling in Environmental Epidemiology. Joel Schwartz Harvard University

Causal Modeling in Environmental Epidemiology. Joel Schwartz Harvard University Causal Modeling in Environmental Epidemiology Joel Schwartz Harvard University When I was Young What do I mean by Causal Modeling? What would have happened if the population had been exposed to a instead

More information

TGDR: An Introduction

TGDR: An Introduction TGDR: An Introduction Julian Wolfson Student Seminar March 28, 2007 1 Variable Selection 2 Penalization, Solution Paths and TGDR 3 Applying TGDR 4 Extensions 5 Final Thoughts Some motivating examples We

More information

STATISTICS Relationships between variables: Correlation

STATISTICS Relationships between variables: Correlation STATISTICS 16 Relationships between variables: Correlation The gentleman pictured above is Sir Francis Galton. Galton invented the statistical concept of correlation and the use of the regression line.

More information

Mediation analyses. Advanced Psychometrics Methods in Cognitive Aging Research Workshop. June 6, 2016

Mediation analyses. Advanced Psychometrics Methods in Cognitive Aging Research Workshop. June 6, 2016 Mediation analyses Advanced Psychometrics Methods in Cognitive Aging Research Workshop June 6, 2016 1 / 40 1 2 3 4 5 2 / 40 Goals for today Motivate mediation analysis Survey rapidly developing field in

More information

LSS: An S-Plus/R program for the accelerated failure time model to right censored data based on least-squares principle

LSS: An S-Plus/R program for the accelerated failure time model to right censored data based on least-squares principle computer methods and programs in biomedicine 86 (2007) 45 50 journal homepage: www.intl.elsevierhealth.com/journals/cmpb LSS: An S-Plus/R program for the accelerated failure time model to right censored

More information