Statistical models for estimating the effects of intermediate variables in the presence of time-dependent confounders


Dissertation for the award of the doctoral degree of the Faculty of Mathematics and Physics of the Albert-Ludwigs-Universität Freiburg im Breisgau, submitted by Christine Gall, born in Erlangen, December 2011

Dean: Prof. Dr. Kay Königsmann
First referee: Prof. Dr. Martin Schumacher, Institut für Medizinische Biometrie und Medizinische Informatik, Albert-Ludwigs-Universität Freiburg im Breisgau, Stefan-Meier-Str, Freiburg
Second referee: Prof. Dr. Odd Aalen, Oslo
Date of the doctoral examination: 6 February 2012

Summary

Estimating the effect of a time-varying exposure when time-dependent confounders are involved is not feasible with standard statistical models. Models for causal inference cope with time-dependent confounders, but remain controversial with respect to their unverifiable assumptions and the interpretation of their effect measures. In this thesis, two such models proposed by Robins are addressed, both defined within the counterfactual framework. This framework facilitates the definition of the treatment effect, but requires untestable identifying assumptions. These assumptions restrict the observable data only implicitly, which means that the confounding structure between observables cannot be illustrated straightforwardly. Insight into the properties of these two counterfactual models is given by bringing them together with common approaches defined within the observable framework. For illustration and to explore the applicability of the models, two data examples are considered.

The Structural Nested Failure Time Model (SNFTM) applies to survival settings. Its modelling assumptions with respect to the observable data structure are shown by proposing a multistate model which conforms to the SNFTM and into which the causal SNFTM parameter enters directly. Multistate models do not use counterfactual or latent variables, but directly model the observable variables such that they arise successively as in a prospective trial. This model is also used as a simulation model for data generation to compare the behaviour of a typically used Cox model and the SNFTM.

Marginal Structural Models (MSMs) are flexible with respect to the type of outcome. To make this approach accessible to readers unfamiliar with the counterfactual framework, we show that it can be seen as an extension of a common approach developed for the handling of missing outcomes, which is based on related unverifiable assumptions. Reducing the complexity, we regard the different components step by step before finally incorporating the structural model.


Contents

Summary
1. Introduction
Structure of the thesis
Data examples
Nosocomial infections on intensive care units
Preoperative breast cancer therapy
Time-dependent confounding
Causal inference
Causal conclusions
Causal approaches
Counterfactual models
Theoretical background
Theory concerning counterfactual framework
Counterfactual framework
Structural Nested Failure Time Model
Marginal Structural Models
Contrasting SNFTM and MSM
Theory concerning observable framework
Multistate models
Inverse Probability of Censoring Weighting
SNFTM: Effect of nosocomial infection on length of hospital stay
Definition and estimation of the extra stay
4.2. Application: SIR3 study
Information on the ICU data
Estimation of the SNFTM parameter
Estimation of the extra stay
Artificial ventilation as time-dependent confounder
Multistate model conform with assumptions of SNFTM
Idea to model the action of a time-dependent confounder
Definition of the multistate model
Effect of INF on AV and of AV on INF and discharge
Discharge hazard for infected, unventilated patients
Discharge hazard for infected, ventilated patients
Increased risk of AV due to infection: c INFAV >
Simulation study
Data generation
Parameter values
Illustration of characteristics of infected population
Estimation of effects by Cox model
Estimation of effects by SNFTM
Results
Comparison to simulation proposed by Robins
Extension to model current AV status
The case without time-dependent confounding
The case with time-dependent confounding
Assessing the assumption of no unmeasured confounders
Marginal Structural Models as extension of missing data approach
Time-dependent confounding within one treatment arm
First step: outcome after application of a fixed number of cycles
Second step: dose effect by comparing two groups
Third step: dose effect by comparing all groups using a Marginal Structural Model
Contrasting IPCW with the MSM approach
Therapy effect: comparison of both treatment arms
6.7. Application: Geparduo study
Information on the breast cancer data
Estimation of the weights
Dose effect
Therapy effect
Discussion and outlook
A. Appendix: Abbreviations and notation
A.1. Abbreviations
A.2. Notation
B. Appendix: Multistate model meets assumption of no unmeasured confounders
B.1. Start in state 4 reached from 2 compared to start in state
B.1.1. Start in state 4 reached from
B.1.2. Start in state
B.2. Start in state 3 compared to start in state
B.2.1. Start in state
B.2.2. Start in state
C. Appendix: Parts of this thesis published previously or submitted for peer review


1. Introduction

This thesis addresses the impact of a time-varying exposure on an outcome of interest. If time-dependent covariates or even time-dependent confounders are involved, it is difficult to specify and estimate the effect of the time-varying exposure. Whereas including time-dependent confounders in standard statistical models is not feasible, statistical models for causal inference cope with them. However, they come with untestable assumptions, and their effect measures are not straightforwardly interpretable within the observable data structure.

The focus lies on two structural models proposed by Robins, the Structural Nested Failure Time Model (SNFTM) and the Marginal Structural Model (MSM). They are defined within the counterfactual framework, where a potential outcome is defined for every possible exposure regime. The outcome which belongs to the actually observed exposure regime is set equal to the observed outcome. As the data do not comprise any direct information on the other potential outcomes, problems of causal inference can be viewed as a problem of missing data with respect to the counterfactual outcomes associated with exposure regimes other than the one actually observed [1]. The counterfactual framework facilitates the definition of the effect as it does not involve any probability models for the occurrence of exposure [2]. However, it is controversial because it explicitly makes assumptions whose validity is untestable. For some statisticians, e.g. Dawid [3], this is unacceptable. Others, like Greenland and Morgenstern [4], appreciate that it raises awareness of the limitations of empirical research on causal effects and offers the opportunity to modify experimental design or evaluation techniques towards plausible assumptions. The counterfactual framework provides a formalization of the assumptions used for causal inference. However, these assumptions only determine the confounding structure within the counterfactual model, which is not explicitly transferable to the confounding structure within the embedded observable setting.

The aim of this thesis is to explore the characteristics and the applicability of these two counterfactual models and to work out new aspects which give insight into the model properties and their relevance within the observable data structure. For illustration, two data examples are considered which are outlined below.

The SNFTM applies to time-to-event outcomes. To clarify the confounding structure assumed by the SNFTM with respect to relations between the observable data, we synthesize the SNFTM with multistate modelling. Multistate models apply to event history data. Changes over time are regarded as the occurrence of events which are described by transitions between different states. The states are defined according to the possible types of events that occur. Multistate models are defined within the observable framework: the observable data structure is modelled directly and all variables arise successively as in a prospective trial. As they take into account the chronological order, causal interpretations might be facilitated in the sense that the cause will always precede the effect regarding the interaction between life history events [5]. We propose a simple joint model defined by a multistate model where the causal SNFTM parameter enters directly. The difficulty is that the SNFTM parameter refers to the total causal effect whereas the multistate model is characterised by direct effects. We solve this problem by defining a delayed impact of changes in covariates which are already affected by the exposure. This illustrates that effect parameters should be compared with caution. The multistate model is also used as a simulation model for data generation to illustrate data characteristics and to explore the behaviour of a typically used Cox model in comparison to the SNFTM.

MSMs belong to another class of causal structural models. They are flexible with respect to the type of outcome. According to our data example, we regard a time-constant outcome measured at the end of study. Their parameters are estimated by Inverse Probability of Treatment Weighting. We present MSMs as an extension of the common approach of Inverse Probability of Censoring Weighting (IPCW) developed for the handling of missing outcomes. IPCW is based on related unverifiable assumptions about the reasons for missingness, but their necessity is more readily accepted by the statistical community. By contrasting both approaches, we give insight into the structural model approach.

1.1. Structure of the thesis

In chapter 2, we briefly address the aim of causal approaches and the idea of the counterfactual framework. Chapter 3 outlines the theoretical background used within this thesis. Its first part refers to the counterfactual framework with focus on the SNFTM and MSMs. The SNFTM is reviewed in detail, providing all instructions for the analysis of the ICU example (chapter 4). The MSM is introduced more briefly with focus on the underlying ideas, as a detailed illustration is given in chapter 6 when contrasting the MSM with IPCW. For both structural models, the description of the different steps is similarly structured in order to prepare their comparison in the section contrasting SNFTM and MSM. The second part of chapter 3 corresponds to the observable framework, considering the non-causal approaches used to illustrate aspects of the structural models. These are multistate models and IPCW.

Chapters 4 and 5 address the SNFTM. The ICU data example is analysed by an SNFTM in chapter 4, where we additionally address a quantity whose definition is facilitated by the counterfactual framework and whose estimation is done by plugging in the SNFTM parameter. In chapter 5, we propose a simple joint model defined by a multistate model which conforms to the SNFTM. Its main purpose is not to facilitate data generation by using it as a simulation model, but to illustrate the data structure required by the SNFTM with respect to observable variables only.

In chapter 6, we illustrate MSMs as an extension of a missing data approach, exemplified by data on breast cancer. Thereby, we break up the daunting complexity of the MSM by first doing without the structural model and approaching the counterfactual framework as well as the estimating procedure by relating it to the more familiar procedure of IPCW.

Data examples

We regard two data examples which both comprise a time-dependent confounder. One focuses on a survival endpoint, the other on a binary outcome manifest at the end of study. To keep the settings in mind and to get an impression of the action of a time-dependent confounder, we introduce them briefly without quantifying any data characteristics. The first example is used to illustrate the SNFTM which is investigated in chapters 4 and 5. The second example is considered in chapter 6 which addresses the MSM.

Nosocomial infections on intensive care units

The SIR3 (Spread of nosocomial infections and resistant pathogens) study is a cohort study that prospectively collects data to examine the effect of hospital-acquired, i.e. nosocomial, infections. All patients admitted to certain intensive care units (ICUs) who stayed on the unit for more than 48 hours are followed during their ICU stay. The documented data include baseline parameters as well as time-dependent variables which are recorded on a daily basis. The latter include clinical parameters and the use of medical devices, e.g. artificial ventilation. Furthermore, there is information on the onset times of nosocomial infections. We focus on pneumonia, which is one of the most frequent and severe nosocomial infections. We are interested in the impact of the occurrence of a nosocomial infection on the length of ICU stay, which is given by the time of death or discharge, respectively. Here, the use of artificial ventilation is a candidate for being a time-dependent confounder. Figure 1.1 shows typical courses of the ICU stay. A detailed description of the data can be found in Grundmann et al. [6] and Beyersmann et al. [7].

Figure 1.1.: Typical courses of ICU stay. End of ICU stay: discharge alive, death, censored. Time from nosocomial infection. Use of artificial ventilation.

Preoperative breast cancer therapy

The Geparduo study [8] is a randomised controlled clinical trial in breast cancer which compares two preoperatively applied chemotherapy schemes with respect to a pathologic result on remission. The chemotherapies are given in repeated cycles, where medical values measured before each cycle may lead to stopping chemotherapy prior to the last planned cycle. Regardless of the number of given chemotherapy cycles, the study patients undergo surgery and parts of the breast are resected. Therefore, response assessment was possible for every patient. The main interest of the Geparduo study lies in the outcome after all planned cycles. However, we use the data separately per treatment arm and analyse the outcome after a certain number of cycles. This is possible because, due to the observed early stopping, the outcome after a reduced number of cycles is known for some patients. As the medical values used to assess whether chemotherapy should be continued are measured before each cycle, i.e. after randomisation, we need to account for time-dependent confounding.

1.3. Time-dependent confounding

We illustrate the action of a time-dependent confounder by the ICU data example. Here, a time-dependent confounder is characterised by not only being a risk factor for nosocomial infection and a prognostic factor for the length of stay, but also by being influenced by the infection after its occurrence. A potential time-dependent confounder is the use of artificial ventilation (AV). We denote the length of stay on ICU by T. The occurrence of infection is described by the time-dependent status variable INF(t) which is 0 until the occurrence of infection and then jumps to 1. Hence, INF(t) = 1 means that the patient was infected at some time before t, but is not necessarily infected at time t. This definition encodes past infection exposure as typically investigated in hospital epidemiology [9]. The information whether AV is switched on or off is given by the status variable AV(t).

Part of a confounding situation is shown in figure 1.2, where we assume that values can only change on a daily basis, that the infection occurs at day k, and that the patient is still in ICU at day k + 1. In the first period, from day 1 until day k, AV influences the risk of infection and the length of stay. In the second period, after day k, the infection might have an effect on the need for artificial ventilation, that is on the values AV(t) for t > k. The dashed pathway from INF(k) via AV(k + 1) towards T indicates that the value of AV(k + 1) might partially represent an effect of the infection on the length of stay. The contribution of the dashed part of the arrow from AV(k + 1) to T can be seen as an indirect effect of the infection. This illustrates why the inclusion of AV(t) into a statistical model is complicated. Conventional adjustment for AV(t) for values t > k would either diminish the effect of the infection status or might indicate an influence of infection even if it does not affect the length of stay.

Figure 1.2.: Time-dependent confounding: situation where infection occurs at day k and patient is still in ICU at day k + 1; AV(t) = artificial ventilation status, INF(t) = infection status, T = length of stay. Nodes shown: AV(k), AV(k + 1), INF(k), T.

In general, time-dependent confounding is only possible if confounder and exposure are both time-varying. Only then can the relation between confounder and exposure be reversed. With respect to the breast cancer example, a time-dependent confounder is a prognostic factor which changes as a result of previous applications of treatment and in turn has an impact on the further application of subsequent treatment. A potential time-dependent confounder is an increased leucocyte count, which is measured before each cycle.
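The first consequence of conventional adjustment named in this section, a diminished infection effect, can be made concrete with a small simulation. The following Python sketch is not part of the SIR3 analysis. It assumes a toy two-period setting in which infection is assigned at random, raises the probability of AV at day k + 1, and prolongs the stay only through AV. The function name `simulate_adjustment_bias` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_adjustment_bias(n=20000, seed=0):
    """Toy two-period setting; every number here is an illustrative assumption.

    infected : infection status at day k (randomised to isolate the point)
    av_next  : artificial ventilation at day k + 1, made more likely by infection
    stay     : length of stay, prolonged only through av_next
    """
    rng = np.random.default_rng(seed)
    infected = rng.binomial(1, 0.5, size=n)
    av_next = rng.binomial(1, 0.2 + 0.5 * infected)      # infection raises AV risk
    stay = 10.0 + 4.0 * av_next + rng.normal(0.0, 1.0, n)  # no direct INF -> T arrow

    # Unadjusted contrast: total effect of infection (0.5 * 4 = 2 days by design).
    total = stay[infected == 1].mean() - stay[infected == 0].mean()

    # Conventional adjustment: least squares of stay on intercept, infection, AV.
    design = np.column_stack([np.ones(n), infected, av_next])
    coef, *_ = np.linalg.lstsq(design, stay, rcond=None)
    return total, coef[1]


total_effect, adjusted_infection_coef = simulate_adjustment_bias()
print(round(total_effect, 2), round(adjusted_infection_coef, 2))
```

Under these assumptions the unadjusted contrast is close to the designed total effect of two days, whereas the infection coefficient of the adjusted regression is close to zero, because conditioning on AV(k + 1) blocks the only pathway from infection to the length of stay.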

2. Causal inference

2.1. Causal conclusions

Statistical models serve to detect associations between different characteristics or between exposure and outcome variables. Statisticians have never claimed that associations can be used to identify causal relations, but associations are often utilised with the intention of reaching a causal conclusion. The most famous example of a statistical model detecting an astonishing relation is probably that the more storks live in an area, the more babies are born. This result obviously cannot be used to prove that storks deliver babies, as the information whether the storks live in the countryside or in town is missing. It shows that statistics can only show associations and do not allow causal conclusions without further restrictions on the data.

However, in daily life, associations are often enough. For the midwife who considers where to open a private practice, it does not make a difference whether she checks which places are near the countryside or where more storks live. This shows that association is enough for prediction in the unchanged population. Here, the emphasis is placed on the attribute unchanged. To illustrate this further, imagine that the government intervened such that bringing up children became much more attractive in town than in the countryside. Then, the midwife would need a causal conclusion, as the relation between the incidence of storks and the favourable location of the private practice would change.

In medicine, causal relations are required to optimise prevention and treatment of diseases, as the aim is e.g. to change medication or surgery techniques to maximise the patient's benefit. So far, only randomised experiments are generally accepted as a basis for inference on causal relations. Within the last four decades, statisticians have increasingly tried to use statistics to answer causal questions. Inference on causal relations is only possible with well-defined causal criteria and cannot do without posing restrictions on the underlying data structure, which cannot be verified by statistical experiments. This includes a careful investigation of whether the causal question is appropriate. For example, we cannot just ask for the effect of a reduction of the body mass index on mobility. The answer mainly depends on the way the reduction is achieved, e.g. either by dieting or by increasing activity. A good way to find out whether a causal question is applicable is to attempt to formulate a randomised trial which answers the causal question. This also clarifies which statement we expect from the question.

Causal approaches

Within the last decades, the need for statistical methods to facilitate causal inference has been increasingly recognized by the statistical community [10, 11]. Different formal approaches for causal inference have been proposed. The most prominent are graphical models, counterfactual models and structural-equations models. They are outlined and contrasted by e.g. Pearl [12] and Greenland and Brumback [13]. Pearl [12] summarises the requirements of a causal approach. The most important conclusion is that causal inference requires properties of the data-generating process which are not given by the data alone, even if data were collected for the whole population. Hence, one must cope with untestable assumptions. Furthermore, this requires new mathematical notation for expressing causal relations which is not given by standard probability calculus.

Aalen [14] points out that causality is a dynamic concept as the cause must precede the effect. As stated by Aalen and Frigessi [11], it appears to be a weakness of many models of causality, e.g. graphical models, that time is absent. To incorporate the eminent status of time, Fosen et al. [15, 16] proposed a special approach to graphical models which is called dynamic path analysis.

Causal relations are often represented by directed acyclic graphs (DAGs), see e.g. [13, 17]. DAGs are a good instrument to visualise causal structures which also serves to better communicate with clinicians. They indicate the statistical dependencies and independencies of the included variables. They demonstrate the consequences of conditioning on certain variables and thereby support the choice of analysis strategy.

2.3. Counterfactual models

Counterfactual models, also called potential-outcome models, were established to infer the effect of an intervention. They are based on the idea of reconstructing a randomised trial. Within the counterfactual framework, the outcome consists of a vector whose elements are variables that are interpreted as the possibly counterfactual outcome had a certain exposure state been true. For example, if exposure is either no treatment or treatment, the counterfactual outcome is given by $(Y_0, Y_1)$. Here $Y_0$ refers to the outcome after no treatment and $Y_1$ to the outcome after treatment. Typically, only one of these outcomes is known from the data. If treatment is actually given, $Y_1$ is set equal to the observed outcome whereas $Y_0$ is not known and is called counterfactual. This facilitates the definition of the causal effect, which is given by characteristics of the distribution of the counterfactuals, e.g. by $E(Y_1 - Y_0)$ in case of a linear treatment effect. The definition does not depend on the probability that a certain counterfactual outcome is actually observed, e.g. on the probability that treatment is given. This is the major difference between counterfactual outcomes and other frameworks for causal inference.

To cope with the unobservable outcome values, one needs identifying assumptions which describe the confounding mechanism. They are unverifiable and facilitate the estimation of the effect by observable variables. If the exposure is time-independent, the identifying assumption states that, given the confounders $X$, the counterfactual outcome $(Y_0, Y_1)$ is independent of the treatment assignment $Z$, which is formally described by

$$(Y_0, Y_1) \perp\!\!\!\perp Z \mid X \qquad (2.1)$$

where conditional independence is denoted by $\perp\!\!\!\perp$. It implicitly ensures that $X$ comprises all confounders which affect $Z$ and the outcome. For illustration, we consider the patient's health condition, which affects the doctor's decision to give treatment or not, e.g. if treatment is aggressive and the doctor supposes that it is more suitable for patients with a better constitution. Then, if health condition is not included in $X$, the assignment $Z$ does contain information on $(Y_0, Y_1)$, because $Y_0$ as well as $Y_1$ tend to be better: whatever treatment is assigned, the patient has a better prognosis due to his good condition. Note that $Z$ affects $Y$ but neither $Y_0$ nor $Y_1$. However, $Z$ indicates whether $Y_0$ or $Y_1$ can be observed. Identifiability follows from (2.1), as one can rewrite $E(Y_1)$ in terms of observable variables by

$$E(Y_1) = E(E(Y_1 \mid X)) = E(E(Y_1 \mid X, Z = 1)) = E(E(Y \mid X, Z = 1))$$

and proceed analogously with $E(Y_0)$. Effect estimates can e.g. be obtained using the Propensity Score (PS) proposed by Rosenbaum and Rubin [18], which describes the conditional probability of treatment given the confounders. There are several procedures to use the PS, among them stratification, matching and inverse weighting by the PS.

Within this thesis, we focus on time-dependent exposures and their effect on a time-constant or survival outcome. Here, one regards not only two but several counterfactual outcomes according to each possible sequence of exposure. Furthermore, the identifying assumption is sequentially defined. It consecutively considers the time points of exposure, conditioning on the respective covariate and exposure history. For these settings, Robins proposed two structural models, which are outlined in section 3.1: the SNFTM and the MSM. The overall strategy is to build a structural model which relates the outcome vector elements with respect to the exposure variable and thereby defines the effect of interest. Possible confounders are not included in the structural model. Their relations are considered by the estimating procedure, which is deduced from the identifying assumptions such that the parameters of the structural model are causally interpretable. The SNFTM applies to longitudinal data with a time-to-event endpoint. The estimating procedure is called g-estimation, where g stands for generalised treatment. The MSM can be used for various types of outcome. Quantification of the effect of a treatment regime is done by the estimation method Inverse Probability of Treatment Weighting. It can also be seen as an extension of inverse weighting by the PS transferred to time-dependent exposures.
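Inverse weighting by the PS for a time-independent exposure, as mentioned in this section, can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions, not an analysis from this thesis: the data are simulated, the single confounder is binary so that the PS can be estimated by the treatment frequency per stratum, and the names `ipw_means`, `naive_effect` and `ipw_effect` are hypothetical. The simulation mirrors the health-condition illustration, with healthier patients being treated more often and having better outcomes under either treatment.

```python
import numpy as np


def ipw_means(y, z, x):
    """Inverse-PS-weighted estimates of E(Y_1) and E(Y_0).

    Assumes a discrete confounder x and that both treatment values occur
    in every stratum of x, so that the estimated PS is strictly in (0, 1).
    """
    y, z, x = map(np.asarray, (y, z, x))
    ps = np.empty(len(z), dtype=float)
    for level in np.unique(x):
        stratum = x == level
        ps[stratum] = z[stratum].mean()   # estimated P(Z = 1 | X = level)
    ey1 = np.mean(z * y / ps)
    ey0 = np.mean((1 - z) * y / (1 - ps))
    return ey1, ey0


rng = np.random.default_rng(1)
n = 50000
x = rng.binomial(1, 0.4, size=n)                 # 1 = good health condition
z = rng.binomial(1, np.where(x == 1, 0.8, 0.3))  # healthier patients treated more often
y0 = 1.0 + 2.0 * x + rng.normal(size=n)          # counterfactual outcome without treatment
y1 = y0 + 1.0                                    # counterfactual outcome with treatment
y = np.where(z == 1, y1, y0)                     # observed outcome

naive_effect = y[z == 1].mean() - y[z == 0].mean()
ey1, ey0 = ipw_means(y, z, x)
ipw_effect = ey1 - ey0
print(round(naive_effect, 2), round(ipw_effect, 2))
```

By construction the treatment effect is 1 in this simulation. The naive comparison of treated and untreated patients overstates it because health condition affects both assignment and outcome, whereas the weighted contrast is close to 1, since (2.1) holds given the simulated confounder.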

19 3. Theoretical bacground 3.1. Theory concerning counterfactual framewor Within this section, we illustrate the counterfactual framewor and review the structural models SNFTM and MSM. Whereas the SNFTM only applies for survival settings, the MSM is flexible with respect to the outcome. We regard a time-varying exposure which only changes once. In our ICU example, this is the occurrence of a nosocomial infection where the exposure variable changes at the onset time of the infection. In our breast cancer example, we regard early stopping of treatment. Here, the exposure changes when the first cycle is withheld. We choose the notation and interpretation of variables according to the data examples we eep in mind Counterfactual framewor The counterfactual outcome not only comprises a single outcome variable, but a whole vector of outcome variables. Each of these variables belong to a certain exposure regime and describe the hypothetical outcome, had it been affected by this regime. First, we regard the case of a time-independent outcome Y with observed exposure described by p = ( 1, 2,..., p ) where is either 0 or 1 and only switches once. For example, exposure can be treatment applied in at most p cycles where equals 1, if cycle is given. If treatment is stopped once, it is also withheld for the rest of the cycles. Then, the counterfactual outcome is defined as the vector of random variables (Y 0, Y 1, Y 2,..., Y p ) (3.1) The component Y is interpreted as the potential outcome, if treatment was given for the first cycles. Y 0 refers to the outcome after no treatment. According to this interpretation, the observable outcome Y is lined to the counterfactuals. If treatment was never given, the observed outcome Y is set equal to Y 0. Otherwise, 11

20 3. Theoretical bacground if treatment was given for the first cycles, Y is set equal to the outcome Y. The remaining outcome variables are not observed and therefore called counterfactual. As there are many more variables defined by (3.1) than can be observed, assumptions are needed to mae the problem identifiable. These assumptions describe the confounding mechanism in the observed data. To account for the time-dependent aspects and the chronological order of cause and effect, the confounding mechanism is usually characterised by sequentially defined conditional independence assumptions which require for all possible exposure values : Y 0, Y 1, Y 2,..., Y p X, 1 (3.2) where conditional independence is denoted by. Here, we write X = (X 1,..., X ) for the covariate history prior to cycle. Baseline covariates are included in X 1. To simplify notation, we define 0 to be 0. The assumption in (3.2) is called the assumption of no unmeasured confounders. It can be interpreted in two ways. Firstly, by comparison of two patients just before cycle who were treated so far and have the same covariate history X. According to (3.2), nowing that for one of them treatment is withheld from cycle, does not enable to better predict any of the counterfactual outcomes (Y 0, Y 1, Y 2,..., Y p ). Thus, one can regard the study as a virtual sequential randomised clinical trial, where at each time the decision whether to stop treatment is taen completely randomly conditional on the nown history X. Secondly, (3.2) implicitely ensures that all information which affects stopping treatment from cycle is included in X. For example, we regard a medical factor F which influences the doctor s decision to stop treatment and which is high, if the patient has a bad prognosis. Then, nowing does improve the prediction on any counterfactual outcome Y i, as provides information on F and therefore on the probability that the patient has a bad prognosis. 
In case of a survival setting with observed outcome T, the counterfactual outcome can be represented by (T, T 1,..., T nmax ) if we assume that the onset of exposure can only occur at discrete times, e.g. daily, and that the onset times of exposure lie between 1 and n max. Here, we write the index for the hypothetical exposure as superscript in order to distinguish the counterfactual 12

21 3.1. Theory concerning counterfactual framewor outcome from a sequence describing a time-dependent variable. Furthermore, we write T for the outcome, if no exposure occurred. Now, the assumption of no unmeasured confounders includes conditioning on T : T, T 1,..., T nmax 1, X 1, T (3.3) Structural Nested Failure Time Model The SNFTM was proposed by Robins [19, 20] for the situation of a time-varying exposure, time-dependent confounders and a time-to-event outcome. It is explained in detail by Robins [21] and comprehensively illustrated by Keiding et al. [22]. In particular, we regard the deterministic SNFTM where a deterministic relationship between counterfactuals and observables is assumed. Robins et al. [20] and Robins [21] describe how this assumption can be relaxed. The approach was initially intended to evaluate the effect of a sequentially given treatment on an event time. Then, applications arose that addressed the effect of an intermediate event, e.g. by Keiding et al. [22, 23]. The method was further modified to fit the context of clinical trials to adjust for post-randomisation variables and compliance, e.g. by Robins [24], and illustrated by e.g. Korhonen et al. [25] and Yamaguchi and Ohashi [26]. Hernán et al. [27] elucidated the approach introducing structural models by distinguishing them from associational models and describing the estimating procedure as a reconstruction of sequentially randomised groups. Within the following sections, we explain the SNFTM in detail to illustrate its modelling ideas and to provide all necessary steps for the analysis of the ICU data (chapter 4). As we regard the case where the time-varying exposure only switches once, we describe it by I, the time of its first occurrence. Keeping in mind our ICU example, where the time-varying exposure is the onset of a nosocomial infection, we interpret I as the time from admission to nosocomial infection. 
Using the notation from section 1.3, $I$ can be defined by $I := \inf\{t : \mathrm{INF}(t) = 1\}$. We further interpret the time-to-event outcome, denoted by $T$, as the length of stay on ICU. As is usual for hospital data, our ICU data set includes only administrative censoring due to end of documentation. Hence, we only illustrate this aspect which is, together with the modification of the estimating procedure to include censoring due to competing risks, clearly illustrated in Keiding et al. [22].

Model structure

The SNFTM is based on the strong version of an accelerated failure time model (as defined in e.g. Cox and Oakes [28, section 5.2]) transferred to the context of counterfactual variables. It relates the counterfactual event time $T^{\emptyset}$, had no exposure occurred, to the observations $T$ and $I$ by the acceleration parameter $\exp(\gamma)$:

$$T^{\emptyset} = \begin{cases} I + (T - I)\exp(\gamma) & \text{for } I < T \\ T & \text{otherwise} \end{cases} \qquad (3.4)$$

Note that $I < T$ means that the exposure occurred before the terminal event happened. The model assumes that the exposure causes a transformation of the time scale by the factor $\exp(-\gamma)$ from the time of its occurrence. This results in a transformation of the time span from exposure to the counterfactual event time $T^{\emptyset}$. For $\exp(\gamma)$ lower than 1, exposure leads to later event times as compared to the case where the exposure had not occurred, cf. figure 3.1.

[Figure: time axis with the points $I$, $T^{\emptyset}$ and $T$; the span from $I$ to $T$ equals $(T^{\emptyset} - I)\exp(-\gamma)$]
Figure 3.1.: Accelerated failure time model: situation with prolongation effect of exposure, i.e. $\exp(\gamma) < 1$

In the context of our ICU example, the time from $I$ to $T^{\emptyset}$ can be interpreted as the remaining time needed to recover without infection. This time is supposed to be related to the patient's health condition. The longer this time, the more impact the infection has on the patient's recovery. Note that the accelerated failure time model in (3.4) not only links $T^{\emptyset}$ with $T$ and $I$, but also determines a deterministic relation between any two counterfactual outcomes $T^{k}$ and $T^{j}$, as (3.4) can be used to calculate $T^{k}$ and $T^{j}$ from $T^{\emptyset}$.

Inference from the observable data

The SNFTM is a structural model which links $T^{\emptyset}$ with $T$ and $I$ by the population parameter $\exp(\gamma)$. Possible confounders are not included in the SNFTM because they

are a characteristic of the observed data. Their relationships are accounted for by the assumption of no unmeasured confounders shown in (3.3). As the exposure is a status variable which jumps only once from 0 to 1, (3.3) is rewritten for the g-estimation method proposed by Robins [19] to derive the adequate estimating procedure for $\exp(\gamma)$. We review the method by first assuming that there is no censoring. To show the dependence of $T^{\emptyset}$ on $\gamma$, we define

$$T^{\emptyset,\gamma} = \begin{cases} I + (T - I)\exp(\gamma) & \text{for } I < T \\ T & \text{otherwise} \end{cases} \qquad (3.5)$$

where $T^{\emptyset} = T^{\emptyset,\gamma_0}$ if $\gamma_0$ is the true population value. As the constitutive step, the assumption of no unmeasured confounders given in (3.3) is expressed in terms of the exposure hazard $\lambda_I$ for the time until start of the exposure, which can be interpreted as follows: $\lambda_I$ multiplied by a very small time interval (a few minutes, say) is the probability that the exposure starts within this small time interval, given that the individual was unexposed at the beginning of this time interval. One requires that, given the confounder history just prior to time $t$, denoted by $F_t$, the exposure hazard $\lambda_I$ is not affected by $T^{\emptyset,\gamma_0}$:

$$\lambda_I(t \mid F_t) = \lambda_I(t \mid F_t, T^{\emptyset,\gamma_0}). \qquad (3.6)$$

Note that $F_t$ does not include $T^{\emptyset,\gamma_0}$. The covariate information addressed in (3.6) is restricted to the time to exposure, i.e. it only involves the period where the time-dependent confounders can affect the occurrence of the exposure but are not yet affected by the exposure. This enables appropriate adjustment for time-dependent confounders. In the case where the exposure might switch at several discrete times, one rewrites (3.3) by e.g. using a pooled logistic model for the conditional probability of $\Delta_k = 1$ given the exposure and covariate history. Here, one requires that this conditional probability is not affected by $T^{\emptyset,\gamma_0}$.
Then, appropriate adjustment for time-dependent confounders is possible, as the exposure and covariate history is correctly taken into account.

Estimation of the SNFTM parameter

The estimator of $\exp(\gamma)$ can be obtained by using the step-by-step algorithm called g-estimation, which is explained in detail in Robins [21]. It is built on the fact that, if (3.6) holds, the effect of $T^{\emptyset,\gamma}$ on the exposure hazard should be 0 for $\gamma = \gamma_0$. If the hazards in (3.6) follow Cox regression

models, estimation can be done via standard software. Then, regarding $\lambda_I(t \mid F_t, T^{\emptyset,\gamma})$, $T^{\emptyset,\gamma}$ is included as a covariate. Within each step of the g-estimation algorithm, a certain value for $\exp(\gamma)$ is chosen. Then, $T^{\emptyset,\gamma}$ is calculated from $T$ and $I$ for every observation using (3.5), and the Cox regression model for the exposure hazard is fitted with all confounders and $T^{\emptyset,\gamma}$. Finally, $\exp(\gamma)$ is chosen so that the p-value of the score test for $T^{\emptyset,\gamma}$ is largest. A 95% confidence interval is obtained by including those values of $\exp(\gamma)$ where the p-value is greater than or equal to 0.05. To illustrate the g-estimation procedure, table 3.1 shows an oversimplified situation with no time-dependent confounders where $T^{\emptyset,\gamma_0}$ is constantly equal to 20 and the true $\exp(\gamma_0)$ equals 0.7, i.e. exposure prolongs the time to event. For $T^{\emptyset,\gamma_0} = 20$ and several exposure times, $T$ was calculated by (3.5). Then, for a value below and a value above the true $\exp(\gamma_0)$, $T^{\emptyset,\gamma}$ was calculated from $T$ and $I$ by (3.5). One sees that, if $\exp(\gamma)$ is incorrectly chosen, $T^{\emptyset,\gamma}$ contains information about the exposure status. In detail, for $\exp(\gamma) = 0.6$, the later the exposure starts, the greater is the calculated value of $T^{\emptyset,\gamma}$, whereas $T^{\emptyset,\gamma}$ with $\exp(\gamma) = 0.8$ decreases for later exposure times. This affects the estimated regression coefficient for $T^{\emptyset,\gamma}$ and its p-value.

Table 3.1.: Illustration of values regarded within g-estimation procedure for true $\exp(\gamma_0) = 0.7$. Columns: counterfactual $T^{\emptyset,\gamma_0}$; observables $I$ and $T$; calculations for fixed $\gamma \neq \gamma_0$: $T^{\emptyset,\gamma}$ with $\exp(\gamma) = 0.6$ and $T^{\emptyset,\gamma}$ with $\exp(\gamma) = 0.8$.

Administrative censoring due to end of documentation

In this section, we describe how to cope with administrative censoring due to end of documentation. Here, the potential censoring time $C$ is already known for every patient on admission. The idea is to replace $T^{\emptyset,\gamma_0}$ in (3.6) by a function of $T^{\emptyset,\gamma_0}$, $C$, $\gamma_0$ and $t$:

$$\lambda_I(t \mid F_t) = \lambda_I(t \mid F_t, f(T^{\emptyset,\gamma_0}, C, \gamma_0, t)). \qquad (3.7)$$
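Before turning to the choice of $f$, the illustration behind table 3.1 can be reproduced numerically. The following is a minimal sketch, assuming the values stated in the text (unexposed event time 20, true $\exp(\gamma_0) = 0.7$, trial values 0.6 and 0.8); the exposure times and all function names are hypothetical and are not taken from the table.

```python
def observed_time(t0, i, exp_gamma0):
    """Observed T for unexposed time t0 and exposure onset i (inverse of (3.5))."""
    return i + (t0 - i) / exp_gamma0 if i < t0 else t0


def back_transform(t, i, exp_gamma):
    """Back-transformed time computed from the observed pair (T, I) as in (3.5)."""
    return i + (t - i) * exp_gamma if i < t else t


T0_TRUE = 20.0                            # as in the text
EXP_GAMMA0 = 0.7                          # as in the text
EXPOSURE_TIMES = [2.0, 6.0, 10.0, 14.0]   # hypothetical, not from table 3.1

rows = []
for i in EXPOSURE_TIMES:
    t = observed_time(T0_TRUE, i, EXP_GAMMA0)
    rows.append((i, t,
                 back_transform(t, i, 0.6),         # trial value below the truth
                 back_transform(t, i, EXP_GAMMA0),  # true value
                 back_transform(t, i, 0.8)))        # trial value above the truth

for i, t, low, true, high in rows:
    print(f"I={i:5.1f}  T={t:6.2f}  0.6: {low:6.2f}  0.7: {true:6.2f}  0.8: {high:6.2f}")
```

Under the true value the back-transformed time is the same for every exposure time, so it carries no information about $I$. Under the trial value 0.6 it increases with $I$ and under 0.8 it decreases, which is the dependence that the score test in the g-estimation step is meant to detect.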

As $C$ is already known on admission, it is included in $F_0$. Hence, if $\gamma = \gamma_0$, (3.7) follows from (3.6). The aim is to choose $f(T^{\emptyset,\gamma_0}, C, \gamma_0, t)$ such that it can be determined for every observation in the risk set at each time $t$ whether it is censored or not, independently of the onset of exposure $I$. The focus here is not on the possibly censored outcome $\min(T, C)$, but on the possibly censored value of $T^{\emptyset,\gamma}$, which belongs to a time scale transformed by $\exp(\gamma)$ from the onset of exposure. Therefore, in the case of $\exp(\gamma) < 1$, we consider a transformation of $C$ which is defined time-dependently as follows:

$$C^{\gamma}(s) = s + (C - s)\exp(\gamma).$$

Then, as $C^{\gamma}(s)$ increases with $s$ and $C^{\gamma}(t) < C^{\gamma}(s)$ for $s \in\, ]t; C]$,

$$\min(T^{\emptyset,\gamma}, C^{\gamma}(t)) \qquad (3.8)$$

is our candidate for $f(T^{\emptyset,\gamma_0}, C, \gamma_0, t)$, as it can be determined for every observation whether the outcome $T^{\emptyset,\gamma}$ is censored or not. Three other functions $f$ are discussed in Keiding et al. [22]. We chose the one which best fits our line of interpretation. The definition of $C^{\gamma}(t)$ is such that sequentially increasing censoring times are generated. Thereby, artificial censoring which changes over time is introduced. Considering risk sets at time $t$, the lower $t$, the more artificial censoring is introduced. Note that the prolongation and hence the artificial censoring mechanism depends on the acceleration factor $\exp(\gamma)$. As $C^{\gamma}(s)$ decreases with $s$ for $\exp(\gamma) > 1$, the constant $\min(T^{\emptyset,\gamma_0}, C)$ can be chosen for $f(T^{\emptyset,\gamma_0}, C, \gamma_0, t)$ in this case.

Algorithm by Robins to simulate data conforming to the SNFTM

A data-generating process typically used for an SNFTM was proposed by Robins [21] and further applied by Young et al. [29]. Here, individual patient data are generated separately for each subject. A counterfactual event time is sampled first. Then, a separate process is used to sample the interaction of exposure and covariates.
The impact of the confounders on the outcome is achieved by relating their distribution to the initially sampled counterfactual outcome. The process is stopped at the observed event

time, which is obtained by subsequently transforming the counterfactual time according to the already sampled exposure values. For our situation, considering the time-dependent confounder CONF(t) and the start of exposure described by the time-dependent status variable EXP(t), which is 0 until the start of exposure and then jumps to 1, it applies as follows.

Step 1: Simulate the counterfactual event time $T^{\emptyset}$ from a failure time distribution (e.g. Exponential or Weibull) with hazard $\lambda_0(t)$.

Step 2: Simulate the time of the next change in either CONF(t) or EXP(t), denoted by $K$, where the time to change is modelled by a failure time distribution with hazard $\lambda_{CONF}(t)$ and $\lambda_{EXP}(t)$, respectively. Here, $\lambda_{CONF}(t)$ should depend on $T^{\emptyset}$ and the history of EXP(t) so that CONF(t) is a confounder and is affected by the exposure. Furthermore, $\lambda_{EXP}(t)$ should depend on the history of CONF(t) so that CONF(t) acts as confounder, and must not depend on $T^{\emptyset}$ so that the assumption of no unmeasured confounders is satisfied.

Step 3: Determine $T$ as follows. If EXP(t) has already jumped to 1, calculate $T$ using (3.4), i.e. by $T = I + (T^{\emptyset} - I)/\exp(\gamma)$, where $I = \inf\{t : \mathrm{EXP}(t) = 1\}$. Otherwise, set $T$ equal to $T^{\emptyset}$. If $K < T$, repeat steps 2 and 3. Otherwise, redefine CONF(K) := CONF(K − dt) and EXP(K) := EXP(K − dt) such that no change of CONF(t) and EXP(t) happened at $K$.

Marginal Structural Models

Marginal Structural Models (MSMs) were proposed by Robins [30, 31] to quantify the effect of a certain treatment regime. They adequately address time-dependent confounders and apply to different types of outcome, including time-to-event outcomes. In chapter 6, we address a time-constant binary outcome. A treatment regime consists of a treatment plan which specifies how to determine the treatment dose at the time-points where treatment is given. There are two types of treatment regimes, static and dynamic regimes.
In static treatment regimes, the application doses are determined at baseline and do not change in response to the medical history of the individual patient. Dynamic treatment regimes provide rules to assess the

application dose depending on certain measurements of clinical variables available at the respective time-point. The MSM applies to data where a dynamic treatment regime was used, but infers the causal effect of a static regime by comparing potential cases where different static treatment regimes were used for all patients. This approach has since been applied to data, e.g. by Mortimer et al. [32], Cole et al. [33] and Tager et al. [34]. In the setting of randomised clinical trials, non-compliance can also be seen as a dynamic strategy which allows for stopping treatment at a certain time-point. In recent years, there have been proposals to adequately estimate the treatment effect for different endpoints by structural models, e.g. by Robins [24], Vansteelandt and Goetghebeur [35] and Loeys et al. [36].

Model structure

The MSM is a structural model that determines the relation between the distribution of the counterfactual outcomes and the hypothetical exposure. The model is chosen according to the type of outcome. For a counterfactual survival outcome $T^{k}$, where $k$ describes the onset of exposure at time $k$, a marginal structural Cox proportional hazards model

$$\lambda_{T^{k}}(t) = \lambda_0(t)\exp(\beta\, 1_{\{k<t\}})$$

applies, where the hazard might further depend on baseline covariates. For a time-constant binary outcome $Y$, one uses

$$\operatorname{logit} P(Y^{k} = 1) = \beta_0 + \beta_1 k \qquad (3.9)$$

In contrast to a standard statistical model of associational form, e.g.

$$\operatorname{logit} P(Y = 1 \mid N) = \alpha_0 + \alpha_1 N \qquad (3.10)$$

where the modelled quantity is a conditional probability and $N$ is a random variable, the structural model addresses the counterfactual variables $Y^{k}$ and the explanatory variable $k$ is not random.

Inference from the observable data

As the MSM is a structural model, it does not include confounding variables. The confounding mechanism is defined by the assumption of no unmeasured confounders

which is shown in (3.2) for a time-independent outcome. It is subsequently applied to rewrite the marginal probability of the counterfactual outcomes in terms of probabilities of observable variables, as shown in (6.2) in chapter 6. Then, appropriate weights, which correspond to the dependence between the confounders and the occurrence of the exposure, are read off this term. By reweighting the observations by these weights, a pseudo-population is created where time-dependent confounding is eliminated. This means that, within this pseudo-population, the probability of receiving a certain treatment regime is constant for all observations, i.e. treatment is unconfounded.

Estimation of the MSM parameters

Estimation of the MSM parameters is done in two steps using the method of Inverse Probability of Treatment Weighting (IPTW). First, an adequate statistical model is chosen and fitted for the weights. Then, one uses the standard statistical model which corresponds to the MSM, e.g. the standard logistic model in (3.10) for the MSM in (3.9), and estimates its parameters using the pseudo-population obtained by reweighting. As treatment is unconfounded in the pseudo-population, these estimates are consistent for the parameters of the MSM. Thus, the MSM parameters are determined by fitting the weighted regression model chosen according to the considered outcome. Stabilised weights can be used to obtain more efficient estimates. The consistency of the IPTW estimator relies on the positivity assumption

$$P(\bar{\Delta}_{m-1} = \bar{\delta}_{m-1}, \bar{X}_m = \bar{x}_m) > 0 \;\Rightarrow\; P(\Delta_m = \delta_m \mid \bar{\Delta}_{m-1} = \bar{\delta}_{m-1}, \bar{X}_m = \bar{x}_m) > 0 \qquad (3.11)$$

for all possible $\delta_m$ and $\bar{x}_m$, which is also called the assumption of experimental treatment assignment [32, 37]. It requires that, at every level of the confounder history measured just before cycle $m$, there is a positive probability of receiving the next cycle $m$ and of stopping after cycle $m-1$, respectively.
The MSM is very flexible, as both the structural model and the model for the weights can be any statistical model. For most regression models, standard statistical software packages provide a weighted fit. In case of a survival outcome, time-dependent weights are used [38]. As most standard Cox model software programs do not allow for subject-specific time-varying weights, Hernán et al. [38] recommend fitting a weighted pooled logistic regression.
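The two IPTW steps described above can be sketched for the simplest possible case. This is an illustrative assumption, not the analysis of chapter 6: a single treatment decision, one binary confounder, weights taken from empirical frequencies, and invented cell counts. All names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical cell counts: (confounder x, treatment a, outcome y, count).
CELLS = [
    (0, 1, 1, 10), (0, 1, 0, 10), (0, 0, 1, 20), (0, 0, 0, 60),
    (1, 1, 1, 60), (1, 1, 0, 20), (1, 0, 1, 10), (1, 0, 0, 10),
]
records = [(x, a, y) for x, a, y, n in CELLS for _ in range(n)]

# Step 1: model for the weights, here the empirical P(A = a | X = x).
n_x = Counter(x for x, _, _ in records)
n_xa = Counter((x, a) for x, a, _ in records)

# Positivity: every treatment level must occur at every confounder level.
assert all(n_xa[(x, a)] > 0 for x in n_x for a in (0, 1))


def weight(x, a):
    """Inverse probability of the treatment actually received."""
    return n_x[x] / n_xa[(x, a)]


# Step 2: weighted fit of the associational model. With one binary
# regressor, the weighted logistic fit reduces to weighted proportions.
def weighted_risk(a):
    num = sum(weight(x, ai) * y for x, ai, y in records if ai == a)
    den = sum(weight(x, ai) for x, ai, _ in records if ai == a)
    return num / den


def crude_risk(a):
    ys = [y for _, ai, y in records if ai == a]
    return sum(ys) / len(ys)


def logit(p):
    return math.log(p / (1.0 - p))


beta_1 = logit(weighted_risk(1)) - logit(weighted_risk(0))
print(crude_risk(1), crude_risk(0))        # confounded comparison
print(weighted_risk(1), weighted_risk(0))  # pseudo-population comparison
print(beta_1)
```

In these invented data the treated are concentrated in the stratum with the higher outcome risk, so the crude risks (0.70 versus 0.30) overstate the contrast, while the weighted risks (0.625 versus 0.375) agree with standardisation over the confounder. With time-varying treatment, the weight of an observation would instead be a product of such terms over the treatment decisions.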

Contrasting SNFTM and MSM

The SNFTM is restricted to a time-to-event outcome and describes a deterministic relation between the counterfactual outcomes referring to different hypothetical exposure regimes, given by $T^{\emptyset} = k + (T^{k} - k)\exp(\gamma)$, assuming an acceleration effect. This equation allows $T^{\emptyset}$ to be calculated from the observations $T$ and $I$ for exposed patients. For unexposed patients, $T^{\emptyset}$ equals the observed outcome $T$. The MSM applies to almost any type of outcome and describes the relation between the hypothetical exposure and a characteristic of the marginal distribution of the counterfactuals. In case of a survival outcome, the marginal structural Cox model applies, which assumes a multiplicative effect of the exposure on the hazard of the counterfactual outcome. With respect to the interpretation of the causal parameters, it is easier to transfer an epidemiological hypothesis to the parameters of an SNFTM than to those of marginal structural Cox models. In both models, estimation from observational data is done by utilising the assumption of no unmeasured confounders, given by (3.3), which requires conditional independence of counterfactual outcome and observed exposure. However, they transfer it differently to deduce the estimation method. The SNFTM focusses on a model for the observed exposure where the counterfactual $T^{\emptyset}$ is included as a covariate. To obtain the SNFTM parameter, this model is fitted to the original data such that $T^{\emptyset}$ contributes no effect. The MSM uses a model for the observed exposure conditional on exposure and covariate history, without inclusion of the outcome, to generate weights. These weights are used to transform the original data by reweighting into a pseudo-population where treatment application is unconfounded. The parameters of the marginal structural model are then estimated by fitting the corresponding associational model using this pseudo-population.
The way the assumption of no unmeasured confounders is utilised to establish the estimation method implies that MSMs cannot be modified to address the effect of a dynamic treatment regime. For the SNFTM and g-estimation, the extension is complicated but feasible by introducing interaction terms between time-dependent treatment and covariates in the acceleration model. In contrast to the MSM approach, the SNFTM is not restricted by the positivity assumption given in (3.11). A big advantage of MSMs, however, is that in contrast to the

SNFTM, estimation can typically be done by standard statistical software packages.

3.2. Theory concerning observable framework

Multistate models

Multistate models [39] are used to describe longitudinal failure time data by a process which at any time occupies one of a few possible states. A transition between the states is called an event. Figure 3.2 shows a simple multistate model which describes a patient's stay on ICU which possibly involves the occurrence of a nosocomial infection. At admission to ICU, the patient starts uninfected in state 0. The arrows indicate that from state 0 the patient can move either to state 1 or to state 2, depending on whether he is infected or discharged without being infected. State 1 is a transient state which the patient passes through and then leaves. State 2 is called an absorbing state, as the patient finally remains in that state.

Figure 3.2.: Multistate model: description of ICU stay with possible nosocomial infection; denotation of states: 0 = uninfected, 1 = infection, 2 = discharge

Note that changes in discrete time-dependent covariates can also be modelled as events. Here, different states according to all possible values are defined. Multistate models are characterised by hazard rates

$$\alpha_{ij}(t; F_t) = \lim_{\Delta t \to 0} \frac{P_{ij}(t, t + \Delta t)}{\Delta t}, \quad i \neq j \qquad (3.12)$$

also called transition intensities, where

$$P_{ij}(s, t) = P(\text{state } j \text{ at time } t \mid \text{state } i \text{ at time } s, F_s)$$

is the probability of being in state $j$ at time $t$ conditional on having been in state $i$ at time $s$ and on the covariate history $F_s$ just before $s$.

Inverse Probability of Censoring Weighting

The method of Inverse Probability of Censoring Weighting (IPCW), see e.g. Robins et al. [40, 41, 42], is used to include censored observations in the statistical analysis. The idea is that censored observations are replaced by observations with similar covariate history up to the censoring time. For this purpose, uncensored patients are weighted by the inverse of the probability of not being censored, where the weights are deduced from assumptions on the censoring mechanism. A basic approach which uses IPCW is the Kaplan-Meier estimator [43] for the survival curve. Here, one assumes that censoring is independent of any covariates and the outcome. Thus, if there are $n$ observations at time $t$ of which one is censored, the remaining uncensored observations are reweighted equally by $n/(n-1)$ from $t$. If censoring depends on covariates, the censoring mechanism is characterised by identifying assumptions which state conditional independencies between covariates, missingness and the outcome. Then, one first sets up a model which explains the relations between covariates and the occurrence of missing outcome values. The parameters of this model are estimated according to the identifying assumptions. Then, one calculates the weights from this model and uses the reweighted population for estimating the parameter of interest. If censoring only depends on baseline covariates or the outcome of interest is time-independent, the reweighted population reduces to the uncensored observations. If censoring further depends on time-dependent covariates and one regards a time-to-event outcome, the censored observations are included in the reweighted population until their censoring time and the weights vary over time.
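The reweighting reading of the Kaplan-Meier estimator can be checked on a toy data set. The sketch below assumes distinct observation times and covariate-independent censoring; the data and function names are hypothetical. It computes the survival curve once by the usual product-limit formula and once by multiplying the common weight of the observations still at risk by $n/(n-1)$ at each censoring time.

```python
def kaplan_meier(times, events):
    """Product-limit estimate; returns {event time: S(t)}, distinct times assumed."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    surv, at_risk, curve = 1.0, len(times), {}
    for i in order:
        if events[i]:
            surv *= 1.0 - 1.0 / at_risk
            curve[times[i]] = surv
        at_risk -= 1
    return curve


def reweighted_survival(times, events):
    """Same curve by reweighting: after a censoring among n observations at
    risk, the n - 1 remaining ones have their weight multiplied by n / (n - 1)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    n = len(times)
    weight = 1.0 / n                # common weight of all observations at risk
    at_risk, surv, curve = n, 1.0, {}
    for i in order:
        if events[i]:
            surv -= weight          # an event removes its current weight
            curve[times[i]] = surv
        elif at_risk > 1:
            weight *= at_risk / (at_risk - 1)
        at_risk -= 1
    return curve


# Hypothetical data: event indicator 1 = observed end of stay, 0 = censored.
times = [2.0, 3.0, 5.0, 7.0, 8.0]
events = [1, 0, 1, 0, 1]

km = kaplan_meier(times, events)
rw = reweighted_survival(times, events)
for t in sorted(km):
    print(t, round(km[t], 4), round(rw[t], 4))
```

Both functions return the same step heights for these data. When censoring depends on covariates, the common weight would be replaced by subject-specific weights derived from a fitted censoring model, as described above.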


4. Applying the SNFTM to assess the effect of a nosocomial infection on the length of hospital stay

In this chapter, we address the effect of a nosocomial infection on the length of hospital stay and regard our example on ICU data illustrated in section 1.2.1. Quantifying the effect is complicated because infection status changes over time. Common but inadequate ad hoc approaches tend to overestimate the effect. A suitable method is based on a time-to-event approach and accounts for patient characteristics. We apply the SNFTM as illustrated in section 3.1 to quantify the effect of a nosocomial infection on the length of hospital stay by the SNFTM parameter. This was already done by Schulgen et al. [44, 45]. There, it was part of a comparison of different approaches and the intention was to demonstrate its applicability to real data. One innovative advantage of Robins' method is that it includes time-dependent confounders in an appropriate way. This property, however, was not utilised by Schulgen et al. [44, 45], as no covariates were included in the analysis. We now use this model in its full capacity and reanalyse the data. Furthermore, we address the effect of the nosocomial infection with respect to the change in length of hospital stay, where the extra stay is measured in days. This is considered to be a relevant quantity in the field of infection control. It is e.g. used to increase the efficacy of resource planning and to assess additional expenses due to the infection. We use the definition given by Schulgen et al. [44, 45]. Here, the counterfactual framework facilitates the definition of the effect and its interpretation. The extra stay is given by a plug-in estimator which uses the SNFTM parameter. In accordance with the literature [9, 46], we focus on the end of stay, i.e. we treat death and discharge equally. Our data example consists of 1656 admissions with only 10% deaths.

4.1. Definition and estimation of the extra stay

Estimation of the extra stay from observational data is not straightforward. An ad hoc approach to quantify the change in length of stay is to define two groups by retrospectively dividing the patients according to whether they have acquired a nosocomial infection or not. The change in length of stay is estimated as the difference in the mean length of stay. As patients who stay longer on ICU have a higher risk of entering the group of infected patients, this comparison tends to overestimate the effect. To assess the extra stay due to the nosocomial infection, we use a quantity proposed by Schulgen et al. [44, 45]. They compare different definitions of the extra stay in different frameworks. We focus on the term which determines the extra stay by comprising both observable and counterfactual variables:

$$E(T - T^{\emptyset} \mid I < T) \qquad (4.1)$$

The big advantage of this term using the counterfactual framework is that it clearly states the type of comparison. It represents the medically optimal comparison with elimination of the infection and describes the mean change in length of stay of an infected patient from the considered population. It allows one to deduce the overall number of extra days which could have been saved by complete elimination of nosocomial pneumonia, by multiplying this quantity by the number of infected patients. Estimation of the quantity in (4.1) is done as follows. We use the whole population to estimate the SNFTM parameter. Then, this estimate is used to calculate $T^{\emptyset}$ for the infected population from $T$ and $I$ by (3.4). To estimate the expectation, we use integrals of the Kaplan-Meier curve to include censored observations.
A confidence interval is obtained by drawing bootstrap samples from the whole analysis set, where each step includes the estimation of the SNFTM parameter.

4.2. Application: SIR3 study

Information on the ICU data

The data of the SIR3 study were collected over a period of 18 months to examine the effect of nosocomial infections, covering all patients admitted to five intensive care units (see table 4.1 for types of ICU and number of admissions per ICU) who stayed more

than 48 hours. A detailed description of the data is given in Grundmann et al. [6] and Beyersmann et al. [7].

Table 4.1.: Types of ICU centers and number of admissions by ICU. Columns: Type of ICU, Frequency, %. Rows: Neurosurgical; Surgical; Interdisciplinary, Unit I; Interdisciplinary, Unit II; Medical; All.

We focus on nosocomial pneumonia, which is one of the most frequent and severe nosocomial infections. It is considered to be hospital-acquired if it occurred more than 48 hours after admission. We regard past infection exposure, which means that the infection status remains 1 after infection, even if the infection was cured. This is typically investigated in hospital epidemiology [9]. We exclude 220 patients who already had pneumonia on admission, which leaves 1656 admissions with ICU stay longer than 48 hours. Information on the occurrence of nosocomial pneumonia and the terminal event is given in table 4.2.

Table 4.2.: Information on occurrence of nosocomial pneumonia and terminal event. Columns: Discharged, Died, Censored, All. Rows: Nosocomial pneumonia; No nosocomial pneumonia; All.

Possible prognostic factors for the risk of infection and the use of medical devices are listed in table 4.3, which also contains information on the values of the baseline characteristics in our data. They comprise the baseline covariates age, sex and information on the patient's health status at admission by the SAPS II Score [47], hospital stay before admission to ICU and further admission status. Additionally, information on the use

of artificial ventilation, a chest drainage, a nasogastric tube and a urinary catheter, respectively, is accounted for. For all devices there is daily information on being on or off. They might act as time-dependent confounders with respect to the influence of nosocomial pneumonia on the length of hospital stay.

Prognostic factor | Frequency (absolute) | % | Hazard ratio | 95% CI | p-value*
Baseline characteristic
Age (a) (years) | | | | [0.99;1.01] | 0.56
Sex (female) | | | not included | |
SAPS II score (a) | | | | [0.98;1.00] | 0.09
Intubation on admission to ICU | | | | [0.60;1.53] | 0.86
Hospital stay before ICU admission | | | | [0.55;1.15] | 0.23
Surgical patient | | | not included | |
Elective surgery before ICU admission | | | | [1.11;2.43] | 0.01
Emergency surgery before ICU admission | | | | [0.97;2.10] | 0.07
Neurological underlying disease | | | not included | |
Metabolic or renal underlying disease | | | | [0.34;1.62] | 0.46
Time-dependent status variables
Use of artificial ventilation (b) | | | | [2.76;8.15] | 0.01
Use of chest drainage (b) | | | | [0.68;1.79] | 0.69
Use of nasogastric tube (b) | | | | [1.94;9.29] | 0.01
Use of urinary catheter (b) | | | | [0.61;2.55] | 0.54
* Wald test (two-sided)
(a) Descriptive statistics: mean and standard deviation
(b) Descriptive statistics: with regard to use at least once

Table 4.3.: Descriptive statistics of prognostic factors and results of multivariate Cox regression for hazard of infection

Estimation of the SNFTM parameter

To estimate the SNFTM parameter, we follow the description of the estimation method given in section 3.1. To set up the appropriate Cox model for the infection hazard,

we first test the influence of covariates by univariate analyses. We then choose those factors for the multivariate Cox model whose unadjusted p-value lies below the threshold corresponding to the Akaike criterion [48]. We further stratify this Cox model by ICU center to account for distinct patient collectives. Parameter estimates are given in table 4.3. Applying g-estimation, we obtain a 95% confidence interval of [0.633; 0.877] for the acceleration parameter $\exp(\gamma)$. This indicates that infection prolongs ICU stay.

Estimation of the extra stay

Using the quantity given in (4.1), the estimate of the extra stay for the population addressed in the SIR3 study is 5.71 extra days (95% CI [2.70; 8.56]). The overall number of extra days which could have been saved by complete elimination of nosocomial pneumonia is obtained by multiplying this estimate by the number of infected patients (lower 95% confidence limit 423.9 days).

Artificial ventilation as time-dependent confounder

We now provide arguments that artificial ventilation (AV) acts as a time-dependent confounder, as done by Keiding et al. [22] for a data example on Graft versus Host Disease. First, we assess the impact of AV on the occurrence of infection and on the outcome. For this purpose, we consider the results in table 4.3, which show a significant effect of AV on the infection hazard. Furthermore, a Cox model for the time until end of stay including all covariates listed in table 4.3 and the status variable INF(t), defined in section 1.3, showed that AV(t) has a prolongation effect on length of stay (hazard ratio 0.338, 95% CI [0.29; 0.39]). This indicates that AV must be treated as a confounder. Secondly, we investigate whether the need for AV is associated with prior infection, i.e. whether AV is a confounder which is time-dependent. For this purpose, we fit a univariate Cox regression model for the hazard of first use of AV, stratified by ICU center, where the infection status enters as a time-dependent variable.
The 877 patients for whom AV was not already used on admission entered this analysis and contributed 75 events. The estimate of the hazard ratio for infection status was 10.9 (95% CI [2.46; 48.8]). This shows that AV is a time-dependent confounder.
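To recap the plug-in step behind the extra-stay estimate of section 4.1, the following minimal sketch evaluates the mean of $T - T^{\emptyset}$ among infected patients using (3.4). It assumes uncensored data, so a plain mean replaces the Kaplan-Meier integral, and it uses invented stays, invented infection times and a hypothetical value of $\exp(\gamma)$ rather than the SIR3 estimates.

```python
def unexposed_time(t, i, exp_gamma):
    """Counterfactual length of stay without infection, following (3.4)."""
    if i is not None and i < t:
        return i + (t - i) * exp_gamma
    return t


def extra_stay(stays, infection_times, exp_gamma):
    """Mean and total extra days among infected patients (no censoring assumed)."""
    diffs = [t - unexposed_time(t, i, exp_gamma)
             for t, i in zip(stays, infection_times)
             if i is not None and i < t]
    return sum(diffs) / len(diffs), sum(diffs), len(diffs)


# Hypothetical data: length of stay in days and day of infection (None = never).
stays = [10.0, 20.0, 6.0, 30.0]
infection_times = [4.0, 8.0, None, 10.0]
EXP_GAMMA = 0.75   # hypothetical acceleration parameter, not the SIR3 estimate

mean_extra, total_extra, n_infected = extra_stay(stays, infection_times, EXP_GAMMA)
print(mean_extra, total_extra, n_infected)
```

The total equals the mean multiplied by the number of infected patients, which mirrors how the overall number of extra days is derived. A bootstrap confidence interval would repeat both the estimation of $\exp(\gamma)$ and this plug-in step on each resample.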


5. Multistate model conform with assumptions of SNFTM

In this chapter, we propose a multistate model which describes the situation with a time-varying exposure, a time-dependent confounder and a survival outcome. It is defined such that it conforms to the assumptions of the SNFTM and the acceleration parameter of the SNFTM enters directly. It only includes observable data which arise in the appropriate chronological order. The survival time becomes manifest when the patient enters the final absorbing state. The interaction of covariates and exposure is modelled by direct effects relating the respective transition rates. The appropriate embedding in the counterfactual framework is achieved by a partially delayed impact of covariates on the terminal event if they changed after exposure. To illustrate our modelling assumptions, we characterise one transition by a mixture of transition probabilities from hidden substates. These hidden substates subclassify one of the observable states. The probabilities of transitions between observable states do not depend on the hidden substates. In contrast to counterfactuals, the substates are hidden but do not refer to hypothetical variables which coexist in a hypothetical world but are never observable simultaneously for one individual. For illustration, we use the ICU example, introduced in section 1.2.1, with artificial ventilation (AV) as time-dependent confounder and the infection as time-varying exposure. We assume a prolongation effect, i.e. $\exp(\gamma) < 1$. To focus on the relevant modelling aspects, we regard AV as a status variable which remains one from the first use of AV. AV can then be interpreted as an indicator for health status. Subsequently, we explain how to modify the model to include the event "switch off AV". Furthermore, the concept can easily be adapted to $\exp(\gamma) > 1$ by interchanging uninfected and infected patients.

5.1. Idea to model the action of a time-dependent confounder

The time-dependent confounder AV is affected by the infection and influences the time to discharge. Our aim is to model the impact of AV on the discharge hazard such that the causal parameter $\exp(\gamma)$ of the SNFTM, which covers the direct and indirect effects of the infection on discharge, enters directly. Therefore, we model hidden states according to the reason for AV and distinguish between AV due to INF and AV not due to INF. If the infection was the reason for AV, we model no additional effect of AV on the discharge hazard. The hidden states are a means to illustrate the modelling assumptions for the determination of the discharge probability for infected patients. This probability is defined as a mixture of discharge probabilities from the hidden states. It does not depend on the pathway along the hidden states but only on the observable information whether the patient was infected before AV was switched on or afterwards.

5.2. Definition of the multistate model

A general definition of a multistate model was given earlier, with the transition hazards denoted by $\alpha_{ij}(t; \mathcal{F}_t)$ in (3.12). Our multistate model with states defined by infection status (INF), AV and discharge status (including death) is shown in figure 5.1. We assume that every patient is uninfected and not ventilated on admission, i.e. starts in state 1. He then moves along the transient states 1 to 4 according to the indicated arrows until he is discharged, i.e. finally reaches the absorbing state 5. Thus, if the patient is discharged after infection without being ventilated, he moves along the path from 1 to 3 and then to 5. If he is ventilated after admission and subsequently acquires the infection before discharge, he follows the path from 1 to 2 to 4 and then to 5. Comparing the transition hazards for switching from 2 to 4 and from 1 to 3 indicates whether AV increases the risk of infection.
In the following sections, we characterise the impact of AV and INF by relating the respective transition hazards. Thereby, we allow arbitrary transition hazards for switching from the uninfected and unventilated state 1, i.e. for $\alpha_{12}(t)$, $\alpha_{13}(t)$ and $\alpha_{15}(t)$. Here and further on, we either suppress the dependence of the transition hazards on the covariate history $\mathcal{F}_t$ in the notation or single out the essential ingredient. To model effects on transitions to transient states, we assume proportional hazards, as is typically done in multistate models. We proceed in the same way with the discharge hazard for uninfected patients. However, with respect to discharge hazards for infected patients, we need to incorporate the modelling assumptions of the SNFTM. A formal check that our multistate model meets the assumption of no unmeasured confounders is given in appendix B.

Figure 5.1.: Multistate model: description of states and possible transitions

Effect of INF on AV and of AV on INF and discharge

We define the effect of INF on AV and of AV on INF and discharge as follows, assuming proportional hazards:

\[ \alpha_{34}(t) = \alpha_{12}(t)\, c_{INFAV} \tag{5.1} \]
\[ \alpha_{24}(t) = \alpha_{13}(t)\, c_{AVINF}, \qquad \alpha_{25}(t) = \alpha_{15}(t)\, c_{AVDIS} \tag{5.2} \]

with constants $c_{INFAV} \geq 1$, $c_{AVINF} \geq 1$ and $c_{AVDIS} \leq 1$. Their range is chosen such that the infection increases the need for AV, and AV increases the risk of INF and decreases the risk of discharge.

Discharge hazard for infected, unventilated patients

In order that the multistate model complies with the SNFTM, we model the influence of INF on discharge for patients in state 3 by an accelerated failure time model [28, section 5.2.]:

\[ \alpha_{35}(t; \iota) = \alpha_{15}(t_{\gamma,\iota}) \exp(\gamma) \]

with $\iota$ the time of infection and the back-transformed time

\[ t_{\gamma,\iota} = \iota + (t - \iota) \exp(\gamma). \tag{5.3} \]

Recall that according to (3.4), the counterfactual time to discharge without infection is $T^0 = T_{\gamma,I}$.

Discharge hazard for infected, ventilated patients

When defining the discharge hazard for patients in state 4, we face two challenges. First, we must take into account that $\exp(\gamma)$ covers the overall impact of INF on the time to discharge and, second, we must introduce AV as time-dependent confounder. In order that $\exp(\gamma)$ represents the causal parameter of the SNFTM, the influence of AV on the discharge hazard cannot be modelled straightforwardly for patients in state 4 who were infected before the first use of AV, i.e. came from state 3. This is because the effects are modelled with respect to the transition hazards, which can be illustrated for $c_{INFAV} = 1$ as follows. The effect of INF on AV describes the risk ratio of AV between presently infected and uninfected patients. It corresponds to the risk within the next infinitesimal time span, but not to the overall effect until discharge. As, in the situation considered, $\alpha_{34}(t) = \alpha_{12}(t)$, we say that the infection does not increase the risk of AV. This addresses the direct effect of infection. As, for the infected patient in state 3, discharge and first use of AV are two competing events, the infection also has an indirect effect on AV if it reduces the discharge hazard. Then, in comparison to a world without infection, the patient stays longer at risk of AV, which increases his risk of being ventilated during his stay.
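The back-transformation (5.3) and the indirect effect described above can be made concrete with a small sketch. It assumes constant hazards, $c_{INFAV} = 1$ and, purely for illustration, the values $\alpha_{12} = 0.04$, $\alpha_{15} = 0.12$ and $\exp(\gamma) = 0.76$ used later in the simulation study; the function names are hypothetical and not part of the thesis.

```python
import math

def backtransform(t, iota, gamma):
    """Back-transformed time (5.3): identical to t up to the infection
    time iota; afterwards time since infection is scaled by exp(gamma)."""
    if t <= iota:
        return t
    return iota + (t - iota) * math.exp(gamma)

def prob_av_before_discharge(alpha_av, alpha_dis):
    """Two competing events with constant hazards: probability that
    first use of AV happens before discharge."""
    return alpha_av / (alpha_av + alpha_dis)

# Assumed illustrative values (constant hazards, c_INFAV = 1).
alpha12, alpha15 = 0.04, 0.12
gamma = math.log(0.76)

# Infected, unventilated patient (state 3): alpha34 = alpha12,
# alpha35 = alpha15 * exp(gamma).
p_infected = prob_av_before_discharge(alpha12, alpha15 * math.exp(gamma))

# Same patient in a world without infection (only AV and discharge compete).
p_uninfected = prob_av_before_discharge(alpha12, alpha15)

print(backtransform(10.0, 4.0, gamma))  # time 10 on the scale of T, infected at 4
print(p_infected, p_uninfected)
```

Although the hazard of AV is the same in both worlds, the probability of ever being ventilated is larger for the infected patient, because the reduced discharge hazard keeps him at risk for longer.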

Modelling aspects of SNFTM and multistate model

Before explaining our modelling strategy, we first reflect on the different modelling aspects of the SNFTM and the multistate model. In the SNFTM, the observed time to discharge $T$ and the infection time $I$ are linked with the counterfactual time $T^0$ by (3.4) without taking account of further covariates. Within the multistate model, $T$ and $I$ arise progressively depending on the covariate history. For uninfected patients, $T^0$ can be observed, as then $T^0 = T$. For infected patients, $T^0$ can be calculated from $T$ and $I$ by (3.4) using $\exp(\gamma)$. We regard two different time scales: the time scale of $T$ and the time scale of $T^0$. For uninfected patients, they do not differ. For patients infected from time $\iota$, they are equal until $\iota$ and then linked by the transformation $t_{\gamma,\iota} = \iota + (t - \iota)\exp(\gamma)$ as in (5.3), where $t$ corresponds to the time scale of $T$ and $t_{\gamma,\iota}$ to the time scale of $T^0$.

Dependencies between covariates, $I$ and $T$ are governed by the assumption of no unmeasured confounders. It claims that for every time $t$, the distribution of $T^0$ must be the same for patients with the same covariate history until $t$, whether they get infected at time $t$ or not. We make the multistate model comply with this assumption by modelling the influence of changes in covariates after infection on the discharge hazard with respect to $T^0$, such that on average their influence is comparable whether the patient is infected or not. We first address the case $c_{INFAV} = 1$, where the infection has no impact on AV. Then, we consider $c_{INFAV} > 1$ and further introduce AV as time-dependent confounder following section 5.1.

Modelling the influence of the infection on discharge for $c_{INFAV} = 1$

In this section, we regard the case $c_{INFAV} = 1$, i.e. the infection only influences the discharge hazards and AV does not act as time-dependent confounder. Here, the infection does not influence the transition hazard from 3 to 4. But with respect to the time scale of $T^0$, this transition happens earlier than in a world without infection.
Therefore, we define a possibly delayed influence of AV on the discharge hazard such that the effect of AV switched on after infection does not start until a random time which depends on the probability of switching on AV and on the acceleration factor $\exp(\gamma)$. The delayed influence of AV compensates for the fact that, in comparison to a world without infection, the infected patient is on average ventilated more often and earlier with respect to $T^0$, the counterfactual time to discharge without infection. The idea is that, if the transition hazard from 3 to 4 were equal to the transformed transition hazard from 1 to 2, namely $\alpha_{12}(t_{\gamma,\iota})\exp(\gamma)$ with $\iota$ the time of infection and $t_{\gamma,\iota}$ the back-transformed time from (5.3), the impact of AV on $T^0$ would not differ between infected and uninfected patients. Thus, we rewrite the transition hazard from 3 to 4 by adding zero as

\[ \alpha_{34}(t) = \alpha_{12}(t_{\gamma,\iota})\exp(\gamma) + \bigl(\alpha_{12}(t) - \alpha_{12}(t_{\gamma,\iota})\exp(\gamma)\bigr). \]

To illustrate our modelling strategy, we subclassify state 4 by the hidden substates A and B with transition hazards

\[ \alpha_{3A}(t; \iota) = \alpha_{12}(t_{\gamma,\iota})\exp(\gamma) \tag{5.4} \]
\[ \alpha_{3B}(t; \iota) = \alpha_{12}(t) - \alpha_{12}(t_{\gamma,\iota})\exp(\gamma) \tag{5.5} \]

where A and B can be interpreted as follows:

A: AV not due to INF, immediate impact of AV
B: AV not due to INF, no impact of AV

The corresponding discharge hazards are chosen such that the discharge hazard from A is influenced by INF and AV, whereas the discharge hazard from B is only affected by INF:

\[ \alpha_{A5}(t; \iota) = \alpha_{15}(t_{\gamma,\iota})\exp(\gamma)\, c_{AVDIS}, \qquad \alpha_{B5}(t; \iota) = \alpha_{15}(t_{\gamma,\iota})\exp(\gamma) \]

with $c_{AVDIS}$ already used in (5.2). Note that $\alpha_{A5}(t; \iota) = \alpha_{25}(t_{\gamma,\iota})\exp(\gamma)$. The patient in substate A can only switch to discharge, whereas from B he can switch either to A or to 5, as shown in figure 5.2. Note that although AV = 1 for a patient in B, the impact of AV on the discharge hazard does not start until he has moved to A. Thus, after transition from 3 to 4 at time $s$, the impact of AV starts immediately with probability $\alpha_{12}(s_{\gamma,\iota})\exp(\gamma)/\alpha_{12}(s)$. With probability $(\alpha_{12}(s) - \alpha_{12}(s_{\gamma,\iota})\exp(\gamma))/\alpha_{12}(s)$, it only starts at a random later time. In order that in total it starts on average according to the hazard $\alpha_{12}(t_{\gamma,\iota})\exp(\gamma)$, we define the transition hazard from B to A by

\[ \alpha_{BA}(t; \iota) = \alpha_{12}(t_{\gamma,\iota})\exp(\gamma). \]

To characterise the transition from 4 to 5, we determine the survival function as a mixture of transition probabilities with respect to the possible pathways along the hidden substates to 5. Like hazards, the survival function can be used to define the distribution of a survival time.

Figure 5.2.: State 4 subclassified by hidden substates and possible transitions between them. Transition to state 5 (discharge) possible from every hidden substate. $A'$ and $B'$ only in case $c_{INFAV} > 1$.

We obtain for $t \geq s$, where $3 \to_s 4$ denotes the event that the patient switched from 3 to 4 at time $s$:

\begin{align*}
S_{45}(t \mid I = \iota, 3 \to_s 4)
&= P(3 \to_s A \mid 3 \to_s 4)\, P(\text{in } A \text{ until } t \mid I = \iota, 3 \to_s A) \\
&\quad + P(3 \to_s B \mid 3 \to_s 4) \bigl[ P(\text{in } B \text{ until } t \mid I = \iota, 3 \to_s B) + P(B \to A, \text{in } A \text{ until } t \mid I = \iota, 3 \to_s B) \bigr] \\
&= \frac{\alpha_{12}(s_{\gamma,\iota})\exp(\gamma)}{\alpha_{12}(s)} \exp\Bigl(-\int_s^t \alpha_{A5}(u; \iota)\,du\Bigr) \\
&\quad + \frac{\alpha_{12}(s) - \alpha_{12}(s_{\gamma,\iota})\exp(\gamma)}{\alpha_{12}(s)} \Bigl( \exp\Bigl(-\int_s^t \bigl(\alpha_{BA}(u; \iota) + \alpha_{B5}(u; \iota)\bigr)\,du\Bigr) \\
&\qquad + \int_s^t \exp\Bigl(-\int_s^v \bigl(\alpha_{BA}(u; \iota) + \alpha_{B5}(u; \iota)\bigr)\,du\Bigr)\, \alpha_{BA}(v; \iota)\, \exp\Bigl(-\int_v^t \alpha_{A5}(u; \iota)\,du\Bigr) dv \Bigr)
\end{align*}

If the patient switched from 2 to 4, the problem of AV having a different impact on $T$ and $T^0$ does not arise and we define that he can only switch to A. Thus,

\[ S_{45}(t \mid 2 \to_s 4) = \exp\Bigl(-\int_s^t \alpha_{A5}(u; \iota)\,du\Bigr). \]
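The split into an immediate and a delayed impact of AV at the transition from 3 to 4 can be sketched in a few lines. The function below is a hypothetical helper, not part of the thesis; it takes an arbitrary hazard function for $\alpha_{12}$ and returns the probability of entering substate A directly. The two example hazards (a linearly increasing one and a decreasing one) are assumptions chosen only to show when the split is well defined.

```python
import math

def prob_immediate_impact(alpha12, s, iota, gamma):
    """P(3 ->_s A | 3 ->_s 4) for c_INFAV = 1, i.e.
    alpha12(s_back) * exp(gamma) / alpha12(s), where s_back is the
    back-transformed time (5.3) of the switching time s >= iota."""
    s_back = iota + (s - iota) * math.exp(gamma)
    p = alpha12(s_back) * math.exp(gamma) / alpha12(s)
    if not 0.0 <= p <= 1.0:
        # alpha_3B in (5.5) would be negative at s
        raise ValueError("hazard alpha12 is not admissible at s")
    return p

gamma = math.log(0.76)  # assumed prolongation effect, exp(gamma) < 1

# Constant hazard: the probability reduces to exp(gamma).
print(prob_immediate_impact(lambda t: 0.04, 10.0, 4.0, gamma))

# Increasing hazard (assumed example): still admissible.
print(prob_immediate_impact(lambda t: 0.01 * t, 10.0, 4.0, gamma))

# Steeply decreasing hazard (assumed example): the split breaks down.
try:
    prob_immediate_impact(lambda t: 1.0 / (1.0 + t) ** 2, 10.0, 0.0, gamma)
except ValueError as err:
    print("not admissible:", err)
```

The remaining probability mass, one minus the returned value, belongs to substate B, from which the impact of AV starts at a random later time.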

We see that the transition from state 4 to 5 does not depend on the pathway along the hidden substates but only on the observable information whether the patient switched to state 4 from state 2 or from state 3. As the transition hazards must be positive, this modelling strategy restricts the possible transition hazards $\alpha_{12}(t)$ to those where for all $t$

\[ \alpha_{12}(t) > \alpha_{12}(t_{\gamma,\iota})\exp(\gamma). \]

This includes for example monotone increasing and constant $\alpha_{12}(t)$.

Figure 5.3.: Partially delayed impact of AV after infection shown with respect to $T$ and $T^0$ for comparison to impact of AV without infection

Figure 5.3 illustrates the impact of AV. Case 1) shows an uninfected patient who is ventilated from time $s$ and discharged at $T^0$. Case 2) shows the same patient, had he been infected at time $I$. He is also ventilated from time $s$, but the influence of AV starts immediately only with probability $\alpha_{12}(s_{\gamma,\iota})\exp(\gamma)/\alpha_{12}(s)$ and is delayed otherwise. To illustrate the delayed influence, we show quantiles of the starting points. Furthermore, we back-transformed the starting points to illustrate the influence of AV with respect to $T^0$.

Increased risk of AV due to infection: $c_{INFAV} > 1$

Now, we address the second challenge and introduce AV as time-dependent confounder, i.e. regard $c_{INFAV} > 1$, following the ideas of section 5.1. Here, the infection affects the risk of AV, which means that there is a proportion of patients in state 4 who get AV due to the infection. Furthermore, they remain at risk of getting AV for other reasons. We account for this by subclassifying state 4 further into hidden substates and obtain, with the substates A and B from above:

A: AV not due to INF, immediate impact of AV
B: AV not due to INF, no impact of AV
$B'$: AV due to INF
$A'$: AV due to INF and due to other reasons

The division into hidden substates makes it possible to define different discharge hazards depending on the reason for AV. In order that the causal parameter $\exp(\gamma)$ represents all direct and indirect effects of the infection, AV due to INF does not have an impact on the discharge hazard. We denote the additional substates by $A'$ and $B'$ to point out by similarity that being in A or $A'$ means that AV has an impact on discharge, whereas in B or $B'$ the discharge hazard is not affected by AV. We set

\[ \alpha_{B'5}(t; \iota) = \alpha_{15}(t_{\gamma,\iota})\exp(\gamma), \qquad \alpha_{A'5}(t; \iota) = \alpha_{15}(t_{\gamma,\iota})\exp(\gamma)\, c_{AVDIS}. \]

The substates are reached as follows, depending on whether the patient switched from state 2 or 3 to 4. The patient who switches from 2 to 4 reaches state A, as AV was switched on before infection. If the patient switches from state 3 to 4, he reaches either A, B or $B'$, where we define $\alpha_{3A}(t; \iota)$ and $\alpha_{3B}(t; \iota)$ as in (5.4) and (5.5) and set

\[ \alpha_{3B'}(t) = \alpha_{12}(t)\,(c_{INFAV} - 1) \tag{5.6} \]

as a consequence of (5.1), since $\alpha_{3A}(t; \iota) + \alpha_{3B}(t; \iota) = \alpha_{12}(t)$. State $A'$ cannot be reached directly from 2 or 3, as otherwise two events would happen at the same instant of time. Possible transitions between the hidden substates are shown in figure 5.2. The respective transition hazards are linked such that the risk of AV due to INF and due to other reasons remains comparable between already ventilated and unventilated patients. Together with (5.4) and (5.6) we obtain

\[ \alpha_{B'A'}(t; \iota) = \alpha_{3A}(t; \iota) = \alpha_{12}(t_{\gamma,\iota})\exp(\gamma), \qquad \alpha_{AA'}(t) = \alpha_{3B'}(t) = \alpha_{12}(t)\,(c_{INFAV} - 1). \]

Now, we are able to characterise the transition from state 4 to 5. Again, it does not depend on the pathway along the hidden substates but only on the observable information whether the patient switched from state 2 to 4 or from state 3. Patients who change from state 2 to 4 at $s$ reach state A. They cannot pass B or $B'$ and can at most switch to $A'$ later. As the discharge hazards from A and $A'$ are identical and there are no other possible pathways, we obtain

\[ S_{45}(t \mid 2 \to_s 4) = \exp\Bigl(-\int_s^t \alpha_{A5}(u; \iota)\,du\Bigr) \]

as in the case above with $c_{INFAV} = 1$.

If a patient switches from 3 to 4, we obtain for $t \geq s$:

\begin{align*}
S_{45}(t \mid I = \iota, 3 \to_s 4)
&= P(3 \to_s A \mid 3 \to_s 4) \bigl[ P(\text{in } A \text{ until } t \mid I = \iota, 3 \to_s A) + P(A \to A', \text{in } A' \text{ until } t \mid I = \iota, 3 \to_s A) \bigr] \\
&\quad + P(3 \to_s B \mid 3 \to_s 4) \bigl[ P(\text{in } B \text{ until } t \mid I = \iota, 3 \to_s B) + P(B \to A, \text{in } A \text{ until } t \mid I = \iota, 3 \to_s B) \\
&\qquad\qquad + P(B \to A, A \to A', \text{in } A' \text{ until } t \mid I = \iota, 3 \to_s B) \bigr] \\
&\quad + P(3 \to_s B' \mid 3 \to_s 4) \bigl[ P(\text{in } B' \text{ until } t \mid I = \iota, 3 \to_s B') + P(B' \to A', \text{in } A' \text{ until } t \mid I = \iota, 3 \to_s B') \bigr]
\end{align*}

As the hidden substates B and $B'$ only differ in interpretation but not mathematically, this reduces to

\begin{align*}
S_{45}(t \mid I = \iota, 3 \to_s 4)
&= P(3 \to_s A \mid 3 \to_s 4) \bigl[ P(\text{in } A \text{ until } t \mid I = \iota, 3 \to_s A) + P(A \to A', \text{in } A' \text{ until } t \mid I = \iota, 3 \to_s A) \bigr] \\
&\quad + \bigl( P(3 \to_s B \mid 3 \to_s 4) + P(3 \to_s B' \mid 3 \to_s 4) \bigr) \bigl[ P(\text{in } B' \text{ until } t \mid I = \iota, 3 \to_s B') \\
&\qquad\qquad + P(B' \to A', \text{in } A' \text{ until } t \mid I = \iota, 3 \to_s B') \bigr] \\
&= \frac{\alpha_{12}(s_{\gamma,\iota})\exp(\gamma)}{\alpha_{12}(s)\, c_{INFAV}} \exp\Bigl(-\int_s^t \alpha_{A5}\,du\Bigr) \\
&\quad + \frac{\alpha_{12}(s)\, c_{INFAV} - \alpha_{12}(s_{\gamma,\iota})\exp(\gamma)}{\alpha_{12}(s)\, c_{INFAV}} \Bigl[ \exp\Bigl(-\int_s^t \bigl(\alpha_{12}(u_{\gamma,\iota})\exp(\gamma) + \alpha_{35}(u)\bigr)\,du\Bigr) \\
&\qquad + \int_s^t \exp\Bigl(-\int_s^v \bigl(\alpha_{12}(u_{\gamma,\iota})\exp(\gamma) + \alpha_{35}(u)\bigr)\,du\Bigr)\, \alpha_{12}(v_{\gamma,\iota})\exp(\gamma)\, \exp\Bigl(-\int_v^t \alpha_{A5}\,du\Bigr) dv \Bigr]
\end{align*}

We evaluate $S_{45}(t \mid I = \iota, 3 \to_s 4)$ for constant transition hazards $\alpha_{12}$, $\alpha_{13}$ and $\alpha_{15}$. Then, all transition hazards between observable states are constant except the transition hazard from 4 to 5. We obtain

\begin{align*}
S_{45}(t \mid I = \iota, 3 \to_s 4)
&= \frac{\exp(\gamma)}{c_{INFAV}} \exp\bigl(-(t - s)\,\alpha_{A5}\bigr) \\
&\quad + \frac{c_{INFAV} - \exp(\gamma)}{c_{INFAV}} \Bigl[ \exp\bigl(-(t - s)(\alpha_{12}\exp(\gamma) + \alpha_{B5})\bigr) \\
&\qquad + \exp\bigl(s\,(\alpha_{12}\exp(\gamma) + \alpha_{B5})\bigr)\, \alpha_{12}\exp(\gamma)\, \exp(-t\,\alpha_{A5})\, \frac{1}{\alpha_{12}\exp(\gamma) + \alpha_{B5} - \alpha_{A5}} \\
&\qquad\quad \times \Bigl( \exp\bigl(-s\,(\alpha_{12}\exp(\gamma) + \alpha_{B5} - \alpha_{A5})\bigr) - \exp\bigl(-t\,(\alpha_{12}\exp(\gamma) + \alpha_{B5} - \alpha_{A5})\bigr) \Bigr) \Bigr]
\end{align*}

Note that we might have introduced a further substate C to model AV due to INF and due to other reasons with delayed impact of AV, with possible transitions $B \to C$, $B' \to C$ and $C \to A'$. But as this would not affect the transition probability from state 4 to state 5, and as transition hazards between substates do not need to be linked for interpretational reasons (e.g. $\alpha_{B'A'}(t; \iota)$ does not need to be equal to $\alpha_{12}(t)$), we did not expand our model by a further substate.

5.3. Simulation study

Now, we use our simple joint model as simulation model using constant transition hazards $\alpha_{12}$, $\alpha_{13}$ and $\alpha_{15}$. With the generated data, we compare the length of stay of infected and uninfected patients and explore the behaviour of a typically used Cox model in comparison to the SNFTM.
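As a sanity check on the closed form of $S_{45}$ for constant hazards given at the end of section 5.2, the following sketch compares it with a direct numerical evaluation of the mixture over the hidden substates. The function names and the midpoint quadrature are illustrative choices, not taken from the thesis; the default-free parameter values in the usage lines are the ones reported for the simulation study and are assumed here only as an example.

```python
import math

def s45_closed_form(t, s, a12, a15, gamma, c_infav, c_avdis):
    """Closed form of S_45(t | I = iota, 3 ->_s 4) for constant hazards."""
    eg = math.exp(gamma)
    a_A5 = a15 * eg * c_avdis      # discharge hazard with impact of AV
    a_B5 = a15 * eg                # discharge hazard without impact of AV
    kappa = a12 * eg + a_B5        # total hazard of leaving B (or B')
    d = kappa - a_A5
    p_A = eg / c_infav             # probability of immediate impact
    stay_A = math.exp(-(t - s) * a_A5)
    stay_B = math.exp(-(t - s) * kappa)
    via_A = (math.exp(s * kappa) * a12 * eg * math.exp(-t * a_A5)
             * (math.exp(-s * d) - math.exp(-t * d)) / d)
    return p_A * stay_A + (1.0 - p_A) * (stay_B + via_A)

def s45_numeric(t, s, a12, a15, gamma, c_infav, c_avdis, n=2000):
    """Same mixture, with the integral over the switching time v from
    the delayed substate evaluated by the midpoint rule."""
    eg = math.exp(gamma)
    a_A5 = a15 * eg * c_avdis
    kappa = a12 * eg + a15 * eg
    p_A = eg / c_infav
    h = (t - s) / n
    integral = 0.0
    for k in range(n):
        v = s + (k + 0.5) * h
        integral += (math.exp(-(v - s) * kappa) * a12 * eg
                     * math.exp(-(t - v) * a_A5)) * h
    return (p_A * math.exp(-(t - s) * a_A5)
            + (1.0 - p_A) * (math.exp(-(t - s) * kappa) + integral))

params = dict(s=5.0, a12=0.04, a15=0.12, gamma=math.log(0.76),
              c_infav=2.5, c_avdis=0.34)
for t in (5.0, 10.0, 30.0):
    print(t, s45_closed_form(t, **params), s45_numeric(t, **params))
```

Both versions start at one for $t = s$ and agree up to quadrature error, which supports the reading of the closed form as a two-component mixture.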

Data generation

We choose a sample size of 1000 patients and perform 100 simulation runs per setting. According to the multistate model proposed above, we simulate individual patient data which describe the observable states and transition times until the absorbing state 5 (discharge) is reached. We assume that every patient is uninfected and not ventilated on admission and choose state 1 as the overall initial state. Then, we successively sample the next transition time. For this purpose, we consider all possible transitions as competing risks and use the simulation algorithm presented by Beyersmann et al. [49]. More precisely, we generate a random time from an exponential distribution whose rate is the sum of the rates of all possible transitions. In case of time-dependent rates, we use the sum of the cumulative hazards. To determine the next state, we throw a die where each possible state has a probability equal to its transition rate at the generated time divided by the sum of all transition rates at this time.

Parameter values

The chosen parameter values are motivated by our ICU data analysed in chapter 4. The estimate for the acceleration parameter $\exp(\gamma)$ resulted in [...]; the simulation uses $\exp(\gamma) = 0.76$ (see table 5.1). The transition hazards are estimated as constants without adjustment for further confounders, which results in $\alpha_{12} = 0.04$, $\alpha_{13} = 0.008$, $\alpha_{15} = 0.12$ and $c_{AVINF} = \alpha_{24}/\alpha_{13} =$ [...]. Furthermore, we obtain $c_{AVDIS} = 0.34$ by fitting a Cox model for discharge with AV as status variable, adjusted for further covariates and infection status. We use different values for $c_{INFAV}$ (1, 2.5, 5) which correspond to different extents of time-dependent confounding.

Illustration of characteristics of infected population

Infected patients tend to stay longer on the ICU, no matter whether the infection has an impact on the time to discharge.
This comes from the fact that a patient who stays longer is at risk of infection for longer, and thus the probability that he is infected during his overall stay is higher. This can be seen as a selection of patients with longer stay induced by the infection. We illustrate this by comparing the distribution of $T^0$ in the infected population and in the uninfected population. Recall that $T^0$ represents the length of stay had the patient not been infected, i.e. a quantity which does not vary according to the influence of infection. Figure 5.4 shows the respective Kaplan-Meier curves for a sample of [...] patients with the parameters given in the section on parameter values above and $c_{INFAV} = 2.5$.

Figure 5.4.: Simulation results: Kaplan-Meier curves for $T^0$ of the subpopulations of uninfected and infected patients (axes: time $t$, $\hat{S}(t)$); dashed line for infected patients

Estimation of effects by Cox model

We use the simulated data to explore the behaviour of the Cox model for the discharge hazard without interaction term,

\[ \lambda_{DIS}(t \mid INF(t), AV(t)) = \lambda_0(t) \exp\bigl(\beta_1 INF(t) + \beta_2 AV(t)\bigr), \]

and with interaction term:

\[ \lambda_{DIS}(t \mid INF(t), AV(t)) = \lambda_0(t) \exp\bigl(\beta_1 INF(t) + \beta_2 AV(t) + \beta_3 INF(t)\, AV(t)\bigr) \]

where

\[ \lambda_{DIS}(t \mid \mathcal{F}_t)\,dt = P(T \in [t; t + dt) \mid T \geq t, \mathcal{F}_t). \]

Here, $\exp(\beta_1)$ is taken as estimate for $\exp(\gamma)$. As the transition rates were chosen to be constant, the acceleration of the time to discharge due to the infection reduces to a multiplication of the discharge hazard by $\exp(\gamma)$. Thus, the proportional hazards assumption of the Cox model concerning the impact of the infection is satisfied for unventilated patients. However, the Cox model is not appropriate for inference on the causal parameter $\exp(\gamma)$ for two reasons. First, it models the direct effects of infection and AV on discharge and second, like other conventional statistical models, the Cox model cannot cope with time-dependent confounders.

In terms of our multistate model, this means the following. For uninfected patients, the Cox model correctly models the influence of AV and we would put $\exp(\beta_2) = c_{AVDIS}$. Also, for ventilated patients who were infected only afterwards, the influence of AV is undelayed and correctly modelled. Then, $\exp(\beta_2)$ should equal $c_{AVDIS}$ and, moreover, $\exp(\beta_1)$ should coincide with $\exp(\gamma)$. But if AV is first used after infection, the Cox model does not account for the delayed influence of AV. In addition, if $c_{INFAV} > 1$ and AV thus acts as time-dependent confounder, it does not model the impact of AV as a mixture of effects depending on whether AV is due to INF or not. Consequently, the Cox model is misspecified for infected patients where AV first started after infection. This leads to estimates of $\exp(\beta_1)$ and $\exp(\beta_2)$ which deviate from the causal parameter $\exp(\gamma)$ and from $c_{AVDIS}$, the impact of AV, respectively.

Estimation of effects by SNFTM

We further address the SNFTM estimates for $\exp(\gamma)$, which are consistent; $\hat{\gamma}$ is asymptotically normal, as is known from theory [21]. The SNFTM copes with the time-dependent confounder AV, which can be explained as follows in terms of our multistate model.
The estimating procedure uses $T^0$ and the covariate history until the time of infection. Thus, it does not include data after the first transition to either state 3 or 4. Therefore, the results are not affected by the modelling of the discharge probability from state 4, as long as it is such that the assumption of no unmeasured confounders is satisfied.
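The data-generating mechanism of this simulation study (competing exponential waiting times, followed by a draw of the next state with probabilities proportional to the transition rates) can be sketched as follows for constant hazards. This is a minimal illustration, not the code used in the thesis: the function name is hypothetical, the hidden substates are collapsed into "impact of AV" (A or A') and "no impact yet" (B or B') as in the reduced formula for $S_{45}$, and the default `c_avinf=1.5` is a placeholder because the value used in the thesis is not available in this text.

```python
import random

def simulate_patient(rng, a12=0.04, a13=0.008, a15=0.12, exp_gamma=0.76,
                     c_infav=2.5, c_avinf=1.5, c_avdis=0.34):
    """Simulate one patient path with constant hazards.
    Returns (time of discharge, time of infection or None,
    time of first AV or None). c_avinf=1.5 is an assumed placeholder."""
    t, state, substate = 0.0, 1, None
    t_inf = t_av = None
    while True:
        if state == 1:
            rates = {2: a12, 3: a13, 5: a15}
        elif state == 2:
            rates = {4: a13 * c_avinf, 5: a15 * c_avdis}
        elif state == 3:
            rates = {4: a12 * c_infav, 5: a15 * exp_gamma}
        elif substate == "A":      # state 4, AV already affects discharge
            rates = {5: a15 * exp_gamma * c_avdis}
        else:                      # state 4, impact of AV still delayed
            rates = {"A": a12 * exp_gamma, 5: a15 * exp_gamma}
        # waiting time: exponential with the sum of all competing rates
        t += rng.expovariate(sum(rates.values()))
        # next state: probability proportional to its transition rate
        target = rng.choices(list(rates), weights=list(rates.values()))[0]
        if target == 5:
            return t, t_inf, t_av
        if target == "A":          # hidden switch, observable state stays 4
            substate = "A"
            continue
        if target == 4 and state == 2:
            t_inf, substate = t, "A"       # AV preceded the infection
        elif target == 4:
            t_av = t                       # AV first used after infection
            substate = "A" if rng.random() < exp_gamma / c_infav else "B"
        elif target == 2:
            t_av = t
        else:                              # target == 3
            t_inf = t
        state = target

rng = random.Random(2011)
sample = [simulate_patient(rng) for _ in range(1000)]
infected = [T for T, t_inf, _ in sample if t_inf is not None]
print(len(infected), sum(T for T, _, _ in sample) / len(sample))
```

Only the observable quantities are returned; the hidden substate merely determines which discharge hazard is active while the patient is in state 4.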

Results

The results of our simulation study are summarised in table 5.1. The Cox model without interaction term only performs well if $c_{INFAV} = 1$, i.e. if AV is never due to INF. Then, the estimates for $\exp(\gamma)$ are only slightly influenced by the delayed impact of AV. If $c_{INFAV} > 1$, the discharge hazard for patients in state 4, where AV is due to INF, is in truth only affected by $\exp(\gamma)$, but the Cox model ascribes the effect $\exp(\beta_1 + \beta_2)$. As $c_{AVDIS}$ is smaller than 1, the Cox model on average estimates an effect $\exp(\beta_1)$ which is closer to 1 than the true $\exp(\gamma)$. The Cox model with interaction term, however, compensates for the falsely ascribed impact of AV by the additional parameter $\exp(\beta_3)$, which is hence on average estimated to be larger than 1. In our simulation study, the mean of the estimates of $\exp(\beta_3)$ is [...] ([0.686; 1.579]), [...] ([0.659; 1.916]) and [...] ([0.623; 2.073]) for $c_{INFAV}$ equal to 1, 2.5 and 5, respectively. Including the interaction term is helpful, as the simple joint model and the Cox model have the same structure for a sufficiently long period. But the false modelling assumptions lead to a wide confidence interval.

Model                            c_INFAV = 1             c_INFAV = 2.5           c_INFAV = 5
Cox model without interaction    0.754, [0.609; 0.988]   0.795, [0.595; 0.975]   0.884, [0.719; 1.099]
Cox model with interaction       0.733, [0.531; 1.056]   0.757, [0.499; 1.096]   0.728, [0.521; 1.228]
SNFTM                            0.753, [0.564; 1.105]   0.740, [0.587; 0.966]   0.745, [0.602; 0.995]

Table 5.1.: Simulation study with $\exp(\gamma) = 0.76$ for different $c_{INFAV}$: mean of estimates for $\exp(\gamma)$ and 95% confidence interval

The main problem with using a standard model which only associatively adjusts for time-dependent confounders is that it shows an effect even if the infection has no impact on the length of hospital stay, i.e. if $\exp(\gamma) = 1$. Therefore, we additionally performed a simulation study with the above setting, $c_{INFAV} = 5$, but $\exp(\gamma) = 1$ and 400 runs.
In the Cox analysis without interaction term, [...] % of the estimates for $\exp(\beta_1)$ differed significantly ($p < 0.05$) from 1, whereas the SNFTM only led to 6.5% significant results.
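To put the 6.5% of significant SNFTM results into perspective, one can ask how much an observed rejection rate may deviate from the nominal 5% level by chance alone when only 400 simulation runs are available. The sketch below uses a simple normal approximation to the binomial distribution; this check is an added illustration and not part of the original analysis, and the function name is hypothetical.

```python
import math

def rejection_rate_band(nominal, runs, z=1.96):
    """Approximate 95% band for the observed rejection rate of a test
    that holds its nominal level, based on `runs` independent runs."""
    se = math.sqrt(nominal * (1.0 - nominal) / runs)
    return nominal - z * se, nominal + z * se

runs = 400
observed = 0.065                       # SNFTM rejection rate reported above
n_significant = round(observed * runs) # number of significant runs

low, high = rejection_rate_band(0.05, runs)
print(n_significant, round(low, 4), round(high, 4), low < observed < high)
```

Under these assumptions, the reported 6.5% lies within the Monte Carlo band around 5%, so it is compatible with a test that holds its level.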

5.4. Comparison to simulation proposed by Robins

We now compare the simulation algorithm given by our multistate model to the one proposed by Robins, which was outlined in an earlier section. Here, the counterfactual time $T^0$ is modelled first. $T^0$ is then influenced by INF and transformed to $T$, the survival time. The survival time becomes manifest already at the time the infection occurs. The interaction between AV and INF is modelled separately by a time-dependent process which is stopped according to $T$. The impact of AV on $T$ is achieved indirectly by sampling AV dependent on $T^0$. In contrast, in our multistate model, the changes in covariates and exposure and the survival process are modelled simultaneously. The data arise as in a prospective trial, which means that the covariate history develops progressively and the survival time only becomes manifest when the endpoint is reached.

5.5. Extension to model current AV status

We illustrate a possible extension of the multistate model to include AV as current ventilation status such that the SNFTM still applies. Now, AV is 0 for unventilated patients, jumps to 1 at the time of switching on AV and returns to 0 if AV is switched off again. This means that the patient can return to the unventilated states, from 2 to 1 and from 4 to 3, respectively. If the patient switches from 4 to 3, the change in covariates happens after infection. We proceed analogously to the definition of the transition from 3 to 4 and model a partially delayed influence of switching off AV such that on average its influence is comparable whether the patient is infected or not. Therefore, we subclassify state 3 into two hidden substates C and D which can be interpreted as follows:

C: INF, no impact of AV
D: INF, still impact of AV

This means that both discharge hazards are influenced by INF and the discharge hazard from D is still affected by AV:

\[ \alpha_{C5}(t; \iota) = \alpha_{15}(t_{\gamma,\iota})\exp(\gamma), \qquad \alpha_{D5}(t; \iota) = \alpha_{15}(t_{\gamma,\iota})\exp(\gamma)\, c_{AVDIS} \]

Moving from 1 to 3, the patient can only reach state C.

Figure 5.5.: Possible transitions between hidden substates subclassifying state 3 and state 4

The possible transitions between the hidden substates are shown in figure 5.5. When interpreting the transitions, we must distinguish between switching off AV due to INF and switching off AV due to other reasons. Moving from $A'$ to A and from $B'$ to C corresponds to switching off AV due to INF. We first concentrate on the case without time-dependent confounding where AV is never due to INF. Secondly, we expand the model to the hidden substates $A'$ and $B'$. In the following, we only characterise the multistate model by transition hazards from both observable states and hidden substates. By determining the survival function analogously


More information

Robust estimates of state occupancy and transition probabilities for Non-Markov multi-state models

Robust estimates of state occupancy and transition probabilities for Non-Markov multi-state models Robust estimates of state occupancy and transition probabilities for Non-Markov multi-state models 26 March 2014 Overview Continuously observed data Three-state illness-death General robust estimator Interval

More information

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

More information

Assess Assumptions and Sensitivity Analysis. Fan Li March 26, 2014

Assess Assumptions and Sensitivity Analysis. Fan Li March 26, 2014 Assess Assumptions and Sensitivity Analysis Fan Li March 26, 2014 Two Key Assumptions 1. Overlap: 0

More information

Multi-state Models: An Overview

Multi-state Models: An Overview Multi-state Models: An Overview Andrew Titman Lancaster University 14 April 2016 Overview Introduction to multi-state modelling Examples of applications Continuously observed processes Intermittently observed

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University Joint

More information

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Overview In observational and experimental studies, the goal may be to estimate the effect

More information

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH The First Step: SAMPLE SIZE DETERMINATION THE ULTIMATE GOAL The most important, ultimate step of any of clinical research is to do draw inferences;

More information

Estimating direct effects in cohort and case-control studies

Estimating direct effects in cohort and case-control studies Estimating direct effects in cohort and case-control studies, Ghent University Direct effects Introduction Motivation The problem of standard approaches Controlled direct effect models In many research

More information

Causal Effect Models for Realistic Individualized Treatment and Intention to Treat Rules

Causal Effect Models for Realistic Individualized Treatment and Intention to Treat Rules University of California, Berkeley From the SelectedWorks of Maya Petersen March, 2007 Causal Effect Models for Realistic Individualized Treatment and Intention to Treat Rules Mark J van der Laan, University

More information

Instrumental variables estimation in the Cox Proportional Hazard regression model

Instrumental variables estimation in the Cox Proportional Hazard regression model Instrumental variables estimation in the Cox Proportional Hazard regression model James O Malley, Ph.D. Department of Biomedical Data Science The Dartmouth Institute for Health Policy and Clinical Practice

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

Gov 2002: 4. Observational Studies and Confounding

Gov 2002: 4. Observational Studies and Confounding Gov 2002: 4. Observational Studies and Confounding Matthew Blackwell September 10, 2015 Where are we? Where are we going? Last two weeks: randomized experiments. From here on: observational studies. What

More information

Bounding the Probability of Causation in Mediation Analysis

Bounding the Probability of Causation in Mediation Analysis arxiv:1411.2636v1 [math.st] 10 Nov 2014 Bounding the Probability of Causation in Mediation Analysis A. P. Dawid R. Murtas M. Musio February 16, 2018 Abstract Given empirical evidence for the dependence

More information

Extending causal inferences from a randomized trial to a target population

Extending causal inferences from a randomized trial to a target population Extending causal inferences from a randomized trial to a target population Issa Dahabreh Center for Evidence Synthesis in Health, Brown University issa dahabreh@brown.edu January 16, 2019 Issa Dahabreh

More information

Estimating the Marginal Odds Ratio in Observational Studies

Estimating the Marginal Odds Ratio in Observational Studies Estimating the Marginal Odds Ratio in Observational Studies Travis Loux Christiana Drake Department of Statistics University of California, Davis June 20, 2011 Outline The Counterfactual Model Odds Ratios

More information

Do patients die from or with infection?

Do patients die from or with infection? FACULTY OF SCIENCES Do patients die from or with infection? Finding the answer through causal analysis of longitudinal intensive care unit data. Maarten Bekaert Hieronder volgt een overzicht van de informatie,

More information

6.3 How the Associational Criterion Fails

6.3 How the Associational Criterion Fails 6.3. HOW THE ASSOCIATIONAL CRITERION FAILS 271 is randomized. We recall that this probability can be calculated from a causal model M either directly, by simulating the intervention do( = x), or (if P

More information

arxiv: v2 [math.st] 4 Mar 2013

arxiv: v2 [math.st] 4 Mar 2013 Running head:: LONGITUDINAL MEDIATION ANALYSIS 1 arxiv:1205.0241v2 [math.st] 4 Mar 2013 Counterfactual Graphical Models for Longitudinal Mediation Analysis with Unobserved Confounding Ilya Shpitser School

More information

Mediation analyses. Advanced Psychometrics Methods in Cognitive Aging Research Workshop. June 6, 2016

Mediation analyses. Advanced Psychometrics Methods in Cognitive Aging Research Workshop. June 6, 2016 Mediation analyses Advanced Psychometrics Methods in Cognitive Aging Research Workshop June 6, 2016 1 / 40 1 2 3 4 5 2 / 40 Goals for today Motivate mediation analysis Survey rapidly developing field in

More information

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls under the restrictions of the copyright, in particular

More information

Casual Mediation Analysis

Casual Mediation Analysis Casual Mediation Analysis Tyler J. VanderWeele, Ph.D. Upcoming Seminar: April 21-22, 2017, Philadelphia, Pennsylvania OXFORD UNIVERSITY PRESS Explanation in Causal Inference Methods for Mediation and Interaction

More information

Causal inference in epidemiological practice

Causal inference in epidemiological practice Causal inference in epidemiological practice Willem van der Wal Biostatistics, Julius Center UMC Utrecht June 5, 2 Overview Introduction to causal inference Marginal causal effects Estimating marginal

More information

Estimating Causal Effects of Organ Transplantation Treatment Regimes

Estimating Causal Effects of Organ Transplantation Treatment Regimes Estimating Causal Effects of Organ Transplantation Treatment Regimes David M. Vock, Jeffrey A. Verdoliva Boatman Division of Biostatistics University of Minnesota July 31, 2018 1 / 27 Hot off the Press

More information

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping : Decision Theory, Dynamic Programming and Optimal Stopping Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj InSPiRe Conference on Methodology

More information

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3 University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.

More information

Journal of Biostatistics and Epidemiology

Journal of Biostatistics and Epidemiology Journal of Biostatistics and Epidemiology Methodology Marginal versus conditional causal effects Kazem Mohammad 1, Seyed Saeed Hashemi-Nazari 2, Nasrin Mansournia 3, Mohammad Ali Mansournia 1* 1 Department

More information

Statistical Models for Causal Analysis

Statistical Models for Causal Analysis Statistical Models for Causal Analysis Teppei Yamamoto Keio University Introduction to Causal Inference Spring 2016 Three Modes of Statistical Inference 1. Descriptive Inference: summarizing and exploring

More information

Group Sequential Designs: Theory, Computation and Optimisation

Group Sequential Designs: Theory, Computation and Optimisation Group Sequential Designs: Theory, Computation and Optimisation Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj 8th International Conference

More information

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology Group Sequential Tests for Delayed Responses Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Lisa Hampson Department of Mathematics and Statistics,

More information

PSC 504: Dynamic Causal Inference

PSC 504: Dynamic Causal Inference PSC 504: Dynamic Causal Inference Matthew Blackwell 4/8/203 e problem Let s go back to a problem that we faced earlier, which is how to estimate causal effects with treatments that vary over time. We could

More information

Methods for inferring short- and long-term effects of exposures on outcomes, using longitudinal data on both measures

Methods for inferring short- and long-term effects of exposures on outcomes, using longitudinal data on both measures Methods for inferring short- and long-term effects of exposures on outcomes, using longitudinal data on both measures Ruth Keogh, Stijn Vansteelandt, Rhian Daniel Department of Medical Statistics London

More information

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data Malaysian Journal of Mathematical Sciences 11(3): 33 315 (217) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Approximation of Survival Function by Taylor

More information

WORKSHOP ON PRINCIPAL STRATIFICATION STANFORD UNIVERSITY, Luke W. Miratrix (Harvard University) Lindsay C. Page (University of Pittsburgh)

WORKSHOP ON PRINCIPAL STRATIFICATION STANFORD UNIVERSITY, Luke W. Miratrix (Harvard University) Lindsay C. Page (University of Pittsburgh) WORKSHOP ON PRINCIPAL STRATIFICATION STANFORD UNIVERSITY, 2016 Luke W. Miratrix (Harvard University) Lindsay C. Page (University of Pittsburgh) Our team! 2 Avi Feller (Berkeley) Jane Furey (Abt Associates)

More information

Probabilistic Causal Models

Probabilistic Causal Models Probabilistic Causal Models A Short Introduction Robin J. Evans www.stat.washington.edu/ rje42 ACMS Seminar, University of Washington 24th February 2011 1/26 Acknowledgements This work is joint with Thomas

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Dept. of Biostatistics, Duke University Medical Joint work

More information

Mediation and Interaction Analysis

Mediation and Interaction Analysis Mediation and Interaction Analysis Andrea Bellavia abellavi@hsph.harvard.edu May 17, 2017 Andrea Bellavia Mediation and Interaction May 17, 2017 1 / 43 Epidemiology, public health, and clinical research

More information

Flexible mediation analysis in the presence of non-linear relations: beyond the mediation formula.

Flexible mediation analysis in the presence of non-linear relations: beyond the mediation formula. FACULTY OF PSYCHOLOGY AND EDUCATIONAL SCIENCES Flexible mediation analysis in the presence of non-linear relations: beyond the mediation formula. Modern Modeling Methods (M 3 ) Conference Beatrijs Moerkerke

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks

A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks Y. Xu, D. Scharfstein, P. Mueller, M. Daniels Johns Hopkins, Johns Hopkins, UT-Austin, UF JSM 2018, Vancouver 1 What are semi-competing

More information

Does Cox analysis of a randomized survival study yield a causal treatment effect?

Does Cox analysis of a randomized survival study yield a causal treatment effect? Published in final edited form as: Lifetime Data Analysis (2015), 21(4): 579 593 DOI: 10.1007/s10985-015-9335-y Does Cox analysis of a randomized survival study yield a causal treatment effect? Odd O.

More information

Propensity scores for repeated treatments: A tutorial for the iptw function in the twang package

Propensity scores for repeated treatments: A tutorial for the iptw function in the twang package Propensity scores for repeated treatments: A tutorial for the iptw function in the twang package Lane Burgette, Beth Ann Griffin and Dan McCaffrey RAND Corporation July, 07 Introduction While standard

More information

3003 Cure. F. P. Treasure

3003 Cure. F. P. Treasure 3003 Cure F. P. reasure November 8, 2000 Peter reasure / November 8, 2000/ Cure / 3003 1 Cure A Simple Cure Model he Concept of Cure A cure model is a survival model where a fraction of the population

More information

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment

More information

Gov 2002: 13. Dynamic Causal Inference

Gov 2002: 13. Dynamic Causal Inference Gov 2002: 13. Dynamic Causal Inference Matthew Blackwell December 19, 2015 1 / 33 1. Time-varying treatments 2. Marginal structural models 2 / 33 1/ Time-varying treatments 3 / 33 Time-varying treatments

More information

Mendelian randomization as an instrumental variable approach to causal inference

Mendelian randomization as an instrumental variable approach to causal inference Statistical Methods in Medical Research 2007; 16: 309 330 Mendelian randomization as an instrumental variable approach to causal inference Vanessa Didelez Departments of Statistical Science, University

More information

Estimating the treatment effect on the treated under time-dependent confounding in an application to the Swiss HIV Cohort Study

Estimating the treatment effect on the treated under time-dependent confounding in an application to the Swiss HIV Cohort Study Appl. Statist. (2018) 67, Part 1, pp. 103 125 Estimating the treatment effect on the treated under time-dependent confounding in an application to the Swiss HIV Cohort Study Jon Michael Gran, Oslo University

More information

Causal Modeling in Environmental Epidemiology. Joel Schwartz Harvard University

Causal Modeling in Environmental Epidemiology. Joel Schwartz Harvard University Causal Modeling in Environmental Epidemiology Joel Schwartz Harvard University When I was Young What do I mean by Causal Modeling? What would have happened if the population had been exposed to a instead

More information

Adaptive Designs: Why, How and When?

Adaptive Designs: Why, How and When? Adaptive Designs: Why, How and When? Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj ISBS Conference Shanghai, July 2008 1 Adaptive designs:

More information

The identification of synergism in the sufficient-component cause framework

The identification of synergism in the sufficient-component cause framework * Title Page Original Article The identification of synergism in the sufficient-component cause framework Tyler J. VanderWeele Department of Health Studies, University of Chicago James M. Robins Departments

More information

Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects

Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects Guanglei Hong University of Chicago, 5736 S. Woodlawn Ave., Chicago, IL 60637 Abstract Decomposing a total causal

More information

An Introduction to Causal Mediation Analysis. Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016

An Introduction to Causal Mediation Analysis. Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016 An Introduction to Causal Mediation Analysis Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016 1 Causality In the applications of statistics, many central questions

More information

1 The problem of survival analysis

1 The problem of survival analysis 1 The problem of survival analysis Survival analysis concerns analyzing the time to the occurrence of an event. For instance, we have a dataset in which the times are 1, 5, 9, 20, and 22. Perhaps those

More information

Econometric Causality

Econometric Causality Econometric (2008) International Statistical Review, 76(1):1-27 James J. Heckman Spencer/INET Conference University of Chicago Econometric The econometric approach to causality develops explicit models

More information

DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS

DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS Donald B. Rubin Harvard University 1 Oxford Street, 7th Floor Cambridge, MA 02138 USA Tel: 617-495-5496; Fax: 617-496-8057 email: rubin@stat.harvard.edu

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2010 Paper 259 Targeted Maximum Likelihood Based Causal Inference Mark J. van der Laan University of

More information

Causal Inference Basics

Causal Inference Basics Causal Inference Basics Sam Lendle October 09, 2013 Observed data, question, counterfactuals Observed data: n i.i.d copies of baseline covariates W, treatment A {0, 1}, and outcome Y. O i = (W i, A i,

More information

Estimation of Optimal Treatment Regimes Via Machine Learning. Marie Davidian

Estimation of Optimal Treatment Regimes Via Machine Learning. Marie Davidian Estimation of Optimal Treatment Regimes Via Machine Learning Marie Davidian Department of Statistics North Carolina State University Triangle Machine Learning Day April 3, 2018 1/28 Optimal DTRs Via ML

More information

CompSci Understanding Data: Theory and Applications

CompSci Understanding Data: Theory and Applications CompSci 590.6 Understanding Data: Theory and Applications Lecture 17 Causality in Statistics Instructor: Sudeepa Roy Email: sudeepa@cs.duke.edu Fall 2015 1 Today s Reading Rubin Journal of the American

More information

Causal Effect Estimation Under Linear and Log- Linear Structural Nested Mean Models in the Presence of Unmeasured Confounding

Causal Effect Estimation Under Linear and Log- Linear Structural Nested Mean Models in the Presence of Unmeasured Confounding University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations Summer 8-13-2010 Causal Effect Estimation Under Linear and Log- Linear Structural Nested Mean Models in the Presence of

More information

Marginal Structural Models and Causal Inference in Epidemiology

Marginal Structural Models and Causal Inference in Epidemiology Marginal Structural Models and Causal Inference in Epidemiology James M. Robins, 1,2 Miguel Ángel Hernán, 1 and Babette Brumback 2 In observational studies with exposures or treatments that vary over time,

More information

CAUSAL INFERENCE IN THE EMPIRICAL SCIENCES. Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea)

CAUSAL INFERENCE IN THE EMPIRICAL SCIENCES. Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea) CAUSAL INFERENCE IN THE EMPIRICAL SCIENCES Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea) OUTLINE Inference: Statistical vs. Causal distinctions and mental barriers Formal semantics

More information

The influence of categorising survival time on parameter estimates in a Cox model

The influence of categorising survival time on parameter estimates in a Cox model The influence of categorising survival time on parameter estimates in a Cox model Anika Buchholz 1,2, Willi Sauerbrei 2, Patrick Royston 3 1 Freiburger Zentrum für Datenanalyse und Modellbildung, Albert-Ludwigs-Universität

More information

Causal Inference with Big Data Sets

Causal Inference with Big Data Sets Causal Inference with Big Data Sets Marcelo Coca Perraillon University of Colorado AMC November 2016 1 / 1 Outlone Outline Big data Causal inference in economics and statistics Regression discontinuity

More information

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 Department of Statistics North Carolina State University Presented by: Butch Tsiatis, Department of Statistics, NCSU

More information

Multistate models in survival and event history analysis

Multistate models in survival and event history analysis Multistate models in survival and event history analysis Dorota M. Dabrowska UCLA November 8, 2011 Research supported by the grant R01 AI067943 from NIAID. The content is solely the responsibility of the

More information

Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming. and Optimal Stopping

Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming. and Optimal Stopping Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming and Optimal Stopping Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj

More information

Guideline on adjustment for baseline covariates in clinical trials

Guideline on adjustment for baseline covariates in clinical trials 26 February 2015 EMA/CHMP/295050/2013 Committee for Medicinal Products for Human Use (CHMP) Guideline on adjustment for baseline covariates in clinical trials Draft Agreed by Biostatistics Working Party

More information

Effect Modification and Interaction

Effect Modification and Interaction By Sander Greenland Keywords: antagonism, causal coaction, effect-measure modification, effect modification, heterogeneity of effect, interaction, synergism Abstract: This article discusses definitions

More information

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Xuelin Huang Department of Biostatistics M. D. Anderson Cancer Center The University of Texas Joint Work with Jing Ning, Sangbum

More information

Analysis of competing risks data and simulation of data following predened subdistribution hazards

Analysis of competing risks data and simulation of data following predened subdistribution hazards Analysis of competing risks data and simulation of data following predened subdistribution hazards Bernhard Haller Institut für Medizinische Statistik und Epidemiologie Technische Universität München 27.05.2013

More information

7 Sensitivity Analysis

7 Sensitivity Analysis 7 Sensitivity Analysis A recurrent theme underlying methodology for analysis in the presence of missing data is the need to make assumptions that cannot be verified based on the observed data. If the assumption

More information

This paper revisits certain issues concerning differences

This paper revisits certain issues concerning differences ORIGINAL ARTICLE On the Distinction Between Interaction and Effect Modification Tyler J. VanderWeele Abstract: This paper contrasts the concepts of interaction and effect modification using a series of

More information

Harvard University. A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome. Eric Tchetgen Tchetgen

Harvard University. A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome. Eric Tchetgen Tchetgen Harvard University Harvard University Biostatistics Working Paper Series Year 2014 Paper 175 A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome Eric Tchetgen Tchetgen

More information

Marginal Structural Cox Model for Survival Data with Treatment-Confounder Feedback

Marginal Structural Cox Model for Survival Data with Treatment-Confounder Feedback University of South Carolina Scholar Commons Theses and Dissertations 2017 Marginal Structural Cox Model for Survival Data with Treatment-Confounder Feedback Yanan Zhang University of South Carolina Follow

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Propensity Score Weighting with Multilevel Data

Propensity Score Weighting with Multilevel Data Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative

More information

The Effects of Interventions

The Effects of Interventions 3 The Effects of Interventions 3.1 Interventions The ultimate aim of many statistical studies is to predict the effects of interventions. When we collect data on factors associated with wildfires in the

More information

Causal Inference. Prediction and causation are very different. Typical questions are:

Causal Inference. Prediction and causation are very different. Typical questions are: Causal Inference Prediction and causation are very different. Typical questions are: Prediction: Predict Y after observing X = x Causation: Predict Y after setting X = x. Causation involves predicting

More information

Unbiased estimation of exposure odds ratios in complete records logistic regression

Unbiased estimation of exposure odds ratios in complete records logistic regression Unbiased estimation of exposure odds ratios in complete records logistic regression Jonathan Bartlett London School of Hygiene and Tropical Medicine www.missingdata.org.uk Centre for Statistical Methodology

More information

Help! Statistics! Mediation Analysis

Help! Statistics! Mediation Analysis Help! Statistics! Lunch time lectures Help! Statistics! Mediation Analysis What? Frequently used statistical methods and questions in a manageable timeframe for all researchers at the UMCG. No knowledge

More information