MINIMAL SUFFICIENT CAUSATION AND DIRECTED ACYCLIC GRAPHS 1. By Tyler J. VanderWeele and James M. Robins. University of Chicago and Harvard University


Summary. Notions of minimal sufficient causation are incorporated within the directed acyclic graph causal framework. Doing so allows for the graphical representation of sufficient causes and minimal sufficient causes on causal directed acyclic graphs whilst maintaining all of the properties of causal directed acyclic graphs. This in turn provides a clear theoretical link between two major conceptualizations of causality: one counterfactual-based and the other based on a more mechanistic understanding of causation. The d-separation criteria can be used to detect conditional independencies within particular strata of the conditioning variable which are not evident without the minimal sufficient causation structures. These minimal sufficient cause representations are further used to derive conditions that imply the existence of monotonic effects and to derive rules governing minimal sufficient causation and the signs of the conditional covariances amongst variables.

1 Abbreviated Title: Minimal Sufficient Causation. Tyler VanderWeele was supported by a predoctoral fellowship from the Howard Hughes Medical Institute.

AMS 2000 subject classifications. Primary 62A01, 62M45; secondary 62G99, 68T30, 68R10, 05C20.

Key words and phrases. Causal inference; conditional independence; directed acyclic graphs; graphical models; interactions; sufficient causation; synergism.

1. Introduction. Two broad conceptualizations of causality can be discerned in the literature, both within philosophy and within statistics and epidemiology. The first conceptualization may be characterized as giving an account of the effects

of certain causes; the approach addresses the question, "Given a particular cause or intervention, what are its effects?" In the contemporary philosophical literature this approach is most closely associated with Lewis' work [18, 19] on counterfactuals. In the contemporary statistics literature, this first approach is closely associated with the work of Rubin [33, 34] on potential outcomes; of Robins [27, 28] on the use of counterfactual variables in the context of time-varying treatment; and of Pearl [23] on the graphical representation of various counterfactual relations on directed acyclic graphs. This counterfactual approach has been used extensively in statistics both in the development of theory and in application.

The second conceptualization of causality may be characterized as giving an account of the causes of particular effects; this approach attempts to address the question, "Given a particular effect, what are the various events which might have been its cause?" In the contemporary philosophical literature this second approach is most notably associated with Mackie's work [20] on insufficient but necessary components of unnecessary but sufficient conditions (INUS conditions) for an effect. In the epidemiologic literature this approach is most closely associated with Rothman's work [32] on sufficient-component causes. This work is more closely related to the various mechanisms for a particular effect than is the counterfactual approach. However, with perhaps only one notable major exception in the statistics literature [1, comments relating Aickin's work to the present work are available from the authors upon request], Rothman's work on sufficient-component causes has not been developed, extended or applied, though the basic framework is routinely taught in introductory epidemiology courses.

In this paper we incorporate notions of minimal sufficient causes, corresponding to Rothman's sufficient-component causes, within the directed acyclic graph causal

framework [23]. Doing so essentially unites the mechanistic and the counterfactual approaches into a single framework. By incorporating minimal sufficient causation into the directed acyclic graph framework it is possible to graphically represent sufficient causes and minimal sufficient causes on a causal directed acyclic graph whilst maintaining all of the properties of a causal directed acyclic graph. Various extensions to the directed acyclic graph causal framework follow concerning conditional independence, monotonic effects and conditional covariance.

Incorporating minimal sufficient causes into the directed acyclic graph causal framework essentially gives rise to the graphical representation of AND and OR nodes on the directed acyclic graph corresponding to what will be defined below as individual sufficient conjunctions and determinative sets of sufficient conjunctions. These AND and OR nodes could potentially be incorporated into more general graphical models such as summary graphs [3], MC-graphs [12], chain graph models [16, 5, 37, 17, 2, 15, 25, 41] and ancestral graph models [26]. However, directed acyclic graphs have proven particularly useful in representing causal relationships since the directed and acyclic nature of these graphs assures that causes precede effects and that a variable cannot be its own cause. In fact, with very few exceptions [15], the use of graphical models in the field of causal inference has been restricted to these directed acyclic graphs, which include graphs allowing for bidirected edges that represent unobserved common causes. The directed acyclic graph framework, as formulated by Pearl [23] in terms of non-parametric structural equations, will therefore be the focus in this paper of the graphical representation of minimal sufficient causation. Note that Pearl's non-parametric structural equation theory is deterministic, rather than stochastic, at the individual level. It follows then that our theory, as a refinement of Pearl's, will also be deterministic at the

individual level.

In what follows we will provide rigorous definitions for the concepts of a sufficient cause and minimal sufficient causes within the directed acyclic graph framework. It will be seen below that corresponding to these mathematical definitions are informal philosophical notions such as those of a causal mechanism and of synergism. It is the philosophical ideas that provide some of the motivation for the development of the mathematical and statistical theory presented in this paper and, as such, these philosophical issues receive some attention in certain examples and also in the discussion of the various results presented. However, the reader uninterested in the philosophy can ignore this material as none of the definitions, propositions, lemmas, theorems or corollaries make reference to these more informal philosophical notions.

The theory developed in this paper is motivated by several other considerations. It is now standard practice to use graphs to represent and characterize conditional independence relationships amongst variables [13]. Various criteria have been developed to identify these conditional independence relations. The incorporation of minimal sufficient cause nodes allows for these criteria to be applied in order to detect certain conditional independencies within particular strata of the conditioning variable which were not evident without the minimal sufficient causation structures. These "asymmetric conditional independencies" have been represented elsewhere using Bayesian multinets [6]. Another motivation for the development of the theory in this paper concerns the notion of interaction. Product terms are frequently included in regression models to assess interactions amongst variables; these statistical interactions, however, even if present, need not imply the existence of an actual mechanism in which two distinct causes both participate. Interactions which do concern the actual mechanisms are sometimes referred to as "synergism" [32], "biologic interactions" [35] or "conjunctive causes" [21] and the development of minimal sufficient cause theory provides a useful framework to characterize mechanistic interactions. Incorporating minimal sufficient cause nodes into the directed acyclic graph framework also allows in certain cases for the determination of the sign of the conditional covariance of various nodes on the graph.

As yet further motivation, we conclude this introduction by describing how the methods we develop in this paper clarified and helped resolve an analytic puzzle faced by psychiatric epidemiologists. Consider the following somewhat simplified version of a study reported in Hudson et al. [10]. Three hundred pairs of obese siblings living in an ethnically homogeneous upper middle class suburb of Boston are recruited and cross classified by the presence or absence of two psychiatric disorders: manic-depressive disorder $P$ and binge eating disorder $B$. The question of scientific interest is whether these two disorders have a common genetic cause, because, if so, studies to search for a gene or genes that cause both disorders would be useful. Consider two analyses. The first analysis estimates the covariance between $P_{2i}$ and $B_{1i}$, while the second analysis estimates the conditional covariance between $P_{2i}$ and $B_{1i}$ among subjects with $P_{1i} = 1$, where $B_{ki}$ is 1 if the $k$th sibling in the $i$th family has disorder $B$ and is zero otherwise, with $P_{ki}$ defined analogously. It was found that both estimates were positive with 95% confidence intervals that excluded zero. Hudson et al.'s substantive prior knowledge is summarized in the directed acyclic graph of Figure 1 in which the $i$ index denoting family is suppressed. In what follows we will make reference to some standard results concerning directed acyclic graphs; these results are reviewed in detail in the following section.

[Figure 1 (image not reproduced): a directed acyclic graph on the nodes $G_B$, $G_P$, $E_1$, $E_2$, $F$, $P_1$, $P_2$, $B_1$ and $B_2$.]

Figure 1. Causal directed acyclic graph under the alternative hypothesis of familial coaggregation.

In Figure 1, $G_B$ and $G_P$ represent the genetic causes of $B$ and $P$ respectively that are not common causes of both $B$ and $P$. The variables $E_1$ and $E_2$ represent the environmental exposures of siblings 1 and 2 respectively that are common causes of both diseases, perhaps such as exposure to a particularly stressful teacher. The variables $G_B$ and $G_P$ are assumed independent as would typically be the case if, as is highly likely, they are not genetically linked. Furthermore, as is common in genetic epidemiology, the environmental exposures $E_1$ and $E_2$ are assumed independent of the genetic factors. The causal arrows from $P_1$ to $B_1$ and $P_2$ to $B_2$ represent the investigators' beliefs that manic-depressive disorder may be a cause of binge eating disorder but not vice-versa. The node $F$ represents the common genetic causes of both $P$ and $B$ as well as any environmental causes of both $P$ and $B$ that are correlated within families. There is no data available for $G_B$, $G_P$, $E_1$, $E_2$ or $F$.

The reason for grouping the common genetic causes with the correlated environmental causes is that based on the available data $\{P_{ki}, B_{ki};\ i = 1, \ldots, 300;\ k = 1, 2\}$, we can only hope to test the null hypothesis that $F$ so defined is absent, which is referred to as the hypothesis of no familial coaggregation. If this null hypothesis is rejected, we cannot determine from the available data whether $F$ is present due to a common genetic cause or a correlated common environmental cause. Thus $E_1$ and $E_2$ are

independent on the graph because, by definition, they represent the environmental common causes of $B$ and $P$ that are independently distributed between siblings.

Now, under the null hypothesis that $F$ is absent, we note that $P_2$ and $B_1$ are still correlated due to the unblocked path $P_2 \leftarrow G_P \rightarrow P_1 \rightarrow B_1$ so we would expect the covariance to be nonzero, as found. Furthermore $P_2$ and $B_1$ are still expected to be correlated given $P_1 = 1$ due to the unblocked path $P_2 \leftarrow G_P \rightarrow P_1 \leftarrow E_1 \rightarrow B_1$ so we would expect the conditional covariance to be nonzero, as found. Thus we cannot test the null hypothesis that $F$ is absent without further substantive assumptions beyond those encoded in the causal directed acyclic graph of Figure 1.

Now Hudson et al. were also willing to assume that for no subset of the population did the genetic causes $G_P$ and $G_B$ of $P$ and $B$ prevent disease. Similarly they assumed there was no subset of the population for whom the environmental causes $E_1$ and $E_2$ of $B$ and $P$ prevented either disease. We will show in Section 5 that under these additional assumptions, the null hypothesis that $F$ is absent implies that the conditional covariance must be less than or equal to zero, provided that there is no interaction, in the sufficient cause sense, between $E$ and $G_P$. Hudson et al. thought it plausible that no sufficient cause interaction between $E$ and $G_P$ existed and thus rejected the null hypothesis that $F$ is absent because the estimate of the conditional covariance was positive with a 95% confidence interval that did not include zero.

Thus the conclusion of Hudson et al. that familial aggregation of diseases $B$ and $P$ was present depended critically on the existence of (i) a formal definition of a sufficient cause interaction, (ii) a substantive understanding of what the assumption of no sufficient cause interaction entailed, and (iii) a sound mathematical theory that related assumptions about the absence of sufficient cause interactions to testable restrictions on the distribution of the observed data. In this paper we provide a

theory that offers (i)-(iii).

The remainder of the paper is organized as follows. The second section reviews the directed acyclic graph causal framework and provides some basic definitions; the third section presents the theory which allows for the graphical representation of minimal sufficient causes within the directed acyclic graph causal framework; the fourth section describes certain equivalences between minimal sufficient causation and the notion of a monotonic effect; the fifth section considers the relation between minimal sufficient causation and the sign of conditional covariances; the sixth section provides some discussion concerning possible extensions to the present work.

2. Basic Definitions and Concepts. In this section we review the directed acyclic graph causal framework and give a number of definitions regarding sufficient conjunctions and related concepts. Following Pearl [23], a causal directed acyclic graph is a set of nodes $(X_1, \ldots, X_n)$ corresponding to variables and directed edges amongst nodes such that the graph has no cycles and such that for each node $X_i$ on the graph the corresponding variable is given by its non-parametric structural equation $X_i = f_i(pa_i, \epsilon_i)$ where $pa_i$ are the parents of $X_i$ on the graph and the $\epsilon_i$ are mutually independent. These non-parametric structural equations can be seen as a generalization of the path analysis and linear structural equation models [23, 24] developed by Wright [42] in the genetics literature and Haavelmo [9] in the econometrics literature. Robins [29, 30] discusses the close relationship between these non-parametric structural equation models and fully randomized causally interpreted structured tree graphs [27, 28]. Spirtes et al. [36] present a causal interpretation of directed acyclic graphs outside the context of non-parametric structural equations and counterfactual variables.

The non-parametric structural equations encode counterfactual relationships amongst the variables represented on the graph.
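As a small illustration of how structural equations of the form $X_i = f_i(pa_i, \epsilon_i)$ encode counterfactuals, the following Python sketch evaluates a hypothetical three-node model (Z -> E, Z -> D, E -> D) for one individual with a fixed error vector. The graph, the functions `f_Z`, `f_E`, `f_D` and their numeric thresholds are assumptions made purely for illustration; they do not come from the paper.

```python
# Hypothetical causal DAG (an assumption for illustration): Z -> E, Z -> D, E -> D.
# Each structural equation maps (parents, own error term) to a binary value.

def f_Z(eps_z):
    return int(eps_z < 0.5)

def f_E(z, eps_e):
    return int(eps_e < (0.8 if z else 0.2))

def f_D(z, e, eps_d):
    return int(eps_d < 0.1 + 0.3 * z + 0.5 * e)

def evaluate(eps, do_e=None):
    """Solve the equations in topological order for one individual.

    eps  : the individual's fixed error terms (eps_z, eps_e, eps_d)
    do_e : if given, the equation for E is replaced by this constant,
           which yields the counterfactual value of D under E = do_e
    """
    eps_z, eps_e, eps_d = eps
    z = f_Z(eps_z)
    e = f_E(z, eps_e) if do_e is None else do_e
    d = f_D(z, e, eps_d)
    return {"Z": z, "E": e, "D": d}

eps = (0.3, 0.9, 0.5)              # one individual's error terms
factual = evaluate(eps)            # observed values of Z, E and D
d_if_e1 = evaluate(eps, do_e=1)["D"]   # counterfactual D under E = 1
d_if_e0 = evaluate(eps, do_e=0)["D"]   # counterfactual D under E = 0
print(factual, d_if_e1, d_if_e0)
```

Because the error terms are held fixed, the model is deterministic at the individual level: the same individual has a well-defined value of D under each setting of E, and the factual D agrees with the counterfactual evaluated at the observed E.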

The equations themselves represent one-step ahead counterfactuals with other counterfactuals given by recursive substitution. A node $E$ will be a parent of $D$ if there is some level of all variables that precede $D$ such that intervening to set $E$ to different levels will allow $D$ to vary even after intervening to fix all other variables that precede $D$. If there exists some level of $A$ such that intervening to set $C$ to different levels will allow $B$ to vary even after fixing $A$ and there exists some level of $B$ such that intervening to set $C$ to different levels will allow $A$ to vary even after fixing $B$ then $C$ is said to be a common cause of $A$ and $B$. The requirement that the $\epsilon_i$ be mutually independent is essentially a requirement that there is no variable absent from the graph which, if included on the graph, would be a parent of two or more variables [23, 24].

A path is a sequence of nodes connected by edges regardless of arrowhead direction; a directed path is a path which follows the edges in the direction indicated by the graph's arrows; a collider is a particular node on a path such that both the preceding and subsequent nodes on the path have directed edges going into that node, i.e. both the edge to and the edge from that node have arrowheads into the node. A path between $A$ and $B$ is said to be blocked given some set of variables $Z$ if either there is a variable in $Z$ on the path that is not a collider or if there is a collider on the path such that neither the collider itself nor any of its descendants are in $Z$. If all paths between $A$ and $B$ are blocked given $Z$ then $A$ and $B$ are said to be d-separated given $Z$. It has been shown that if all paths between $A$ and $B$ are blocked given $Z$ then $A$ and $B$ are conditionally independent given $Z$ [40, 7, 14].

The directed acyclic graph causal framework has proven to be particularly useful in determining whether conditioning on a given set of variables, or none at all, is sufficient to control for confounding.
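The blocking rule defined above can be sketched directly in Python by enumerating every path between two nodes and testing each interior node. The function names are hypothetical, the brute-force enumeration is only practical for small graphs, and the edge list is our reading of the textual description of Figure 1 with the node $F$ omitted (the null hypothesis); the figure itself is not reproduced here, so the edge list should be treated as an assumption.

```python
def descendants(edges, node):
    """All nodes reachable from `node` along directed edges."""
    children = {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
    seen, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def simple_paths(edges, a, b):
    """Every path from a to b that ignores arrowhead direction and repeats no node."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    def extend(path):
        if path[-1] == b:
            yield path
            return
        for n in sorted(nbrs.get(path[-1], ())):
            if n not in path:
                yield from extend(path + [n])

    yield from extend([a])

def is_blocked(edges, path, z):
    """Apply the blocking rule to each interior node of the path."""
    edge_set = set(edges)
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edge_set and (nxt, node) in edge_set
        if collider:
            # A collider blocks unless it or one of its descendants is in z.
            if node not in z and not (descendants(edges, node) & z):
                return True
        elif node in z:
            # A non-collider blocks when it is conditioned on.
            return True
    return False

def d_separated(edges, a, b, z=()):
    z = set(z)
    return all(is_blocked(edges, p, z) for p in simple_paths(edges, a, b))

# Assumed edges of Figure 1 under the null hypothesis that F is absent.
edges_null = [("G_B", "B1"), ("G_B", "B2"), ("G_P", "P1"), ("G_P", "P2"),
              ("E1", "P1"), ("E1", "B1"), ("E2", "P2"), ("E2", "B2"),
              ("P1", "B1"), ("P2", "B2")]

print(d_separated(edges_null, "P2", "B1"))          # False: P2 <- G_P -> P1 -> B1 is open
print(d_separated(edges_null, "P2", "B1", {"P1"}))  # False: conditioning on the collider P1 opens a path through E1
print(d_separated(edges_null, "E1", "E2"))          # True: every path passes through an unconditioned collider
```

Under these assumed edges the sketch agrees with the argument in the introduction: even without $F$, the graph alone does not imply that $P_2$ and $B_1$ are independent, either marginally or given $P_1$.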
Let $D_{E=e}$ denote the counterfactual value of $D$ intervening to set

$E = e$. Pearl [23] showed that for intervention variable $E$ and outcome $D$, if a set of variables $Z$ such that no variable in $Z$ is a descendant of $E$ blocks all "back-door paths" from $E$ to $D$ (i.e. all paths with directed edges into $E$) then conditioning on $Z$ suffices to control for confounding for the estimation of the causal effect of $E$ on $D$ and this causal effect is then given by

$$E(D_{E=e}) = \sum_z E(D \mid E = e, Z = z)\,\mathrm{pr}(Z = z).$$

Note that this is a graphical generalization of Theorem 4 of Rosenbaum and Rubin [31] and of the g-formula [27, 28, 36, 22].

In giving definitions for sufficient conjunctions and related concepts, we will use the following notation. An event is a binary variable taking values in $\{0, 1\}$. The complement of some event $E$ we will denote by $\overline{E}$. A conjunction or product of the events $X_1, \ldots, X_n$ will be written as $X_1 \cdots X_n$. The associative OR operator, $\vee$, is defined by $A \vee B = A + B - AB$. For a random variable $A$ with sample space $\Omega$ we will use the notation $A \equiv 0$ to denote that $A(\omega) = 0$ for all $\omega \in \Omega$. We will use the notation $1_{A=a}$ to denote the indicator function for the random variable $A$ taking the value $a$; for some subset $S$ of the sample space $\Omega$ we will use $1_S$ to denote the indicator that $\omega \in S$. We will use the notation $A \perp\!\!\!\perp B \mid C$ to denote that $A$ is conditionally independent of $B$ given $C$. We begin with the definitions of a sufficient conjunction and a minimal sufficient conjunction. These basic definitions make no reference to directed acyclic graphs or causation.

Definition 1. A set of events $X_1, \ldots, X_n$ is said to constitute a sufficient conjunction for event $D$ if $X_1 \cdots X_n = 1 \Rightarrow D = 1$.

Definition 2. A set of events $X_1, \ldots, X_n$ is said to constitute a minimal sufficient conjunction for an event $D$ if $X_1 \cdots X_n = 1 \Rightarrow D = 1$ and there is no proper subset $X_{i_1}, \ldots, X_{i_k}$ of $X_1, \ldots, X_n$ such that $X_{i_1} \cdots X_{i_k} = 1 \Rightarrow D = 1$.
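Definitions 1 and 2 can be checked mechanically on a finite sample space. The sketch below is one possible reading, under stated assumptions: events are represented as Python functions from outcomes to 0 or 1, the sample space and the events `X1`, `X2`, `X3`, `D` are hypothetical, and "proper subset" is taken to mean a non-empty proper subset.

```python
from itertools import combinations, product

def is_sufficient(conj, d, omega):
    """Definition 1: wherever every event in conj equals 1, d equals 1.

    A conjunction that never occurs is vacuously sufficient under this reading.
    """
    return all(d(w) == 1 for w in omega if all(x(w) == 1 for x in conj))

def is_minimal_sufficient(conj, d, omega):
    """Definition 2: sufficient, and no non-empty proper subset is sufficient."""
    if not is_sufficient(conj, d, omega):
        return False
    proper_subsets = (s for k in range(1, len(conj))
                      for s in combinations(conj, k))
    return not any(is_sufficient(s, d, omega) for s in proper_subsets)

# Hypothetical sample space: all 0/1 triples, with X1, X2, X3 the coordinates.
omega = list(product((0, 1), repeat=3))
X1 = lambda w: w[0]
X2 = lambda w: w[1]
X3 = lambda w: w[2]
D = lambda w: int((w[0] and w[1]) or w[2])   # assumed rule: D = X1 X2 OR X3

print(is_sufficient((X1,), D, omega))                 # False
print(is_minimal_sufficient((X1, X2), D, omega))      # True
print(is_sufficient((X1, X2, X3), D, omega))          # True
print(is_minimal_sufficient((X1, X2, X3), D, omega))  # False, since X3 alone suffices
```

As in the definitions themselves, nothing here refers to a graph or to causation; the checks only compare the events outcome by outcome.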

Sufficient conjunctions for a particular event need not be causes for an event. Suppose a particular sound is produced when and only when an individual blows a whistle. This particular sound the whistle makes is a sufficient conjunction for the whistle's having been blown but the sound does not cause the blowing of the whistle. The converse rather is true: the blowing of the whistle causes the sound to be produced. Corresponding then to these notions of a sufficient conjunction and a minimal sufficient conjunction are those of a sufficient cause and a minimal sufficient cause which will be defined in Section 3.

Definition 3. A set of events $M_1, \ldots, M_n$, each of which may be some product of events, is said to be determinative for some event $D$ if $D = M_1 \vee M_2 \vee \cdots \vee M_n$.

Definition 4. If $M_1, \ldots, M_n$ is a determinative set of (minimal) sufficient conjunctions for $D$ such that there is no proper subset $M_{i_1}, \ldots, M_{i_k}$ of $M_1, \ldots, M_n$ that is also a determinative set of (minimal) sufficient conjunctions for $D$ then $M_1, \ldots, M_n$ is said to constitute a non-redundant determinative set of (minimal) sufficient conjunctions for $D$.

Example 1. Suppose $A = B \vee CE$ and $D = EF$. If we consider all the minimal sufficient conjunctions for $A$ among the events $\{B, C, D\}$ we can see that $B$ and $CD$ are the only minimal sufficient conjunctions but it is not the case that $A = B \vee CD$. Clearly then a complete list of minimal sufficient conjunctions for $A$ generated by a particular collection of events may not be a determinative set of sufficient conjunctions for $A$. If we consider all minimal sufficient conjunctions for $A$ among the events $\{B, C, D, E\}$ we see that $B$ and $CD$ and $CE$ are all minimal sufficient conjunctions. In this example, $B$, $CD$ and $CE$ form a determinative set of minimal sufficient conjunctions for $A$, since $A = B \vee CD \vee CE$, but the set is not non-redundant because $CD$ can be dropped. We see then

that even when a complete list of minimal sufficient conjunctions generated by a particular collection of events constitutes a determinative set of minimal sufficient conjunctions it may not be a non-redundant determinative set of minimal sufficient conjunctions.

3. Minimal Sufficient Causation and Directed Acyclic Graphs. Causal directed acyclic graphs provide a useful framework in which to make use of these ideas of sufficient conjunctions and minimal sufficient conjunctions. With our basic definitions in place we can develop theory concerning minimal sufficient causation by stating and proving a number of results relating sufficient conjunctions to directed acyclic graphs.

Theorem 1. Consider a causal directed acyclic graph $G$ with some node $D$ such that $D$ and all its parents are binary. Suppose that there exists a set of binary variables $A_0, \ldots, A_u$ such that a determinative set of sufficient conjunctions for $D$, say $M_1, \ldots, M_S$, can be formed from conjunctions of $A_0, \ldots, A_u$ along with the parents of $D$ on $G$ and the complements of these variables. Suppose further that there exists a causal directed acyclic graph $H$ such that the parents of $D$ on $H$ that are not on $G$ consist of the nodes $A_0, \ldots, A_u$ and such that $G$ is the marginalization of $H$ over the set of variables which are on the graph for $H$ but not $G$. Then the directed acyclic graph $J$ formed by adding to $H$ the nodes $M_1, \ldots, M_S$, removing the directed edges into $D$ from the parents of $D$ on $H$, adding directed edges from each $M_i$ into $D$ and adding directed edges into each $M_i$ from every parent of $D$ on $H$ which appears in the conjunction for $M_i$ is itself a causal directed acyclic graph.

Proof. To prove that the directed acyclic graph $J$ is a causal directed acyclic graph it is necessary to show that each of the nodes on the directed acyclic graph can

be represented by a non-parametric structural equation involving only the parents on $J$ of that node and a random term $\epsilon_i$ which is independent of all other random terms $\epsilon_j$ in the non-parametric structural equations for the other variables on the graph. The non-parametric structural equation for $M_i$ may be defined as the product of events in the conjunction for $M_i$. The non-parametric structural equation for $D$ can be given by $D = M_1 \vee \cdots \vee M_S$. The non-parametric structural equations for all other nodes on $J$ can be taken to be the same as those defining the causal directed acyclic graph $H$. Because the non-parametric structural equations for $D$ and for each $M_i$ on $J$ are deterministic, they have no random error term. Thus, for the non-parametric structural equations defining $D$ and each $M_i$ on $J$, the requirement that the non-parametric structural equation's random term $\epsilon_i$ is independent of all the other random terms $\epsilon_j$ in the non-parametric structural equations for the other variables on the graph is trivially satisfied. That this requirement is satisfied for the non-parametric structural equations for the other variables on $J$ follows from the fact that it is satisfied on $H$.

In Theorem 1 sufficient conjunctions for $D$ are constructed from some set of variables that, on some causal directed acyclic graph $H$, are all parents of $D$ and thus, within the directed acyclic graph causal framework, it makes sense to speak of sufficient causes and minimal sufficient causes.

Definition 5. If on a causal directed acyclic graph some node $D$ with non-parametric structural equation $D = f_D(pa_D, \epsilon_D)$ is such that $D$ and all its parents are binary then $X_1, \ldots, X_n$ is said to constitute a sufficient cause for $D$ if $X_1, \ldots, X_n$ are all parents of $D$ or complements of the parents of $D$ and are such

that $f_D(pa_D, \epsilon_D) = 1$ for all $\epsilon_D$ whenever $pa_D$ is such that $X_1 \cdots X_n = 1$; if no proper subset of $X_1, \ldots, X_n$ also constitutes a sufficient cause for $D$ then $X_1, \ldots, X_n$ is said to constitute a minimal sufficient cause for $D$. A set of (minimal) sufficient causes, $M_1, \ldots, M_n$, each of which is a product of the parents of $D$ and their complements, is said to be determinative for some event $D$ if for all $\epsilon_D$, $f_D(pa_D, \epsilon_D) = 1$ if and only if $pa_D$ is such that $M_1 \vee M_2 \vee \cdots \vee M_n = 1$; if no proper subset of $M_1, \ldots, M_n$ is also determinative for $D$ then $M_1, \ldots, M_n$ is said to constitute a non-redundant determinative set of (minimal) sufficient causes for $D$.

Definition 6. If for some directed acyclic graph $G$ there exist $A_0, \ldots, A_u$ which satisfy the conditions of Theorem 1 for some node $D$ on $G$ so that a determinative set of sufficient causes for $D$ can be constructed from $A_0, \ldots, A_u$ along with the parents of $D$ on $G$ and their complements then $D$ is said to admit a sufficient causation structure.

In the examples below we will use the following notation. First, we will in general replace the $M_i$ nodes with the conjunctions that constitute them. Second, the directed edges from the $A_i$ nodes and the parents of $D$ into the $M_i$ nodes and from the $M_i$ nodes into $D$ represent deterministic dependencies. The node $D$ with directed edges from the $M_i$ nodes is effectively an OR node. The $M_i$ nodes with the directed edges from the $A_i$ nodes and the parents of $D$ on $G$ are effectively AND nodes. To indicate these deterministic dependencies we add to the diagram an ellipse around the $M_i$ nodes. We call this resulting diagram a causal directed acyclic graph with a sufficient causation structure (or a minimal sufficient causation structure if the sufficient conjunctions in the determinative set for $D$ are each minimal sufficient conjunctions).

If a determinative set of sufficient causes for $D$ can be constructed simply from

the parents of $D$ on $G$ then $H$ can be taken to be $G$. If a set of variables $A_0, \ldots, A_u$ satisfying Theorem 1 can be constructed from functions of the random term $U = \epsilon_D^G$ of the non-parametric structural equation for $D$ on $G$ and their complements so that $A_i = f_i(U)$ then $H$ can be chosen to be the graph $G$ with the additional nodes $U, A_0, \ldots, A_u$ and with directed edges from $U$ into each $A_i$ and from each $A_i$ into $D$. Our first several examples will be cases in which no additional nodes $A_0, \ldots, A_u$ are needed to form a determinative set of sufficient causes for $D$ but rather in which a determinative set of sufficient causes can be formed just from the parents of $D$ on the original causal directed acyclic graph $G$.

Example 2. Consider a causal directed acyclic graph given in Figure 2(i) and suppose $E_1 E_2$ and $E_3 E_4$ constitute a determinative set of sufficient causes for $D$. Then by Theorem 1, the graph in Figure 2(ii) is also a causal directed acyclic graph. Similarly if Figure 2(iii) represents a causal directed acyclic graph and if $E_1 E_2$ and $\overline{E}_2 E_3$ constitute a determinative set of sufficient causes for $D$ then by Theorem 1, the graph in Figure 2(iv) is also a causal directed acyclic graph. Note that in Figure 2(iv) there are directed edges from $E_2$ into those sufficient cause nodes involving $E_2$ and into those involving $\overline{E}_2$.

[Figure 2 (image not reproduced): four panels, (i)-(iv). Panels (i) and (iii) show the original graphs with the parents of $D$; panels (ii) and (iv) show the corresponding graphs with sufficient cause nodes added.]

Fig. 2. Causal directed acyclic graphs with sufficient causation structures.
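The graph surgery described in Theorem 1 and applied in Example 2 can be sketched as a small function on edge sets. This is an illustrative sketch only: the function names, the node labels such as `"notE2_E3"`, and the encoding of a sufficient cause as a list of `(parent, is_positive)` literals are all assumptions, and the edge sets are read from the description of Figure 2 in the text rather than from the figure itself.

```python
def add_sufficient_cause_nodes(edges, outcome, causes):
    """Return the edge set of J given the edge set of H.

    edges   : set of (parent, child) pairs for the graph H
    outcome : name of the node D
    causes  : dict mapping each new node name M_i to its literals, where a
              literal is (parent_name, True) for the event itself or
              (parent_name, False) for its complement
    """
    parents = {u for u, v in edges if v == outcome}
    # Remove every directed edge from a parent of D into D.
    new_edges = {(u, v) for u, v in edges if v != outcome}
    for name, literals in causes.items():
        for var, _is_positive in literals:
            if var not in parents:
                raise ValueError(f"{var} is not a parent of {outcome}")
            new_edges.add((var, name))   # parent -> AND node M_i
        new_edges.add((name, outcome))   # AND node M_i -> OR node D
    return new_edges

def outcome_value(values, causes):
    """Evaluate D as the OR, over sufficient causes, of the AND of their literals."""
    return int(any(all(values[var] == int(pos) for var, pos in lits)
                   for lits in causes.values()))

# Figure 2(i) -> Figure 2(ii): D determined by E1 E2 and E3 E4.
h_edges = {("E1", "D"), ("E2", "D"), ("E3", "D"), ("E4", "D")}
causes_ii = {"E1E2": [("E1", True), ("E2", True)],
             "E3E4": [("E3", True), ("E4", True)]}
j_edges = add_sufficient_cause_nodes(h_edges, "D", causes_ii)

# Figure 2(iii) -> Figure 2(iv): D determined by E1 E2 and (not E2) E3.
h_edges_iii = {("E1", "D"), ("E2", "D"), ("E3", "D")}
causes_iv = {"E1E2": [("E1", True), ("E2", True)],
             "notE2_E3": [("E2", False), ("E3", True)]}
j_edges_iv = add_sufficient_cause_nodes(h_edges_iii, "D", causes_iv)

print(sorted(j_edges))
print(sorted(j_edges_iv))
```

In the second graph `E2` has an edge into both sufficient cause nodes, once through the event and once through its complement, which matches the remark about Figure 2(iv) above.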

Theorem 1 provides a link between all four of the causal model frameworks discussed by Greenland and Brumback [8]: graphical models, potential outcome (counterfactual) models, sufficient-component cause models and structural equation models. The four are linked through non-parametric structural equations. Graphical models as developed by Pearl [23] are diagrammatic shorthand for non-parametric structural equations. Non-parametric structural equations can be interpreted as sets of counterfactual relations. Theorem 1 provides the final link by relating sufficient-component cause models to non-parametric structural equations and thereby also graphical models. Non-parametric structural equations may thus be seen as a framework encompassing all four of these approaches to representing causal relations.

Because a causal directed acyclic graph with a sufficient causation structure is itself a causal directed acyclic graph, the d-separation criterion applies and allows one to determine independencies and conditional independencies. A minimal sufficient causation structure will often make apparent conditional independencies within strata which were not apparent on the original causal directed acyclic graph. Two corollaries to Theorem 1 are particularly useful in this regard.

Corollary 1. If some node $D$ on a causal directed acyclic graph admits a sufficient causation structure then conditioning on $D = 0$ conditions also on all sufficient cause nodes for $D$ on the causal directed acyclic graph with the sufficient causation structure.

Corollary 2. Suppose that all the parents of some node $D$ on a causal directed acyclic graph are binary and independent and that $D$ admits a sufficient causation structure, then the parents of $D$ on the causal directed acyclic graph with the sufficient causation structure can be broken into equivalence classes where two

elements share an equivalence class if on the causal directed acyclic graph with the sufficient causation structure there exists a path between them involving only the set of parents of $D$ and the sufficient cause nodes. Any two causes not in the same equivalence class are conditionally independent given $D = 0$.

Example 2 (continued). Consider the causal directed acyclic graph with the minimal sufficient causation structure given in Figure 2(ii). Conditioning on $D = 0$ also conditions on $E_1 E_2 = 0$ and $E_3 E_4 = 0$ and thus by the d-separation criteria $E_i$ is conditionally independent of $E_j$ given $D = 0$ for $i \in \{1, 2\}$, $j \in \{3, 4\}$. In the causal directed acyclic graph with the minimal sufficient causation structure in Figure 2(iv) no similar conditional independence relations within the $D = 0$ stratum hold. Although conditioning on $D = 0$ conditions also on $E_1 E_2 = 0$ and $\overline{E}_2 E_3 = 0$ there still remains an unblocked path $E_1 \rightarrow E_1 E_2 \leftarrow E_2 \rightarrow \overline{E}_2 E_3 \leftarrow E_3$ between $E_1$ and $E_3$ and so $E_1$ and $E_3$ are not conditionally independent given $D = 0$ and similarly there are unblocked paths between $E_1$ and $E_2$ and also between $E_2$ and $E_3$ given $D = 0$.

The additional nodes $A_0, \ldots, A_u$ required to form a determinative set of sufficient conjunctions for $D$ will generally not be unique. For example, if $D = A_0 \vee A_1 E$ then it is also the case that $D = B_0 \vee B_1 E$ where $B_0 = A_0$ and $B_1 = \overline{A}_0 A_1$. Similarly, there will in general be no unique set of sufficient causes that is determinative for $D$. For example if $E_1$ and $E_2$ constitute a set of sufficient causes for $D$ so that $D = E_1 \vee E_2$ then it is also the case that $E_1 \overline{E}_2$, $\overline{E}_1 E_2$, and $E_1 E_2$ also constitute a set of sufficient causes for $D$ and so we could also write $D = E_1 \overline{E}_2 \vee \overline{E}_1 E_2 \vee E_1 E_2$. It can be shown that not even non-redundant determinative sets of minimal sufficient causes are unique.

Corresponding to the definition of a sufficient cause is the more philosophical

18 notion of a causal mechanism. A causal mechanism can be conceived of as a set of events or conditions which, if all present, inevitably bring about the outcome under consideration in a particular manner. A causal mechanism thus provides a particular description of how the outcome comes about. Suppose for instance that an individual were exposed to two poisons, E 1 and E 2, such that in the absence of E 2, the poison E 1 would lead to heart failure resulting in death; and that in the absence of E 1, the poison E 2 would lead to respiratory failure resulting in death; but such that when E 1 and E 2 are both present, they interact and lead to a failure of the nervous system again resulting in death. In this case there are three distinct causal mechanisms for death each corresponding to a su cient cause for D: death by heart failure corresponding to E 1 E 2, death by respiratory failure corresponding to E 1 E 2, and death due to a failure of the nervous system corresponding to E 1 E 2. It is interesting to note that in this case none of the su cient causes corresponding to the causal mechanisms is minimally su cient. Each of E 1 E 2, E 1 E 2, and E 1 E 2 is su cient for D but none is minimally su cient as either E 1 or E 2 alone is su cient for death. The last example shows that the existence of a particular set of determinative su cient causes does not guarantee that there are actual causal mechanisms corresponding to these su cient causes; it only implies that a set of causal mechanisms corresponding to these su cient causes cannot be ruled out by a complete knowledge of counterfactual outcomes. In particular, in the previous example, the set fe 1 ; E 2 g is a determinative set of su cient causes that does not correspond to the actual set of causal mechanisms fe 1 E 2 ; E 1 E 2 ; E 1 E 2 g. 
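The claims in the poison example can be checked by brute force over the four joint values of the two exposures. The sketch below is only an illustration under stated assumptions: the outcome is taken to be the deterministic function $D = E_1 \vee E_2$ with no co-causes, a conjunction is encoded as a dictionary mapping a parent name to its required value, and the helper names (`holds`, `is_sufficient`, `is_minimal_sufficient`, `is_determinative`) are hypothetical and do not come from the paper.

```python
from itertools import product

def holds(conj, row):
    """True when every literal of the conjunction is satisfied by the row."""
    return all(row[var] == val for var, val in conj.items())

def all_rows(names):
    """Every joint assignment of 0/1 values to the named parents."""
    return [dict(zip(names, vals)) for vals in product((0, 1), repeat=len(names))]

def is_sufficient(conj, outcome, names):
    """The conjunction being 1 implies the outcome is 1."""
    return all(outcome(r) == 1 for r in all_rows(names) if holds(conj, r))

def is_minimal_sufficient(conj, outcome, names):
    """Sufficient, and dropping any single literal destroys sufficiency.

    Checking single-literal deletions is enough because any conjunction
    containing a sufficient conjunction is itself sufficient.
    """
    if not is_sufficient(conj, outcome, names):
        return False
    return all(
        not is_sufficient({k: v for k, v in conj.items() if k != drop}, outcome, names)
        for drop in conj
    )

def is_determinative(conjs, outcome, names):
    """The disjunction of the conjunctions reproduces the outcome exactly."""
    return all(
        int(any(holds(c, r) for c in conjs)) == outcome(r) for r in all_rows(names)
    )

names = ["E1", "E2"]

def death(row):
    # Assumed outcome function for the illustration: D = E1 or E2.
    return int(row["E1"] or row["E2"])

# The three causal mechanisms: heart, respiratory and nervous system failure.
mechanisms = [{"E1": 1, "E2": 0}, {"E1": 0, "E2": 1}, {"E1": 1, "E2": 1}]
# The logically equivalent set of minimal sufficient causes.
simple = [{"E1": 1}, {"E2": 1}]

print(is_determinative(mechanisms, death, names), is_determinative(simple, death, names))
print([is_minimal_sufficient(c, death, names) for c in mechanisms])
print([is_minimal_sufficient(c, death, names) for c in simple])
```

Both sets reproduce the outcome on every row, so the two sets cannot be told apart from the outcome function alone, which is the sense in which subject matter knowledge is needed to decide which set corresponds to actual mechanisms.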
If there are two or more sets of sufficient causes that are determinative for some outcome $D$ then, although the sets of determinative sufficient causes are logically equivalent for prediction, we nevertheless view them as distinct. In such cases, some knowledge of the subject matter in question will in general be needed to discern which of the sets of determinative sufficient causes actually corresponds to the true causal mechanisms. For instance, in the previous example, we needed biological knowledge of how the poisons brought about death in the various scenarios. We will, in the interpretation of our results, assume that there always exists some set of true causal mechanisms which forms a determinative set of sufficient causes for the outcome.

The concept of synergism is closely related to that of a causal mechanism and is often found in the epidemiologic literature [32, 35, 11]. We will say that there is synergism between the effects of $E_1$ and $E_2$ on $D$ if there exists a sufficient cause for $D$ which represents some causal mechanism and such that this sufficient cause has $E_1$ and $E_2$ in its conjunction. In related work, we have developed tests for synergism, i.e. tests for the joint presence of two or more causes in a single sufficient cause [39]. As noted in the introduction, in some of our examples and in our discussion of the various results in the paper we will sometimes make reference to the concepts of a causal mechanism and synergism. However, all definitions, propositions, lemmas, theorems and corollaries will be given in terms of sufficient causes, for which we have a precise definition.

The graphical representation of sufficient causes on a causal directed acyclic graph does not require that the determinative set of sufficient causes for $D$ be minimally sufficient, nor does it require that the set of determinative sufficient causes for $D$ be non-redundant. To expand a directed acyclic graph into another directed acyclic graph with sufficient cause nodes, all that is required is that the set of sufficient causes constitutes a determinative set of sufficient causes for $D$. However, a set of events that constitutes a sufficient cause can be reduced to a set of events that constitutes a minimal sufficient cause by iteratively excluding unnecessary events from the set until a minimal sufficient cause is obtained. Also, a set of determinative sufficient causes that is redundant can be reduced to one that is non-redundant by excluding those sufficient causes or minimal sufficient causes that are redundant. It is sometimes an advantage to reduce a redundant set of sufficient causes to a non-redundant set of minimal sufficient causes. This is so because allowing sufficient causes that are not minimally sufficient, or allowing redundant sufficient causes or redundant minimal sufficient causes, can obscure the conditional independence relations implied by the structure of the causal directed acyclic graph. This is made evident in Example 3.

Example 3. Consider a causal directed acyclic graph with the minimal sufficient causation structure indicated in Figure 3(i).

Fig. 3. Example illustrating that non-minimal sufficient causes can obscure conditional independence relations. Panel (i): parents $A$, $B$, $C$ of $D$ with sufficient cause nodes $AB$ and $C$. Panel (ii): parents $A$, $B$, $C$ of $D$ with sufficient cause nodes $AB$, $BC$ and $\bar{B}C$.

Conditioning on $D = 0$ conditions also on $AB = 0$ and $C = 0$ and, by the d-separation criterion, $A$ is conditionally independent of $C$ given $D = 0$. However, consider a logically equivalent sufficient causation structure for this causal directed acyclic graph which allows sufficient causes that are not minimal sufficient causes, as given in Figure 3(ii). Here $BC$ and $\bar{B}C$ are sufficient causes but not minimal sufficient causes. Conditioning on $D = 0$ conditions also on $AB = 0$, $BC = 0$ and $\bar{B}C = 0$, but the d-separation criterion no longer implies that $A$ and $C$ are conditionally independent given $D = 0$ because on the causal directed acyclic graph given in Figure 3(ii) there is an unblocked path between $A$ and $C$ conditioning on $D = 0$, namely $A \to AB \leftarrow B \to BC \leftarrow C$. Thus from the causal directed acyclic graph given in Figure 3(i) it was possible to use the d-separation criterion to identify the conditional independence of $A$ and $C$ given $D = 0$. But from the causal directed acyclic graph given in Figure 3(ii) the d-separation criterion would not identify this conditional independence relation even though the two directed acyclic graphs describe the same causal structure. Allowing sufficient causes that are not minimal sufficient causes obscures the conditional independence relation. Similar examples can be constructed to show that allowing redundant sufficient causes or redundant minimal sufficient causes can also obscure conditional independence relations.

Although allowing sufficient causes that are not minimally sufficient, or allowing redundant sufficient causes or redundant minimal sufficient causes, can obscure the conditional independence relations implied by the structure of the causal directed acyclic graph, it may sometimes be desirable to include non-minimal sufficient causes or redundant sufficient causes. For example, as noted above, non-minimal sufficient cause nodes or redundant sufficient cause nodes may represent separate causal mechanisms upon which it might be possible to intervene. Further discussion of conditional independence relations in sufficient causation structures with non-minimally sufficient causes and redundant sufficient causes is given in Section 6.

Theorem 1 gives rise to several further definitions presented below.

Definition 7. If some node $D$ on a causal directed acyclic graph $G$ admits a sufficient causation structure then the parents of $D$ are said to be the main causes of $D$.

Definition 8. The conjunction of main causes and their complements in a particular sufficient cause for $D$ is said to be a principal cause for $D$.

Definition 9. If some node $D$ admits a sufficient causation structure, the additional variables $A_0, \ldots, A_u$ needed to form a set of sufficient causes for $D$ are said to be the co-causes of $D$. If a co-cause appears in a sufficient conjunction without any main cause in its conjunction then it is said to be a residual co-cause (and will generally be denoted by $A_0$). Otherwise the co-cause is said to be a non-residual co-cause.

A couple of additional comments merit attention. First, if the main causes of $D$ are labeled $E_1, \ldots, E_m$ then each sufficient cause $M_j$ must either include the main cause $E_i$ in its conjunction, or include $\bar{E}_i$ in its conjunction, or include neither $E_i$ nor $\bar{E}_i$ in its conjunction; clearly it cannot include both. There are thus $3^m$ possible combinations of the $E_i$'s and their complements that may appear as principal causes. Second, a sufficient cause need only involve one co-cause $A_i$ in its conjunction because if it involved $A_{i_1}, \ldots, A_{i_k}$ then $A_{i_1}, \ldots, A_{i_k}$ could be replaced by the product $A'_i = A_{i_1} \cdots A_{i_k}$. In certain cases, though, it may be desirable to include more than one $A_i$ in a sufficient cause if this corresponds to the actual causal mechanisms. However, if only one $A_i$ is included in each sufficient cause, then there will need to be at most $3^m$ sufficient causes, each of which will involve in its conjunction only one co-cause $A_i$ and one of the $3^m$ possible combinations of the $E_i$'s and $\bar{E}_i$'s that may be included.

It was noted above that if a set of variables $A_0, \ldots, A_u$ satisfying Theorem 1 can be constructed from functions of the random term $U = \varepsilon_D$ of the non-parametric structural equation for $D$ on $G$ and their complements, so that $A_i = f_i(U)$, then $H$ can be chosen to be the graph $G$ with the additional nodes $U, A_0, \ldots, A_u$ and with directed edges from $U$ into each $A_i$ and from each $A_i$ into $D$. This gives rise to the definition, given below, of a representation for $D$.

Definition 10. If $D$ and all of its parents on the causal directed acyclic graph $G$ are binary and there exists some set $\{A_i, P_i\}$ such that each $P_i$ is some conjunction of the parents of $D$ and their complements, and such that there exist functions $f_i$ for which $A_i = f_i(\varepsilon_D)$, where $\varepsilon_D$ is the random term in the non-parametric structural equation for $D$ on $G$, and such that $D = \bigvee_i A_iP_i$, then $\{A_i, P_i\}$ is said to constitute a representation for $D$.

The non-parametric structural equation for $D$ is given by $D = f(pa_D, \varepsilon_D)$. Suppose $D$ has $m$ parents on the original causal directed acyclic graph $G$. Since these parents are binary there are $2^m$ values which $pa_D$ can take. Since $f$ maps $(pa_D, \varepsilon_D)$ to $\{0, 1\}$, each value of $\varepsilon_D$ assigns to every possible realization of $pa_D$ either 0 or 1 through $f$. There are $2^{2^m}$ such assignments. Thus without loss of generality we may assume that $\varepsilon_D$ takes on some finite number of distinct values $N \le 2^{2^m}$, and so we may write the sample space for $\varepsilon_D$ as $\Omega_D = \{\omega_1, \ldots, \omega_N\}$ and we may use $\omega = \omega_i$ and $\varepsilon_D = \varepsilon_D(\omega_i)$ interchangeably.

If, in a representation for $D$, for some principal cause $P_j$ the corresponding co-cause is such that $A_j \equiv 0$ then we will suppress $A_jP_j$ from the disjunction $\bigvee_i A_iP_i$. In other words, we will use $\bigvee_i A_iP_i$ as shorthand for $\bigvee_{i : A_i \not\equiv 0} A_iP_i$ and we will refer to $A_i$ as a co-cause and to $P_i$ as a principal cause only if $A_i$ is not identically 0. Furthermore, if in a representation for $D$, $\bigvee_i A_iP_i$, a co-cause $A_j$ itself constitutes a sufficient cause, $M_j$, without any main cause in its conjunction then the principal cause for this sufficient cause will be suppressed; we will write $M_j = A_j$. As noted above, we will typically denote this co-cause with the subscript 0, i.e. $M_j = A_0$. If, on the other hand, a principal cause is such that the principal cause itself, without any co-cause, constitutes a sufficient cause, $M_j$, for $D$ then the co-cause for this sufficient cause will be suppressed; we will write $M_j = P_j$. The sufficient causes in all of the examples we have considered thus far have consisted of principal causes without co-causes; in these examples it was possible to construct a determinative set of sufficient causes for $D$ from the parents of $D$ alone; no co-causes were necessary.

If the $A_i$ variables are constructed from functions of the random term $\varepsilon_D$ in the non-parametric structural equation for $D$ on $G$ then these $A_i$ variables may or may not allow for interpretation and they may or may not be such that an intervention on these $A_i$ variables is conceivable. In certain cases the $A_i$ variables may simply be logical constructs for which no intervention is conceivable. Although in certain cases it may not be possible to intervene on the $A_i$ variables, we will still refer to conjunctions of the form $A_iP_i$ as sufficient causes for $D$, as it is assumed that it is possible to intervene on the parents of $D$ which constitute the conjunction for $P_i$.

Suppose that for some node $D$ on a causal directed acyclic graph $G$, a set of variables $A_0, \ldots, A_u$ satisfying Theorem 1 can be constructed from functions of the random term $U = \varepsilon_D$ in the non-parametric structural equation for $D$ on $G$ so that a representation for $D$ is given by $D = \bigvee_i A_iP_i$. Then, in order to simplify the diagram, instead of adding to $G$ the variable $U$ and directed edges from $U$ into each $A_i$ so as to form the minimal sufficient causation structure, we will sometimes suppress $U$ and simply add an asterisk next to each $A_i$ indicating that the $A_i$ variables have a common cause.

Proposition 1. For any representation for $D$, the co-causes $A_i$ will be independent of the parents of $D$ on the original directed acyclic graph $G$.

Proof. This follows immediately from the fact that for any representation for $D$, the co-causes are functions of the random term in the non-parametric structural equation for $D$.

The examples considered in Figures 2 and 3 have all been such that it was possible to construct determinative sets of sufficient causes for $D$ from the parents of $D$ on the original directed acyclic graph $G$ and the complements of such parents; no co-causes have been necessary. If it is not possible to construct a determinative set of sufficient causes from the parents of $D$ on $G$, or if some of the sufficient causes for $D$ are unknown, then it is not obvious how one might make use of Theorem 1. The theorem allowed for a sufficient causation structure on a causal directed acyclic graph provided there existed some set of co-causes $A_0, \ldots, A_u$. Theorem 2 complements Theorem 1 in that it essentially states that when $D$ and all of its parents are binary such a set of co-causes always exists. The variables $A_0, \ldots, A_u$ are constructed from functions of the random term $\varepsilon_D$ in the non-parametric structural equation for $D$ on $G$.

Theorem 2. Consider a causal directed acyclic graph $G$ on which there exists some node $D$ such that $D$ and all its parents are binary. Then there exist variables $A_0, \ldots, A_u$ that satisfy the conditions of Theorem 1 and such that the sufficient causes constructed from $A_0, \ldots, A_u$ along with the parents of $D$ on $G$ and their complements are in fact minimal sufficient causes.

Proof. The co-causes $A_0, \ldots, A_u$ can be constructed as follows. Let $W_i$ be the indicator $1_{\varepsilon_D = \varepsilon_D(\omega_i)}$. Let $P_i$ be some conjunction of the main causes and their complements, i.e. $P_i = F^i_1 \cdots F^i_{n_i}$ where each $F^i_k$ is either a parent of $D$, say $E_j$, or its complement $\bar{E}_j$. For each potential principal cause $P_i$, let $A_i \equiv 1$ if $F^i_1 \cdots F^i_{n_i}$ is a minimal sufficient cause for $D$ and

$$A_i = \bigvee_j \{W_j : W_j F^i_1 \cdots F^i_{n_i} \text{ is a minimal sufficient cause for } D\}$$

otherwise. Let $M_i = P_i$ if $A_i \equiv 1$ and $M_i = A_iP_i$ otherwise. It must be shown that each $M_i = A_i F^i_1 \cdots F^i_{n_i}$ is a minimal sufficient cause and that the set of $M_i$'s constitutes a minimal sufficient cause representation for $D$ (or more precisely, that the set of $M_i$'s for which $A_i$ is not identically 0 constitutes a minimal sufficient cause representation for $D$).

We first show that each $M_i = A_i F^i_1 \cdots F^i_{n_i}$ is a minimal sufficient cause for $D$. Clearly this is the case if $A_i \equiv 1$. Now consider those $A_i$ such that $A_i$ is not identically 0 and not identically 1 and suppose $A_i = W^i_1 \vee \cdots \vee W^i_{\nu_i}$. If $A_i F^i_1 \cdots F^i_{n_i}$ is not a minimal sufficient cause then either $F^i_1 \cdots F^i_{n_i} = 1 \Rightarrow D = 1$ or there exists $j$ such that $A_i F^i_1 \cdots F^i_{j-1} F^i_{j+1} \cdots F^i_{n_i} = 1 \Rightarrow D = 1$. Suppose first that $F^i_1 \cdots F^i_{n_i} = 1 \Rightarrow D = 1$; then there does not exist a $W_j$ such that $W_j F^i_1 \cdots F^i_{n_i}$ is a minimal sufficient cause for $D$, but this contradicts the assumption that $A_i$ is not identically 1. On the other hand, if there exists $j$ such that $A_i F^i_1 \cdots F^i_{j-1} F^i_{j+1} \cdots F^i_{n_i} = 1 \Rightarrow D = 1$ then it is also the case that $W^i_1 F^i_1 \cdots F^i_{j-1} F^i_{j+1} \cdots F^i_{n_i} = 1 \Rightarrow D = 1$ and so $W^i_1 F^i_1 \cdots F^i_{n_i}$ is not a minimal sufficient cause for $D$; but this contradicts $A_i = W^i_1 \vee \cdots \vee W^i_{\nu_i}$. Thus $A_i F^i_1 \cdots F^i_{n_i}$ must be a minimal sufficient cause for $D$.

It remains to be shown that the set of $M_i$'s for which $A_i$ is not identically 0 constitutes a minimal sufficient cause representation for $D$. We must show that if $D = 1$ then there exists an $M_i = A_iP_i$ for which $M_i = 1$. Now $D$ is a function of $(\varepsilon_D, E_1, \ldots, E_m)$, so let $(\varepsilon_D^*, E_1^*, \ldots, E_m^*)$ be any particular value of $(\varepsilon_D, E_1, \ldots, E_m)$ for which $D = 1$. Consider the set $\{E_1, \ldots, E_m\}$. If for any $j$,

$$\varepsilon_D = \varepsilon_D^*,\ E_1 = E_1^*, \ldots, E_{j-1} = E_{j-1}^*,\ E_{j+1} = E_{j+1}^*, \ldots, E_m = E_m^* \Rightarrow D = 1,$$

remove $E_j$ from $\{E_1, \ldots, E_m\}$. Continue to remove those $E_j$ from this set which are not needed to maintain the implication $D = 1$. Suppose the set that remains is $\{E_{h_1}, \ldots, E_{h_S}\}$. Then either

$$E_{h_1} = E_{h_1}^*, \ldots, E_{h_S} = E_{h_S}^* \Rightarrow D = 1$$

or

$$E_{h_1} = E_{h_1}^*, \ldots, E_{h_S} = E_{h_S}^* \nRightarrow D = 1 \quad\text{and}\quad \varepsilon_D = \varepsilon_D^*,\ E_{h_1} = E_{h_1}^*, \ldots, E_{h_S} = E_{h_S}^* \Rightarrow D = 1.$$

If $E_{h_1} = E_{h_1}^*, \ldots, E_{h_S} = E_{h_S}^* \Rightarrow D = 1$ then, if we define $F_j$ as the indicator $F_j = 1_{(E_{h_j} = E_{h_j}^*)}$, $F_1 \cdots F_S$ is a minimal sufficient cause for $D$ and there thus exists an $i$ such that $P_i = F_1 \cdots F_S$ and $M_i = P_i$, and when $E_{h_1} = E_{h_1}^*, \ldots, E_{h_S} = E_{h_S}^*$ we have $M_i = 1$. If $E_{h_1} = E_{h_1}^*, \ldots, E_{h_S} = E_{h_S}^* \nRightarrow D = 1$ but $\varepsilon_D = \varepsilon_D^*,\ E_{h_1} = E_{h_1}^*, \ldots, E_{h_S} = E_{h_S}^* \Rightarrow D = 1$ then, if we define $F_j$ as the indicator $1_{(E_{h_j} = E_{h_j}^*)}$, $1_{\varepsilon_D = \varepsilon_D^*} F_1 \cdots F_S$ is a minimal sufficient cause for $D$ and there exists an $i$ such that $M_i = A_iP_i$ and $P_i = F_1 \cdots F_S$ and $\varepsilon_D = \varepsilon_D^* \Rightarrow A_i = 1$ and such that $\varepsilon_D = \varepsilon_D^*,\ E_{h_1} = E_{h_1}^*, \ldots, E_{h_S} = E_{h_S}^* \Rightarrow M_i = 1$. We have thus shown that when $D = 1$ there exists an $M_i$ such that $M_i = 1$, and so the $M_i$'s constitute a minimal sufficient cause representation for $D$.
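The construction used in the proof can be traced on a small case. The following sketch is an illustration under explicit assumptions rather than the authors' implementation: the random term $\varepsilon_D$ is represented by a finite list `omegas`, the structural function `f(pa, w)` is supplied by the user, a principal cause is encoded as a tuple of `(parent index, required value)` literals, and the names `canonical_representation`, `evaluate` and `ALWAYS` are hypothetical. The example function, with one binary parent and four values of the random term (never, causal, preventive, always), is likewise chosen only for illustration.

```python
from itertools import product

ALWAYS = "1"  # marker for a co-cause that is identically 1

def canonical_representation(f, m, omegas):
    """Map each principal cause to its co-cause, following the proof's recipe.

    A co-cause is ALWAYS, or the frozenset of omega values whose indicators
    W_j enter the disjunction A_i. Co-causes identically 0 are suppressed.
    """
    rows = list(product((0, 1), repeat=m))

    def sufficient(literals, omega_set):
        # (eps_D in omega_set) and all literals true implies D = 1.
        return all(
            f(pa, w) == 1
            for w in omega_set
            for pa in rows
            if all(pa[k] == v for k, v in literals)
        )

    def minimal(literals, omega_set, with_indicator):
        if not sufficient(literals, omega_set):
            return False
        # Dropping any one parent literal must destroy sufficiency.
        for lit in literals:
            rest = tuple(l for l in literals if l != lit)
            if sufficient(rest, omega_set):
                return False
        # Dropping the indicator W_j must destroy sufficiency as well.
        if with_indicator and sufficient(literals, omegas):
            return False
        return True

    rep = {}
    # Each parent is present, present as its complement, or absent: 3^m cases.
    for choice in product((None, 0, 1), repeat=m):
        literals = tuple((k, v) for k, v in enumerate(choice) if v is not None)
        if minimal(literals, omegas, with_indicator=False):
            rep[literals] = ALWAYS
        else:
            ws = frozenset(w for w in omegas if minimal(literals, [w], True))
            if ws:
                rep[literals] = ws
    return rep

def evaluate(rep, pa, w):
    """Value of the disjunction of A_i P_i at parent values pa and eps_D = w."""
    return int(any(
        (a == ALWAYS or w in a) and all(pa[k] == v for k, v in lits)
        for lits, a in rep.items()
    ))

def f(pa, w):
    # Assumed example: w = 0 never, 1 causal, 2 preventive, 3 always.
    e = pa[0]
    return [0, e, 1 - e, 1][w]

omegas = [0, 1, 2, 3]
rep = canonical_representation(f, 1, omegas)
print(rep)
print(all(evaluate(rep, pa, w) == f(pa, w) for pa in [(0,), (1,)] for w in omegas))
```

For this assumed example the residual co-cause is the indicator of the "always" value, and the co-causes attached to $E$ and $\bar{E}$ are the indicators of the "causal" and "preventive" values respectively, so the disjunction reproduces $D$ for every combination of parent value and random term.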

The variables $A_i$ constructed in Theorem 2, along with their corresponding principal causes $P_i$, we define below as the canonical representation for $D$.

Definition 11. Consider a causal directed acyclic graph $G$ such that some node $D$ and all of its parents are binary. Let $\Omega_D$ be the sample space for the random term $\varepsilon_D$ in the non-parametric structural equation for $D$ on $G$. The principal causes $P_i = F^i_1 \cdots F^i_{n_i}$, where each $F^i_k$ is either a parent of $D$ or the complement of a parent of $D$, along with the variables $A_i$ constructed by $A_i \equiv 1$ if $F^i_1 \cdots F^i_{n_i}$ is a minimal sufficient cause for $D$ and

$$A_i = \bigvee_{\omega_j \in \Omega_D} \{1_{\varepsilon_D = \varepsilon_D(\omega_j)} : 1_{\varepsilon_D = \varepsilon_D(\omega_j)} F^i_1 \cdots F^i_{n_i} \text{ is a minimal sufficient cause for } D\}$$

otherwise, are said to be the canonical representation for $D$.

As noted above, there will in general exist more than one set of co-causes $A_0, \ldots, A_u$ which together with the main causes and their complements can be used to construct a sufficient cause representation for $D$. The set of $A_i$'s in the canonical representation constitutes only one particular set of variables which can be used to construct a sufficient cause representation. The canonical representation in a sense "favors" principal causes with fewer terms in their conjunction. The canonical representation will never have $A_i = 1$ for some principal cause $P_i$ when there is a principal cause $P_j$ with $A_j = 1$ and such that the components of $P_j$ are a subset of those in the conjunction for $P_i$. This is made more precise below in Proposition 2 and Corollary 3.

As stated and proved in Theorem 2, the canonical representation will always consist of a determinative set of minimal sufficient causes. This determinative set of minimal sufficient causes will sometimes but not always be a non-redundant set of minimal sufficient causes. That the canonical representation may have redundant minimal sufficient causes is demonstrated in Example 4.

Example 4. Consider a binary variable $D$ with three binary parents, $E_1$, $E_2$


Submitted to the Annals of Statistics.