GRAPHICAL REPRESENTATION OF CAUSAL EFFECTS (CHAPTER 6) BIOS Graphical Representation

Kristina George
5 years ago

1 GRAPHICAL REPRESENTATION OF CAUSAL EFFECTS (CHAPTER 6)

2 Graphical representation of causal effects (Chapter 6)
Outline
6.1 Causal diagrams
6.2 Causal diagrams and marginal independence
6.3 Causal diagrams and conditional independence
6.4 Graphs, counterfactuals, and interventions
6.5 A structural classification of bias
6.6 The structure of effect modification

3 6.1 Causal diagrams
Consider Figure 6.1
[Figure 6.1: nodes L, A, Y with arrows L → A, L → Y, A → Y]
Three nodes representing random variables (L, A, Y) and three directed edges (arrows)
Time flows from left to right, and thus L is temporally prior to A and Y, etc.
As in previous chapters, L, A, and Y represent disease severity, heart transplant, and death, respectively

4 6.1 Causal diagrams
V → W indicates that there is a direct causal effect of V on W (not mediated through any other variables on the graph) for at least one individual, or that we are unwilling to assume otherwise
Lack of an arrow means that we know, or assume, that V has no direct causal effect on W for any individual
[Figure 6.1: nodes L, A, Y with arrows L → A, L → Y, A → Y]
In Figure 6.1, the arrow L → A means that disease severity affects the probability of receiving a heart transplant (or we are not willing to assume otherwise)

5 6.1 Causal Diagrams
Directed Acyclic Graphs (DAGs)
Directed: L → A implies L causes A but not the other direction
Acyclic: there are no cycles; a variable cannot cause itself, either directly or through other variables
A causal DAG is a DAG in which the common causes of any pair of variables in the graph are also on the graph

6 Technical Point 6.1
A DAG $G$ is a graph whose nodes (vertices) are random variables $V = (V_1, \ldots, V_M)$ with directed edges (arrows) and no directed cycles
$PA_m$ = parents of $V_m$, i.e., the set of nodes from which there is a direct arrow into $V_m$
[Figure 6.1: nodes L, A, Y with arrows L → A, L → Y, A → Y]
Figure 6.1: Let $V_1 = L$, $V_2 = A$, $V_3 = Y$. Then $PA_3 = \{L, A\}$

7 Technical Point 6.1
Two vertices joined by an edge are said to be adjacent
A path between X and Y consists of a sequence of distinct vertices that are adjacent
[Path diagram over the vertices X, B, C, D]
$V_m$ is a descendant of $V_j$ (and $V_j$ is an ancestor of $V_m$) if there is a directed path $V_j \rightarrow \cdots \rightarrow V_m$

8 Technical Point 6.1
Notational convention: if $m > j$ then $V_m$ is not an ancestor of $V_j$
$G$ is a causal DAG if
1. Lack of an arrow from $V_j$ to $V_m$ can be interpreted as the absence of a direct causal effect of $V_j$ on $V_m$ (relative to the other variables on the graph)
2. All common causes, even if unmeasured, of any pair of variables on the graph are themselves on the graph
3. Any variable is a cause of its descendants

9 Technical Point 6.1
The causal Markov assumption, which links the DAG to the observed random variables, states:
Conditional on its direct causes, a variable $V_j$ is independent of any variable for which it is not a cause
That is, conditional on its parents, $V_j$ is independent of its non-descendants
Equivalently, the distribution $f(V)$ of the variables $V$ in DAG $G$ satisfies the Markov factorization
$f(v) = \prod_{j=1}^{M} f(v_j \mid pa_j)$
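As a worked illustration of the Markov factorization, the sketch below applies it to Figure 6.1 with the ordering $V_1 = L$, $V_2 = A$, $V_3 = Y$ used in Technical Point 6.1. The second line is a hypothetical variant (not one of the figures on these slides) in which the arrow A → Y is deleted, included only to show how a missing arrow turns into a conditional independence.

```latex
\begin{align*}
\text{Figure 6.1:}\quad
  & pa_1 = \emptyset,\quad pa_2 = \{l\},\quad pa_3 = \{l, a\} \\
  & f(l, a, y) = f(l)\, f(a \mid l)\, f(y \mid l, a) \\[4pt]
\text{Hypothetical variant without } A \to Y:\quad
  & pa_3 = \{l\} \\
  & f(l, a, y) = f(l)\, f(a \mid l)\, f(y \mid l)
    \;\Longrightarrow\; Y \perp\!\!\!\perp A \mid L
\end{align*}
```

For Figure 6.1 the factorization is just the chain rule of probability, so it places no restriction on the joint distribution; restrictions arise only from missing arrows.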

10 6.1 Causal Diagrams
[Figure 6.1: nodes L, A, Y with arrows L → A, L → Y, A → Y]
Could represent (i) a conditionally randomized experiment or (ii) an observational study where we assume conditional exchangeability $Y^a \perp\!\!\!\perp A \mid L$
Note how the DAG does not include $Y^a$. This disconnect will be ignored for now; formally incorporating potential outcomes into graphs via SWIGs (single world intervention graphs) will be discussed later

11 6.1 Causal Diagrams
[Figure 6.2: nodes A, Y with arrow A → Y]
Could represent (i) a marginally randomized experiment or (ii) an observational study where we are willing to assume (unconditional) exchangeability $Y^a \perp\!\!\!\perp A$
Below we discuss key results from graph theory that allow us to read off certain conditional independencies from a DAG
Heuristically, in Figure 6.2 we would expect A and Y to be associated

12 6.2 Causal Diagrams and Marginal Independence
Let L denote cigarette smoking, A carrying a lighter, and Y lung cancer
Assume A does not affect Y
[Figure 6.3: nodes L, A, Y with arrows L → A, L → Y]
Continuing with heuristic/intuitive arguments, we might expect for this DAG that A and Y are associated due to the common cause L
Eg, suppose L is Louise's hair color, and A and Y are the hair colors of her two children Alice and Yusuf

13 6.2 Causal Diagrams and Marginal Independence
A genetic haplotype, Y cigarette smoking, L heart disease
Assume A does not affect Y
[Figure 6.4: nodes A, Y, L with arrows A → L, Y → L]
Lack of an arrow from A to Y reflects the assumption that A does not affect Y
L is a common effect of A and Y
L is a collider on the path A → L ← Y because two arrowheads collide on this node
Note that the concept of a collider is path specific

14 6.2 Causal Diagrams and Marginal Independence
[Figure 6.4: nodes A, Y, L with arrows A → L, Y → L]
Intuitively, we might expect that A and Y are independent
Graph theory in fact confirms this intuition: colliders block the flow of association along the path on which they lie
Eg, suppose A and Y denote the hair colors of Abraham and Yolanda, and L denotes the hair color of their son Lionel

15 6.2 Causal Diagrams and Marginal Independence
Heuristic summary: Two variables are (marginally) associated if
1. one causes the other (Figure 6.2)
2. they share common causes (Figure 6.3)
Otherwise they will be (marginally) independent (Figure 6.4)
The next section explores the conditions under which two variables A and Y may be independent conditionally on a third variable L
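The heuristic summary can be checked with a small simulation. The sketch below generates binary data from the three structures in Figures 6.2, 6.3, and 6.4 and compares the risk of Y between A = 1 and A = 0. All probabilities, the sample size, and the helper `risk_diff` are assumptions chosen for illustration; they do not come from the slides.

```r
set.seed(1)
n <- 100000  # assumed sample size

# Risk difference Pr[Y = 1 | A = 1] - Pr[Y = 1 | A = 0] (illustrative helper)
risk_diff <- function(y, a) mean(y[a == 1]) - mean(y[a == 0])

# Figure 6.2: A -> Y (one variable causes the other)
a2 <- rbinom(n, 1, 0.5)
y2 <- rbinom(n, 1, 0.3 + 0.3 * a2)

# Figure 6.3: L -> A and L -> Y (common cause, no arrow A -> Y)
l3 <- rbinom(n, 1, 0.5)
a3 <- rbinom(n, 1, 0.2 + 0.6 * l3)
y3 <- rbinom(n, 1, 0.1 + 0.4 * l3)

# Figure 6.4: A -> L <- Y (common effect, no arrow A -> Y)
a4 <- rbinom(n, 1, 0.5)
y4 <- rbinom(n, 1, 0.5)
l4 <- rbinom(n, 1, 0.1 + 0.4 * a4 + 0.4 * y4)

rd <- c(fig6.2 = risk_diff(y2, a2),
        fig6.3 = risk_diff(y3, a3),
        fig6.4 = risk_diff(y4, a4))
print(round(rd, 3))
```

Under these assumed probabilities the population risk differences are 0.30 for Figure 6.2, 0.24 for Figure 6.3, and 0 for Figure 6.4, so the simulated values should be close to those numbers.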

16 6.3 Causal diagrams and conditional independence
Consider Figure 6.2, where we had A → Y and aspirin affects the risk of death
Suppose this is because aspirin reduces platelet aggregation B
[Figure 6.5: nodes A, B, Y with arrows A → B, B → Y and a box around B]
Is there an association between A and Y conditional on B?
The square box around B in Figure 6.5 indicates we are conditioning on B
Because no conditional independences are expected in complete causal diagrams (those in which all possible arrows are present), it is often said that information about associations is in the missing arrows

17 6.3 Causal diagrams and conditional independence
If we assume aspirin use affects heart disease risk only through platelet aggregation, learning an individual's treatment status does not contribute any additional information to predict his risk of heart disease
That is, $A \perp\!\!\!\perp Y \mid B$, even though A and Y are marginally associated
Indeed, graph theory states that a box placed around variable B blocks the flow of association through the path A → B → Y
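A minimal simulation sketch of Figure 6.5 follows, assuming binary variables and made-up probabilities (none of the numbers come from the slides). Y depends on A only through B, so the A-Y risk difference should be clearly non-zero overall and close to zero within each level of B.

```r
set.seed(2)
n <- 100000  # assumed sample size

a <- rbinom(n, 1, 0.5)            # aspirin use
b <- rbinom(n, 1, 0.7 - 0.4 * a)  # high platelet aggregation, made less likely by aspirin
y <- rbinom(n, 1, 0.1 + 0.3 * b)  # outcome depends on B only (no direct arrow A -> Y)

# Risk difference Pr[Y = 1 | A = 1] - Pr[Y = 1 | A = 0] (illustrative helper)
risk_diff <- function(y, a) mean(y[a == 1]) - mean(y[a == 0])

rd_marginal <- risk_diff(y, a)                  # association along the open path A -> B -> Y
rd_b0 <- risk_diff(y[b == 0], a[b == 0])        # within B = 0: path blocked
rd_b1 <- risk_diff(y[b == 1], a[b == 1])        # within B = 1: path blocked

print(round(c(marginal = rd_marginal, b0 = rd_b0, b1 = rd_b1), 3))
```

With these assumed probabilities the population marginal risk difference is -0.12, while both stratum-specific differences are 0.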

18 6.3 Causal diagrams and conditional independence
Now consider Figure 6.3, where A ← L → Y
Here carrying a lighter A is associated with lung cancer Y because of L, cigarette smoking
Suppose we restrict the study to non-smokers, L = 0
[Figure 6.6: nodes L, A, Y with arrows L → A, L → Y and a box around L]
Conditioning on L blocks the flow of association between A and Y

19 6.3 Causal diagrams and conditional independence
That is, $A \perp\!\!\!\perp Y \mid L$, even though A and Y are marginally associated
Eg, once we know you are a non-smoker, whether you are carrying a lighter tells us nothing about your risk of lung cancer
Blocking the flow of association between treatment and outcome through the common cause is the graph-based justification to use stratification as a method to achieve exchangeability
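The stratification argument can be sketched in a simulation of Figure 6.6. The probabilities and sample size below are assumptions for illustration only. Lighter carrying A has no effect on lung cancer Y, yet the two are associated overall; the association should vanish among non-smokers (L = 0) and among smokers (L = 1).

```r
set.seed(3)
n <- 100000  # assumed sample size

l <- rbinom(n, 1, 0.3)              # cigarette smoking
a <- rbinom(n, 1, 0.05 + 0.6 * l)   # carrying a lighter, more common among smokers
y <- rbinom(n, 1, 0.02 + 0.2 * l)   # lung cancer depends on smoking only

# Risk difference Pr[Y = 1 | A = 1] - Pr[Y = 1 | A = 0] (illustrative helper)
risk_diff <- function(y, a) mean(y[a == 1]) - mean(y[a == 0])

rd_marginal <- risk_diff(y, a)              # open path A <- L -> Y
rd_l0 <- risk_diff(y[l == 0], a[l == 0])    # non-smokers only
rd_l1 <- risk_diff(y[l == 1], a[l == 1])    # smokers only

print(round(c(marginal = rd_marginal, l0 = rd_l0, l1 = rd_l1), 3))
```

Restricting to L = 0 corresponds to the box around L in Figure 6.6; the same comparison within L = 1 is shown to emphasize that the independence holds in every stratum of L.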

20 6.3 Causal diagrams and conditional independence
Recall that in Figure 6.4 we had A → L ← Y, where A is haplotype, Y cigarette smoking, and L heart disease
Suppose we condition on L
[Figure 6.7: nodes A, Y, L with arrows A → L, Y → L and a box around L]
Consider the extreme scenario where A and Y are the only known causes of L
Suppose L = 1, i.e., the person has heart disease
Suppose also A = 0, i.e., they lack the haplotype that increases susceptibility to heart disease
Then it must be that the other cause of heart disease is present, i.e., Y = 1
I.e., $A \not\perp\!\!\!\perp Y \mid L$

21 6.3 Causal diagrams and conditional independence
Graph theory shows that indeed conditioning on a collider like L opens the path A → L ← Y, which was blocked when the collider was not conditioned on
Intuitively, whether two variables (the causes) are associated cannot be influenced by an event in the future (their effect), but two causes of a given effect generally become associated once we stratify on the common effect
Eg, given that Lionel has red hair, if Abraham does not have red hair it is more likely that Yolanda does have red hair
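A minimal simulation sketch of conditioning on the collider in Figure 6.7 follows. The binary coding, probabilities, and sample size are assumptions for illustration. A and Y are generated independently, so their marginal risk difference should be near zero, while within levels of the common effect L they should be inversely associated.

```r
set.seed(4)
n <- 100000  # assumed sample size

a <- rbinom(n, 1, 0.5)                        # haplotype
y <- rbinom(n, 1, 0.5)                        # cigarette smoking, independent of A
l <- rbinom(n, 1, 0.05 + 0.4 * a + 0.4 * y)   # heart disease: common effect of A and Y

# Risk difference Pr[Y = 1 | A = 1] - Pr[Y = 1 | A = 0] (illustrative helper)
risk_diff <- function(y, a) mean(y[a == 1]) - mean(y[a == 0])

rd_marginal <- risk_diff(y, a)              # path A -> L <- Y blocked by the collider
rd_l1 <- risk_diff(y[l == 1], a[l == 1])    # conditioning on L = 1 opens the path
rd_l0 <- risk_diff(y[l == 0], a[l == 0])    # conditioning on L = 0 opens it too

print(round(c(marginal = rd_marginal, l1 = rd_l1, l0 = rd_l0), 3))
```

Among people with heart disease (L = 1), lacking the haplotype makes smoking more likely, which is the negative risk difference the simulation should show.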

22 6.3 Causal diagrams and conditional independence
A similar example is given in Figure 6.8
[Figure 6.8: nodes A, Y, L, C with arrows A → L, Y → L, L → C and a box around C]
Suppose C is a medicine prescribed as a consequence of heart disease L
Then conditional on C, there will be an association between A and Y
Graph theory shows that conditioning on a variable C affected by a collider L also opens the path A → L ← Y, whereas this path is blocked (i.e., there is no association between A and Y) in the absence of conditioning on either the collider L or its descendant C
See Chapter 8 for more on associations due to conditioning on common effects
The mathematical theory underlying the graphical rules is known as d-separation (Pearl 1995). See Fine Point 6.3

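23 Example (R): Conditioning on descendant of a collider
> set.seed(1)
> n <- 10000  # sample size is missing from the source; 10000 is an assumed value
> a <- runif(n)
> y <- runif(n)  # a and y are generated independently
> l <- (a-y)^2  # collider: common effect of a and y, always in [0, 1]
> c <- rbinom(n,1,l)  # descendant of the collider, Pr[c = 1] = l
> cor.test(a,y)  # marginal association between a and y
p-value = [value missing from the source]
> cor.test(a[c==0],y[c==0])  # association within the stratum c = 0
p-value < 2.2e-16
> cor.test(a[c==1],y[c==1])  # association within the stratum c = 1
p-value < 2.2e-16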

24 6.3 Causal diagrams and conditional independence
These examples have demonstrated heuristically three structural reasons why two variables may be associated
1. One causes the other
2. They share common causes
3. They share a common effect and the analysis is restricted to a certain level of that common effect
The formal underpinnings of these heuristic arguments are based on the notion of d-separation

25 Fine Point 6.3: d-separation
Key result: Suppose we have a DAG G and a distribution over its nodes where each variable is independent of its non-descendants conditional on its parents (Markov factorization)
If two variables (e.g., A and Y) are d-separated given some other variable (e.g., L), then the two variables are conditionally independent given the third: $A \perp\!\!\!\perp Y \mid L$
The notion extends to sets of variables $\{A_1, A_2, \ldots\}$, $\{Y_1, Y_2, \ldots\}$ and $\{L_1, L_2, \ldots\}$
What is d-separation? First we need to define paths as open or blocked

26 Fine Point 6.3: d-separation
A path is open or blocked according to the following four rules
1. If there are no variables being conditioned on, a path is blocked iff it contains a collider. Eg, A ← L → Y is open whereas A → L ← Y is blocked
2. Any path which contains a non-collider that has been conditioned on is blocked, eg, as in Figure [number lost in extraction]
3. A collider that has been conditioned on does not block a path
4. A collider that has a descendant that has been conditioned on does not block a path

27 Fine Point 6.3: d-separation
In summary: A path is blocked iff (i) it contains a noncollider that has been conditioned on or (ii) it contains a collider which has not been conditioned on and has no descendants that have been conditioned on
Two variables are d-separated if all paths between them are blocked
Otherwise the two variables are d-connected
Consider Figure 6.1
[Figure 6.1: nodes L, A, Y with arrows L → A, L → Y, A → Y]
Are A and Y d-separated or d-connected conditional on L?

28 Fine Point 6.3: d-separation
Consider Figure 6.6
[Figure 6.6: nodes L, A, Y with arrows L → A, L → Y and a box around L]
Are A and Y d-separated or d-connected conditional on L?
So what can we conclude about the association between A and Y?
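The blocking rules can be turned into a small checker for questions like the two above. The following is a sketch in base R, not the course's own code: it enumerates every path between two nodes and applies the summary rule to each intermediate node. The edge-list representation and the function names (`dag`, `has_edge`, `descendants`, `simple_paths`, `path_blocked`, `d_separated`) are assumptions made for this illustration, and the brute-force path search is only meant for small diagrams like those on these slides.

```r
# A DAG is stored as a two-column character matrix of directed edges (from, to).
dag <- function(...) matrix(c(...), ncol = 2, byrow = TRUE,
                            dimnames = list(NULL, c("from", "to")))

has_edge <- function(g, from, to) any(g[, "from"] == from & g[, "to"] == to)

# All descendants of v: nodes reachable from v along a directed path.
descendants <- function(g, v) {
  found <- character(0)
  frontier <- v
  while (length(frontier) > 0) {
    children <- setdiff(g[g[, "from"] %in% frontier, "to"], found)
    found <- c(found, children)
    frontier <- children
  }
  found
}

# All paths from x to y through distinct adjacent nodes, ignoring arrow direction.
simple_paths <- function(g, x, y, path = x) {
  if (x == y) return(list(path))
  nbrs <- unique(c(g[g[, "from"] == x, "to"], g[g[, "to"] == x, "from"]))
  paths <- list()
  for (v in setdiff(nbrs, path)) {
    paths <- c(paths, simple_paths(g, v, y, c(path, v)))
  }
  paths
}

# TRUE if some intermediate node blocks the path given the conditioning set z.
path_blocked <- function(g, path, z) {
  if (length(path) < 3) return(FALSE)  # a single edge has no intermediate node
  for (i in 2:(length(path) - 1)) {
    v <- path[i]
    collider <- has_edge(g, path[i - 1], v) && has_edge(g, path[i + 1], v)
    if (collider) {
      # A collider blocks unless it or one of its descendants is conditioned on
      if (!any(c(v, descendants(g, v)) %in% z)) return(TRUE)
    } else if (v %in% z) {
      # A conditioned-on non-collider blocks
      return(TRUE)
    }
  }
  FALSE
}

# Two nodes are d-separated given z if every path between them is blocked.
d_separated <- function(g, x, y, z = character(0)) {
  paths <- simple_paths(g, x, y)
  all(vapply(paths, function(p) path_blocked(g, p, z), logical(1)))
}

fig6_1 <- dag("L", "A",  "L", "Y",  "A", "Y")
fig6_6 <- dag("L", "A",  "L", "Y")
fig6_8 <- dag("A", "L",  "Y", "L",  "L", "C")

d_separated(fig6_1, "A", "Y", "L")  # FALSE: the path A -> Y stays open
d_separated(fig6_6, "A", "Y", "L")  # TRUE: conditioning on L blocks A <- L -> Y
d_separated(fig6_8, "A", "Y")       # TRUE: the collider L blocks A -> L <- Y
d_separated(fig6_8, "A", "Y", "C")  # FALSE: conditioning on the descendant C opens it
```

So in Figure 6.1 A and Y are d-connected given L, whereas in Figure 6.6 they are d-separated given L and would therefore be conditionally independent under the Markov factorization.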

29 Fine Point 6.2: Faithfulness
Absence of an arrow from A to Y indicates that the sharp null hypothesis of no causal effect of A on any individual's Y holds
An arrow from A to Y indicates that A has a causal effect on the Y of at least one individual in the population
Suppose A has a positive effect on Y for half the population and a negative effect for the other half of the population, such that the overall effect of A on Y is zero
Then the DAG A → Y is correct in the sense that A has a causal effect on Y for at least one individual in the population, and yet the average causal effect is zero

30 Fine Point 6.2: Faithfulness. Faithfulness is the assumption that this does not occur, i.e., that if there is an arrow from A to Y (there is an effect of A on Y for at least one individual), then there is an average effect of A on Y. This assumption might be reasonable if we think perfect cancellations like the one on the previous slide are unlikely. There are instances, however, where faithfulness is violated by design.
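A toy calculation makes the cancellation concrete. This is a sketch under invented numbers: the four individuals and their potential outcomes are hypothetical and serve only to show how individual effects can exist while the average effect is exactly zero.

```python
# Hypothetical potential outcomes (Y under a=0, Y under a=1) for four individuals.
# Two individuals are harmed by treatment and two benefit (illustrative values only).
population = [(0, 1), (0, 1), (1, 0), (1, 0)]

# Individual causal effect: Y^{a=1} - Y^{a=0}.
individual_effects = [y1 - y0 for y0, y1 in population]

# Average causal effect: E[Y^{a=1}] - E[Y^{a=0}].
average_effect = sum(individual_effects) / len(population)

# The arrow A -> Y is warranted if at least one individual effect is nonzero.
anyone_affected = any(effect != 0 for effect in individual_effects)
```

Here every individual is affected by treatment, yet the average effect is zero, so the arrow from A to Y belongs on the DAG even though faithfulness fails.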

31 Fine Point 6.2: Faithfulness. Consider the example described in Section 4.5, where we match on L in order to estimate the causal effect of A on Y. Let S = 1 indicate selection into the matched cohort, S = 0 otherwise. [Figure 6.9: causal diagram with nodes L, A, Y, and S, conditioning on S] According to d-separation rules, there are two open paths between A and L when conditioning on S, namely L → A and L → S ← A. Thus one would expect L and A to be associated conditional on S. However, matching ensures L and A are not associated. Why? Unfaithfulness, i.e., perfect cancellation of associations. The mathematical theory underlying the graphical rules is known as d-separation (Pearl 1995); see Fine Point 6.3. Pearl (2009) reviews quantitative methods for causal inference that are derived from graph theory.

32 6.4 Graphs, counterfactuals, and interventions. Causal diagrams encode qualitative expert knowledge, or assumptions, about the causal structure of a problem and hence about the causal determinants of biases. Though causal diagrams are a useful tool for thinking conceptually about a causal inference problem, quantitative approaches are needed to draw inferences about causal effects. Variables sufficient for d-separation between exposure and outcome can be used in conjunction with standardization and inverse probability weighting for such inference.
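As a minimal sketch of such an inference, the code below applies standardization and inverse probability (IP) weighting to a made-up table of counts for binary L, A, and Y. It assumes that L alone suffices for d-separation between A and Y; the counts and the helper names (`counts`, `total`, `standardized_risk`, `ip_weighted_risk`, `crude_risk`) are hypothetical.

```python
from fractions import Fraction

# Hypothetical counts keyed by (L, A, Y); invented for illustration only.
counts = {
    (0, 0, 0): 24, (0, 0, 1): 6,
    (0, 1, 0): 8,  (0, 1, 1): 2,
    (1, 0, 0): 4,  (1, 0, 1): 6,
    (1, 1, 0): 12, (1, 1, 1): 18,
}
n = sum(counts.values())

def total(l=None, a=None, y=None):
    """Number of individuals matching the given values (None matches any)."""
    return sum(c for (li, ai, yi), c in counts.items()
               if l in (None, li) and a in (None, ai) and y in (None, yi))

def crude_risk(a):
    """Associational risk Pr[Y=1 | A=a]."""
    return Fraction(total(a=a, y=1), total(a=a))

def standardized_risk(a):
    """Sum over l of Pr[Y=1 | A=a, L=l] * Pr[L=l]."""
    return sum(Fraction(total(l=l, a=a, y=1), total(l=l, a=a))
               * Fraction(total(l=l), n)
               for l in (0, 1))

def ip_weighted_risk(a):
    """Mean of Y * I(A=a) / Pr[A=a | L] over the whole population."""
    risk = Fraction(0)
    for l in (0, 1):
        pr_a_given_l = Fraction(total(l=l, a=a), total(l=l))
        risk += Fraction(total(l=l, a=a, y=1)) / pr_a_given_l
    return risk / n
```

In these invented data the two methods agree: both give a risk of 2/5 under either treatment value, so the adjusted risk difference is zero, while the crude risks are 1/2 among the treated and 3/10 among the untreated.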

33 6.5 A structural classification of bias. Structural bias: bias due to structural reasons. There is structural bias whenever the causal effect measure and the corresponding association measure are not equal, e.g., $E[Y^{a=1}] - E[Y^{a=0}] \neq E[Y \mid A=1] - E[Y \mid A=0]$. The above is unconditional bias; conditional bias (given L) can be defined analogously. Under faithfulness, the presence of conditional bias implies unconditional bias. Without the faithfulness assumption, positive bias in one stratum of L could cancel the negative bias in another stratum.
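Written out side by side, the two notions of bias look as follows. The conditional version is an assumed reading of "analogously": every term is conditioned on a stratum L = l, and bias is present if the inequality holds for at least one l.

```latex
% Unconditional structural bias (as stated on the slide):
E[Y^{a=1}] - E[Y^{a=0}] \neq E[Y \mid A=1] - E[Y \mid A=0]

% Conditional structural bias within stratum L = l (assumed analogue):
E[Y^{a=1} \mid L=l] - E[Y^{a=0} \mid L=l] \neq E[Y \mid A=1, L=l] - E[Y \mid A=0, L=l]
```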

34 6.5 A structural classification of bias. Three forms of structural bias: 1. Confounding: treatment and outcome share a common cause (Chapter 7). 2. Selection bias: conditioning on common effects (Chapter 8). 3. Measurement error: bias induced by mis-measuring A, L, and/or Y (Chapter 9). The next three chapters will use DAGs to examine these structural biases.

35 6.6 The structure of effect modification. Causal DAGs are helpful in determining potential sources of structural bias. DAGs are less helpful in depicting effect modification. E.g., consider Figure 6.10, where M is quality of care, A is heart transplant, and Y is survival. [Figure 6.10: causal diagram with nodes M, A, and Y] This DAG is correct regardless of whether M is an effect modifier. Note that if we were constructing a DAG to assess the effect of A on Y, we would not need to include M on the DAG because M is not a common cause of A and Y.
