CAUSAL INFERENCE: LOGICAL FOUNDATION AND NEW RESULTS
Judea Pearl, University of California Los Angeles (www.cs.ucla.edu/~judea/)

OUTLINE
1. Statistical vs. causal vs. counterfactual inference: syntax and semantics
2. The logical equivalence of Structural Equation Models (SEM) and potential outcomes (Neyman-Rubin)
3. Where do they differ? 1. Define, 2. Assume, 3. Identify, 4. Estimate, 5. Test
4. Which covariates should be measured? Causation without manipulation
5. The Mediation Formula and its ramifications
6. Measurement bias and effect restoration

3-LEVEL HIERARCHY OF CAUSAL MODELS
1. Probabilistic knowledge, P(y | x): Bayesian networks, graphical models.
2. Interventional knowledge, P(y | do(x)): Causal Bayesian Networks (CBN) (agnostic graphs, manipulation graphs).
3. Counterfactual knowledge, P(Y_x = y | x', y'): structural equation models, physics, functional graphs, treatment assignment mechanisms.

TRADITIONAL STATISTICAL INFERENCE PARADIGM
Data are drawn from a joint distribution P; inference yields Q(P), aspects of P.
e.g., Infer whether customers who bought product A would also buy product B: Q = P(B | A).

FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES
Probability and statistics deal with static relations: from P, inference yields Q(P), aspects of P. But what happens when P changes?
e.g., Infer whether customers who bought product A would still buy A if we were to double the price.

The question is what remains invariant when P changes, say so as to satisfy P(price = 2) = 1.
Note: P(v) is not P(v | price = 2); P itself does not tell us how it ought to change.
Causal knowledge is knowledge of what remains invariant under change.
FROM STATISTICAL TO CAUSAL ANALYSIS: 2. MENTAL BARRIERS
1. Causal and statistical concepts do not mix.

   CAUSAL                           STATISTICAL
   Spurious correlation             Regression
   Randomization / Intervention     Association / Independence
   Confounding / Effect             Controlling for / Conditioning
   Instrumental variable            Odds and risk ratios
   Exogeneity / Ignorability        Collapsibility / Granger causality
   Mediation                        Propensity score

2. No causes in, no causes out (Cartwright, 1989): statistical assumptions + data alone do not suffice; causal conclusions require causal assumptions in addition.
3. Causal assumptions cannot be expressed in the mathematical language of standard statistics.
4. Non-standard mathematics is therefore needed:
a) Structural equation models (Wright, 1920; Simon, 1960)
b) Counterfactuals (Neyman-Rubin: Y_x; Lewis: x []-> y)

BAYESIAN NETWORKS
Definition: P is Markov relative to G iff each variable is independent of its non-descendants given its parents: (V_i _||_ ND_i | PA_i) for every V_i in G.
Algebraic representation: P(v_1, ..., v_n) = Prod_i P(v_i | pa_i).
Graphical representation: d-separation in G implies conditional independence in P.

CAUSAL BAYESIAN NETWORK (CBN) (agnostic graphs, manipulation graphs)
Definition: Let P_x(v) stand for P(v | do(X = x)), X arbitrary. G is a CBN relative to {P_x} iff:
i. P_x(v) is Markov relative to G;
ii. P_x(v_i) = 1 for all V_i in X whenever v_i agrees with X = x;
iii. P_x(v_i | pa_i) = P(v_i | pa_i) for all V_i not in X whenever pa_i agrees with X = x, i.e., each P(v_i | pa_i) remains INVARIANT to interventions not involving V_i.
Algebraic representation: P_x(v) = Prod_{i: V_i not in X} P(v_i | pa_i) for all v consistent with x (truncated product, G-computation, manipulation theorem).
Graphical representation: surgery, i.e., remove the arrows entering X.

THE STRUCTURAL MODEL PARADIGM
A generating model M; inference yields Q(M), aspects of M. M is the invariant strategy (mechanism, recipe, law, protocol) by which Nature assigns values to the variables in the analysis. Think Nature, not experiment!
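The truncated-product formula above can be sketched in code. This is a minimal illustration on an invented three-variable binary network (Z -> X, Z -> Y, X -> Y, with made-up probability tables): deleting the factor P(x | z) yields P(y | do(x)), which differs from the observational P(y | x).

```python
# Sketch of the truncated-product formula
#   P_x(v) = prod_{i: V_i not in X} P(v_i | pa_i), for v consistent with x,
# on an invented binary network Z -> X, Z -> Y, X -> Y. All numbers are
# hypothetical conditional probability tables, P(node = 1 | parents).
p_z1 = 0.4
p_x1_given_z = {0: 0.2, 1: 0.7}
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}

def bern(p, v):
    """P(value = v) for a Bernoulli with success probability p."""
    return p if v == 1 else 1.0 - p

def p_y_do_x(x, y):
    """P(Y = y | do(X = x)): drop the factor P(x | z), keep the rest."""
    return sum(bern(p_z1, z) * bern(p_y1_given_xz[(x, z)], y)
               for z in (0, 1))

def p_y_given_x(x, y):
    """Observational P(Y = y | X = x), for contrast with the do-expression."""
    num = sum(bern(p_z1, z) * bern(p_x1_given_z[z], x)
              * bern(p_y1_given_xz[(x, z)], y) for z in (0, 1))
    den = sum(bern(p_z1, z) * bern(p_x1_given_z[z], x) for z in (0, 1))
    return num / den
```

With these numbers P(Y=1 | do(X=1)) = 0.72 while P(Y=1 | X=1) = 0.81: the confounder Z opens a back-door path that the surgery removes.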
STRUCTURAL CAUSAL MODELS
Definition: A structural causal model is a 4-tuple <V, U, F, P(u)>, where
V = {V_1, ..., V_n} are endogenous variables,
U = {U_1, ..., U_m} are background (exogenous) variables,
F = {f_1, ..., f_n} are functions determining V: v_i = f_i(v, u), e.g., y = alpha + beta*x + u_Y,
P(u) is a distribution over U.
P(u) and F together induce a distribution P(v) over the observable variables.

CAUSAL MODELS AND COUNTERFACTUALS
Definition: The sentence "Y would be y (in situation u), had X been x", denoted Y_x(u) = y, means: the solution for Y in the mutilated model M_x (i.e., the equations for X replaced by X = x), with input U = u, is equal to y.
The Fundamental Equation of Counterfactuals: Y_x(u) = Y_{M_x}(u).

Probabilities of counterfactuals:
P(Y_x = y, Z_w = z) = Sum_{u: Y_x(u) = y, Z_w(u) = z} P(u).
In particular, P(y | do(x)) = P(Y_x = y) = Sum_{u: Y_x(u) = y} P(u),
and the probability of necessity PN = P(Y_x' = y' | x, y) = Sum_{u: Y_x'(u) = y'} P(u | x, y).

TWO PARADIGMS FOR CAUSAL INFERENCE
Observed: P(X, Y, Z, ...). Conclusions needed: P(Y_x = y), P(X_y = x'), ... How do we connect the observables X, Y, Z, ... to the counterfactuals Y_x, X_z, Z_y, ...?
N-R model: counterfactuals are primitives, new variables; a super-distribution P*(X, Y, ..., Y_x, X_z, ...) in which X, Y, ... constrain Y_x, X_z, ....
Structural model: counterfactuals are derived quantities; subscripts modify the model and the distribution: P(Y_x = y) = P_{M_x}(Y = y).

ARE THE TWO PARADIGMS EQUIVALENT?
Yes (Galles and Pearl, 1998; Halpern, 1998). In the N-R paradigm, Y_x is defined by the consistency rule Y = x*Y_1 + (1 - x)*Y_0; in SCM, consistency is a theorem. Moreover, a theorem in one approach is a theorem in the other.
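The Fundamental Equation of Counterfactuals is directly computable: Y_x(u) is the solution of the mutilated model M_x at input u. A sketch on an invented linear model (x = u1, z = 2x + u2, y = z + 3x); the equations and numbers are illustrative only.

```python
# Sketch: computing Y_x(u) by surgery on an invented SCM
#   x = u1,  z = 2*x + u2,  y = z + 3*x.
# The mutilated model M_x replaces the equation for X with the constant x.

def solve(u1, u2, x_do=None):
    """Solve the model (or its mutilation M_x when x_do is given) at U = u."""
    x = u1 if x_do is None else x_do   # surgery on the X equation
    z = 2 * x + u2
    y = z + 3 * x
    return {"x": x, "z": z, "y": y}

factual = solve(1, 0)                  # factual world of unit u = (1, 0)
counterfactual = solve(1, 0, x_do=0)   # Y_{x=0}(u): same u, mutilated model
# Consistency, a theorem in SCM: setting X to its factual value
# reproduces the factual outcome.
assert solve(1, 0, x_do=factual["x"])["y"] == factual["y"]
```

Here the factual outcome is y = 5 while the counterfactual Y_{x=0}(u) = 0; both are read off the same unit u, which is what makes counterfactual probabilities well defined once P(u) is given.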
THE FIVE NECESSARY STEPS OF CAUSAL ANALYSIS
Define: Express the target quantity Q as a function Q(M) that can be computed from any model M, e.g., ATE = E(Y | do(x_1)) - E(Y | do(x_0)).
Assume: Formulate the causal assumptions A in some formal language.
Identify: Determine whether Q is identifiable given A.
Estimate: Estimate Q if it is identifiable; approximate it if it is not.
Test: Test the testable implications of A (if any).

FORMULATING ASSUMPTIONS: THREE LANGUAGES
1. English: Smoking (X), Cancer (Y), Tar (Z), Genotypes (U).
2. Counterfactuals, e.g.: Z_x(u) = Z_{yx}(u), X_y(u) = X_{zy}(u) = X_z(u) = X(u), Y_z(u) = Y_{zx}(u), Z_x _||_ {Y_z, X}. Not too friendly: consistent? complete? redundant? arguable?
3. Structural: a graph, e.g., X -> Z -> Y with a latent U affecting both X and Y.

IDENTIFYING CAUSAL EFFECTS IN THE POTENTIAL-OUTCOME FRAMEWORK
Define: Express the target quantity Q as a counterfactual formula, e.g., E(Y(1) - Y(0)).
Assume: Formulate the causal assumptions using the super-distribution P* = P(X, Y, Z, Y(1), Y(0)).
Identify: Determine whether Q is identifiable using P* and the consistency rule Y = x*Y(1) + (1 - x)*Y(0).
Estimate: Estimate Q if it is identifiable; approximate it if it is not.

GRAPHICAL-COUNTERFACTUAL SYMBIOSIS
Every causal graph expresses counterfactual assumptions, e.g.:
1. Missing arrows (no arrow Y -> Z): Z_x(u) = Z_{yx}(u).
2. Missing arcs (no hidden common cause of Y and Z): Y_x _||_ Z_y.
These assumptions are consistent and readable from the graph: express the assumptions in graphs, then derive estimands by graphical or algebraic methods.

IDENTIFICATION IN SCM
Find the effect of X on Y, P(y | do(x)), given the causal assumptions shown in a graph G in which Z_1, ..., Z_k are auxiliary variables. Can P(y | do(x)) be estimated if only a subset of the Z_i can be measured?
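The Define and Estimate steps can be made concrete for the ATE. Under randomization the do-expressions reduce to conditional expectations, so a difference of group means estimates ATE; the data below are invented for illustration.

```python
# Sketch of the Estimate step for ATE = E(Y | do(x_1)) - E(Y | do(x_0))
# in a randomized experiment, where do-expressions reduce to conditional
# expectations and a difference of sample means is consistent. Toy data.

def ate_from_randomized(data):
    """data: list of (treatment x in {0, 1}, outcome y) pairs."""
    y1 = [y for x, y in data if x == 1]
    y0 = [y for x, y in data if x == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

trial = [(1, 3.0), (1, 5.0), (0, 2.0), (0, 2.0), (1, 4.0), (0, 5.0)]
# mean(treated) = 4.0, mean(control) = 3.0, so the estimate is 1.0
```

Without randomization, the same difference of means estimates P(y | x)-based quantities, not the do-expression; closing that gap is precisely what the Assume and Identify steps are for.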
ELIMINATING CONFOUNDING BIAS: THE BACK-DOOR CRITERION
P(y | do(x)) is estimable if there is a set Z of variables that d-separates X from Y in the subgraph G_X obtained by deleting from G all arrows emanating from X.
Moreover, P(y | do(x)) = Sum_z P(y | x, z) P(z) ("adjusting" for Z), which is equivalent to ignorability, Y_x _||_ X | Z.

EFFECT OF WARM-UP ON INJURY (after Shrier & Platt, 2008)
[Figure: a causal diagram relating Warm-up Exercises (X) to Injury (Y). Watch out: the tempting "front-door" adjustment is wrong here. No, no!]

CONFOUNDING EQUIVALENCE: WHEN TWO SETS ARE EQUALLY VALUABLE
[Figure: an example graph with covariates W_1, ..., W_4 and V_1, V_2, used to compare candidate adjustment sets Z and T.]
Definition: Z and T are c-equivalent if Sum_z P(y | x, z) P(z) = Sum_t P(y | x, t) P(t) for all x, y.
Definition (Markov boundary): the Markov boundary S_m of S (relative to X) is the minimal subset of S that d-separates X from all other members of S.
Theorem (Pearl and Paz, 2009): Z and T are c-equivalent iff
1. Z_m = T_m, or
2. Z and T are both admissible (i.e., both satisfy the back-door condition).
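The back-door adjustment formula above is a plug-in estimator once a back-door admissible set Z is assumed. A minimal sketch on invented binary records; whether Z is actually admissible is a causal assumption, not something these data can establish.

```python
# Sketch of back-door adjustment from finite data:
#   P(Y = 1 | do(X = x)) = sum_z P(Y = 1 | x, z) P(z),
# with Z an adjustment set ASSUMED to satisfy the back-door criterion.

def adjusted_effect(records, x):
    """records: list of (x, z, y) triples, all binary.
    Returns the plug-in estimate of P(Y = 1 | do(X = x))."""
    n = len(records)
    total = 0.0
    for z in (0, 1):
        nz = sum(1 for xi, zi, yi in records if zi == z)
        nxz = sum(1 for xi, zi, yi in records if xi == x and zi == z)
        nxzy = sum(1 for xi, zi, yi in records
                   if xi == x and zi == z and yi == 1)
        if nxz:  # skip empty strata (a real estimator needs more care here)
            total += (nxzy / nxz) * (nz / n)
    return total

sample = [(1, 0, 1), (1, 0, 0), (0, 0, 0), (1, 1, 1), (0, 1, 1), (0, 1, 0)]
# For this toy sample: 0.5 * 0.5 + 1.0 * 0.5 = 0.75
```

The same code stratified on an inadmissible set (e.g., a collider, as in the warm-up/injury example) would return a biased answer; the formula is only as good as the graph behind it.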
CONFOUNDING EQUIVALENCE (CONT.)
[Figure: the same example graph, comparing candidate adjustment sets built from W_1, ..., W_4 and V_1, V_2, e.g., {W_1, W_2}.]

BIAS AMPLIFICATION BY INSTRUMENTAL VARIABLES
[Figure: an instrument W affecting X, with an unobserved confounder U between X and Y.]
Adding an instrument to the propensity score increases bias, if such bias exists (Wooldridge, 2009): always in linear systems, almost always in non-linear systems (Pearl, 2010). Outcome predictors are safer than treatment predictors.

EFFECT DECOMPOSITION (DIRECT VS. INDIRECT EFFECTS)
1. Why decompose effects?
2. What is the definition of direct and indirect effects?
3. What are the policy implications of direct and indirect effects?
4. When can direct and indirect effects be estimated consistently from experimental and nonexperimental data?

WHY DECOMPOSE EFFECTS?
1. To understand how Nature works.
2. To comply with legal requirements.
3. To predict the effects of new types of interventions: signal routing, rather than variable fixing.

LEGAL IMPLICATIONS OF DIRECT EFFECTS
Can data prove an employer guilty of hiring discrimination? X (Gender), Z (Qualifications), Y (Hiring). What is the direct effect of X on Y?
E(Y | do(x_1), do(z)) - E(Y | do(x_0), do(z)), averaged over z.
Adjust for Z? No! No!

NATURAL INTERPRETATION OF AVERAGE DIRECT EFFECTS (Robins and Greenland, 1992)
Model: z = f(x, u), y = g(x, z, u).
Pure (natural) direct effect of X on Y, DE(x_0, x_1; Y): the expected change in Y when we change X from x_0 to x_1 and, for each u, keep Z constant at whatever value it attained before the change:
DE(x_0, x_1; Y) = E[Y_{x_1, Z_{x_0}}] - E[Y_{x_0}].
In linear models, DE = Controlled Direct Effect = beta * (x_1 - x_0).
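In the linear case the natural-effect definitions can be checked by hand. A sketch under an invented linear model z = a*x + u1, y = b*x + c*z + u2 (zero-mean disturbances), for which DE = b, IE = a*c, and TE = b + a*c:

```python
# Invented linear SCM: z = a*x + u1, y = b*x + c*z + u2.
# For this model DE = b, IE = a*c, and TE = DE + IE = b + a*c.
a, b, c = 2.0, 1.5, 0.5   # hypothetical path coefficients

def y_nested(x_direct, x_mediator, u1=0.0, u2=0.0):
    """E[Y_{x, Z_{x'}}]: hold X at x_direct on the direct path, but let Z
    take the value it would have attained under X = x_mediator."""
    z = a * x_mediator + u1
    return b * x_direct + c * z + u2

x0, x1 = 0.0, 1.0
te = y_nested(x1, x1) - y_nested(x0, x0)   # b + a*c = 2.5
de = y_nested(x1, x0) - y_nested(x0, x0)   # b = 1.5
ie = y_nested(x0, x1) - y_nested(x0, x0)   # a*c = 1.0
assert abs(te - (de + ie)) < 1e-12          # holds in linear models
```

The equality TE = DE + IE is special to the linear case; the nonparametric relations are given on the next slides.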
DEFINITION OF INDIRECT EFFECTS
Model: z = f(x, u), y = g(x, z, u).
Indirect effect of X on Y, IE(x_0, x_1; Y): the expected change in Y when we keep X constant, say at x_0, and let Z change to whatever value it would have attained had X changed to x_1:
IE(x_0, x_1; Y) = E[Y_{x_0, Z_{x_1}}] - E[Y_{x_0}].
In linear models, IE = TE - DE.

POLICY IMPLICATIONS OF INDIRECT EFFECTS
What is the indirect effect of X on Y? The effect of Gender on Hiring if sex discrimination is eliminated: the direct GENDER -> HIRING link is ignored, while GENDER -> QUALIFICATION -> HIRING remains active. Deactivating a link is a new type of intervention.

MEDIATION FORMULAS
1. The natural direct and indirect effects are identifiable in Markovian models (no confounding),
2. and are given by:
   DE = Sum_z [E(Y | do(x_1, z)) - E(Y | do(x_0, z))] P(z | do(x_0)),
   IE = Sum_z E(Y | do(x_0, z)) [P(z | do(x_1)) - P(z | do(x_0))].
3. Applicable to linear and non-linear models, continuous and discrete variables, regardless of distributional form.

MEDIATION FORMULAS IN UNCONFOUNDED MODELS
DE = Sum_z [E(Y | x_1, z) - E(Y | x_0, z)] P(z | x_0)
IE = Sum_z E(Y | x_0, z) [P(z | x_1) - P(z | x_0)]
TE = E(Y | x_1) - E(Y | x_0)
TE - DE = fraction of responses owed to mediation; IE = fraction of responses explained by mediation.

COMPUTING THE MEDIATION FORMULA
Cell counts for binary X, Z, Y:

        X  Z  Y
  n_1:  0  0  0
  n_2:  0  0  1
  n_3:  0  1  0
  n_4:  0  1  1
  n_5:  1  0  0
  n_6:  1  0  1
  n_7:  1  1  0
  n_8:  1  1  1

E(Y | x, z) = g_xz:
  g_00 = n_2 / (n_1 + n_2),  g_01 = n_4 / (n_3 + n_4),
  g_10 = n_6 / (n_5 + n_6),  g_11 = n_8 / (n_7 + n_8).
E(Z | x) = h_x:
  h_0 = (n_3 + n_4) / (n_1 + n_2 + n_3 + n_4),
  h_1 = (n_7 + n_8) / (n_5 + n_6 + n_7 + n_8).
DE = (g_10 - g_00)(1 - h_0) + (g_11 - g_01) h_0
IE = (h_1 - h_0)(g_01 - g_00)

RAMIFICATIONS OF THE MEDIATION FORMULA
DE should be averaged over mediator levels; IE should NOT be averaged over exposure levels.
TE - DE need not equal IE:
  TE - DE = proportion for whom mediation is necessary;
  IE = proportion for whom mediation is sufficient.
TE - DE informs interventions on indirect pathways; IE informs interventions on direct pathways.
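The "Computing the Mediation Formula" recipe above translates directly into code. A sketch for binary X, Z, Y using invented cell counts n_1, ..., n_8 (keyed as (x, z, y) in the table's order):

```python
# Sketch: DE and IE from cell counts of binary (X, Z, Y), following
#   g_xz = E(Y | x, z),  h_x = E(Z | x),
#   DE = (g_10 - g_00)(1 - h_0) + (g_11 - g_01) h_0,
#   IE = (h_1 - h_0)(g_01 - g_00).

def mediation(n):
    """n[(x, z, y)]: cell counts. Returns (DE, IE)."""
    g = {(x, z): n[(x, z, 1)] / (n[(x, z, 0)] + n[(x, z, 1)])
         for x in (0, 1) for z in (0, 1)}          # g[(x, z)] = E(Y | x, z)
    h = {}
    for x in (0, 1):                               # h[x] = P(Z = 1 | x)
        nx = sum(n[(x, z, y)] for z in (0, 1) for y in (0, 1))
        h[x] = (n[(x, 1, 0)] + n[(x, 1, 1)]) / nx
    de = (g[(1, 0)] - g[(0, 0)]) * (1 - h[0]) + (g[(1, 1)] - g[(0, 1)]) * h[0]
    ie = (h[1] - h[0]) * (g[(0, 1)] - g[(0, 0)])
    return de, ie

# Invented counts, n_1..n_8 in the table's (X, Z, Y) order:
counts = {(0, 0, 0): 40, (0, 0, 1): 10, (0, 1, 0): 20, (0, 1, 1): 30,
          (1, 0, 0): 10, (1, 0, 1): 30, (1, 1, 0): 10, (1, 1, 1): 50}
de, ie = mediation(counts)
```

For these counts g_00 = 0.2, g_01 = 0.6, g_10 = 0.75, g_11 = 5/6, h_0 = 0.5, h_1 = 0.6, giving IE = 0.04 and DE of roughly 0.392; note that the formula never requires TE = DE + IE.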
MEASUREMENT BIAS AND EFFECT RESTORATION
Z is an unobserved confounder of X and Y, observed only through a proxy W. P(y | do(x)) is identifiable from measurements of X, Y, W if the error mechanism P(w | z) is given (Selen, 1986; Greenland & Lash, 2008). The key assumption is local independence, W _||_ {X, Y} | Z, so that
P(x, y, w) = Sum_z P(x, y | z) P(w | z) P(z);
inverting this relation restores P(x, y, z), after which P(y | do(x)) follows by adjusting for the restored Z.

EFFECT RESTORATION IN BINARY MODELS
For binary Z and W with misclassification rates epsilon = P(W = 0 | Z = 1) and delta = P(W = 1 | Z = 0), the relation between the observed cells and the restored cells can be inverted:
P(x, y, Z = 1) = [P(x, y, W = 1) - delta * P(x, y)] / (1 - epsilon - delta),
and P(y | do(x)) is obtained by adjusting for the restored Z, i.e., by reweighting the distribution in the W = 1 and W = 0 cells accordingly.

EFFECT RESTORATION IN LINEAR MODELS
[Figure: path diagrams (a) X -> Y confounded by a latent Z with proxy W, coefficients c_0, c_1, c_2, c_3; (b) the same with an additional variable V and coefficient c_4.]
The slide's covariance formulas express the structural coefficient c_0 (the effect of X on Y) in terms of observed covariances among X, Y, W (and V in panel (b)), with the measurement-error parameter k assumed known; in panel (a) the estimand has the form
c_0 = (sigma_XY - sigma_XW * sigma_YW / k) / (sigma_XX - sigma_XW^2 / k).
Extensions to correlated proxies: Cai & Kuroki (2008).

CONCLUSIONS: I TOLD YOU CAUSALITY IS SIMPLE
Formal basis for causal and counterfactual inference (complete).
Unification of the graphical, potential-outcome, and structural equation approaches.
Friendly and formal solutions to century-old problems and confusions.

"He is wise who bases causal inference on an explicit causal structure that is defensible on scientific grounds." (Aristotle, 384-322 B.C.; from Charlie Poole)

QUESTIONS???
They will be answered.
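The binary restoration step can be sketched end to end. This assumes the inversion formula stated above (derived from P(x, y, W=1) = (1 - eps) P(x, y, Z=1) + delta P(x, y, Z=0)); the input distribution below is invented, and with eps = delta = 0 the proxy is perfect, so the result reduces to plain adjustment.

```python
# Sketch of binary effect restoration: recover P(x, y, z) from a proxy W
# of the unobserved confounder Z, given the misclassification rates
#   eps = P(W = 0 | Z = 1),  delta = P(W = 1 | Z = 0),
# then back-door adjust on the restored Z. Input numbers are invented.

def restore_and_adjust(p_xyw, eps, delta, x):
    """p_xyw[(x, y, w)]: observed joint distribution (binary variables).
    Returns the restored P(Y = 1 | do(X = x))."""
    p_xyz = {}
    for xi in (0, 1):
        for yi in (0, 1):
            p_xy = p_xyw[(xi, yi, 0)] + p_xyw[(xi, yi, 1)]
            # Invert P(x,y,W=1) = (1-eps) P(x,y,Z=1) + delta P(x,y,Z=0):
            pz1 = (p_xyw[(xi, yi, 1)] - delta * p_xy) / (1 - eps - delta)
            p_xyz[(xi, yi, 1)] = pz1
            p_xyz[(xi, yi, 0)] = p_xy - pz1
    total = 0.0
    for z in (0, 1):   # back-door adjustment on the restored Z
        pz = sum(p_xyz[(xi, yi, z)] for xi in (0, 1) for yi in (0, 1))
        pxz = p_xyz[(x, 0, z)] + p_xyz[(x, 1, z)]
        total += (p_xyz[(x, 1, z)] / pxz) * pz
    return total

# Invented observed joint P(x, y, w), consistent with a confounded model:
obs = {(0, 0, 0): 0.432, (0, 1, 0): 0.048, (1, 0, 0): 0.048, (1, 1, 0): 0.072,
       (0, 0, 1): 0.060, (0, 1, 1): 0.060, (1, 0, 1): 0.028, (1, 1, 1): 0.252}
```

With noisy proxies (eps, delta > 0) the inversion can yield negative "probabilities" if the inputs are inconsistent with the assumed error rates, which is itself a useful diagnostic.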