THE MATHEMATICS OF CAUSAL INFERENCE IN STATISTICS
Judea Pearl
University of California Los Angeles (www.cs.ucla.edu/~judea)

OUTLINE
Modeling: Statistical vs. Causal
Causal Models and Identifiability
Inference to three types of claims:
1. Effects of potential interventions
2. Claims about attribution (responsibility)
3. Claims about direct and indirect effects

TRADITIONAL STATISTICAL INFERENCE PARADIGM
P → Q(P) (Aspects of P)
e.g., Infer whether customers who bought product A would also buy product B: $Q = P(B \mid A)$.

FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES
Probability and statistics deal with static relations.
P → change → P′ → Q(P′) (Aspects of P′)
What happens when P changes?
e.g., Infer whether customers who bought product A would still buy A if we were to double the price.
What remains invariant when P changes, say, to satisfy $P'(\text{price} = 2) = 1$?
Note: $P'(v) \neq P(v \mid \text{price} = 2)$.
P does not tell us how it ought to change.
e.g., Curing symptoms vs. curing diseases.
e.g., Analogy: mechanical deformation.
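The note that P′(v) differs from P(v | price = 2) can be illustrated with a small sketch. The toy market below is purely hypothetical (the variable `U`, the pricing rule and the purchase rule are assumptions made for illustration, not part of the slides); it only shows that conditioning on an observed price and forcing the price give different answers.

```python
from fractions import Fraction

# Hypothetical toy market (an assumption for illustration, not from the slides):
# U = 1 marks a high-demand customer; the seller charges 2 only to such customers.
P_U = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def price_eq(u):
    # Pricing mechanism in the undisturbed world.
    return 2 if u == 1 else 1

def buy_eq(u, price):
    # High-demand customers always buy; the others buy only at the low price.
    return 1 if (u == 1 or price == 1) else 0

def p_buy_given_price(price):
    """Conditioning: P(buy = 1 | price) under the original distribution P."""
    num = sum(p for u, p in P_U.items()
              if price_eq(u) == price and buy_eq(u, price) == 1)
    den = sum(p for u, p in P_U.items() if price_eq(u) == price)
    return num / den

def p_buy_do_price(price):
    """Intervening: P'(buy = 1) once the pricing equation is replaced by a constant."""
    return sum(p for u, p in P_U.items() if buy_eq(u, price) == 1)

seen = p_buy_given_price(2)    # customers observed paying 2 are all high-demand
forced = p_buy_do_price(2)     # everyone is charged 2, whatever their demand
```

In this sketch the two quantities differ because the observed price carries information about `U`, while the imposed price does not.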
FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES (CONT)
1. Causal and statistical concepts do not mix.

   CAUSAL                   STATISTICAL
   Spurious correlation     Regression
   Randomization            Association / Independence
   Confounding / Effect     Controlling for / Conditioning
   Instrument               Odds and risk ratios
   Holding constant         Collapsibility
   Explanatory variables    Propensity score

2. No causes in, no causes out (Cartwright, 1989):
   {statistical assumptions + data, causal assumptions} ⇒ causal conclusions
3. Causal assumptions cannot be expressed in the mathematical language of standard statistics.
4. Non-standard mathematics:
   a) Structural equation models (Wright, 1920; Simon, 1960)
   b) Counterfactuals (Neyman-Rubin ($Y_x$), Lewis ($x \mathrel{\Box\!\!\rightarrow} Y$))

WHY CAUSALITY NEEDS SPECIAL MATHEMATICS
SEM equations are non-algebraic:
$Y = 2X$, $X = 1$ (process information)
Had X been 3, Y would be 6.
If we raise X to 3, Y would be 6.
Must wipe out $X = 1$.
$X = 1$, $Y = 2$ (static information)

FROM STATISTICAL TO CAUSAL ANALYSIS: 2. THE MENTAL BARRIERS
1. Every exercise of causal analysis must rest on untested, judgmental causal assumptions.
2. Every exercise of causal analysis must invoke non-standard mathematical notation.

TWO PARADIGMS FOR CAUSAL INFERENCE
Observed: $P(X, Y, Z, \ldots)$
Conclusions needed: $P(Y_x = y)$, $P(X_y = x \mid Z = z)$, ...
How do we connect observables $X, Y, Z, \ldots$ to counterfactuals $Y_x, X_z, Z_y, \ldots$?
N-R model: Counterfactuals are primitives, new variables. Super-distribution $P^*(X, Y, \ldots, Y_x, X_z, \ldots)$; $X, Y, Z$ constrain $Y_x, Z_y, \ldots$
Structural model: Counterfactuals are derived quantities. Subscripts modify a data-generating model.

THE STRUCTURAL MODEL PARADIGM
Data-generating model M → Q(M) (Aspects of M)
M: Oracle for computing answers to Q's.
e.g., Infer whether customers who bought product A would still buy A if we were to double the price.
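The reading of M as an oracle for manipulation can be sketched in a few lines. The two-equation model below (X := 1, Y := 2X) and the helper names `solve` and `do` are illustrative assumptions, not an API from the slides; the point is only that an intervention replaces one equation and leaves the others intact, which is why structural equations are not algebraic identities.

```python
# Hypothetical toy model used for illustration: X := 1, Y := 2 * X.
model = {
    "X": lambda v: 1,
    "Y": lambda v: 2 * v["X"],
}

def solve(equations, order):
    """Evaluate the equations in causal order and return the solution."""
    values = {}
    for name in order:
        values[name] = equations[name](values)
    return values

def do(equations, **fixed):
    """Return a mutilated copy: each intervened variable's equation becomes a constant."""
    mutilated = dict(equations)
    for name, value in fixed.items():
        mutilated[name] = lambda v, value=value: value
    return mutilated

order = ["X", "Y"]
observed = solve(model, order)             # static information: X = 1, Y = 2
raised = solve(do(model, X=3), order)      # wipe out X = 1, set X = 3
forced_y = solve(do(model, Y=6), order)    # setting Y does not move X
```

Algebraically, Y = 2X and Y = 6 would imply X = 3; in the mutilated model `forced_y` keeps X at 1, since only the equation for Y was replaced.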
FAMILIAR CAUSAL MODEL: ORACLE FOR MANIPULATION
[Diagram: INPUT, OUTPUT]

STRUCTURAL CAUSAL MODELS
Definition: A structural causal model is a 4-tuple $\langle V, U, F, P(u) \rangle$, where
$V = \{V_1, \ldots, V_n\}$ are observable variables,
$U = \{U_1, \ldots, U_m\}$ are background variables,
$F = \{f_1, \ldots, f_n\}$ are functions determining V, $v_i = f_i(v, u)$,
$P(u)$ is a distribution over U.
$P(u)$ and F induce a distribution $P(v)$ over observable variables.

CAUSAL MODELS AND COUNTERFACTUALS
Definition: The sentence "Y would be y (in situation u), had X been x," denoted $Y_x(u) = y$, means:
The solution for Y in a mutilated model $M_x$ (i.e., the equations for X replaced by $X = x$) with input $U = u$ is equal to y.
Joint probabilities of counterfactuals:
$P(Y_x = y, Z_w = z) = \sum_{u:\, Y_x(u) = y,\ Z_w(u) = z} P(u)$
The super-distribution $P^*$ is derived from M.
Parsimonious, consistent, and transparent.

APPLICATIONS
1. Predicting effects of actions and policies
2. Learning causal relationships from assumptions and data
3. Troubleshooting physical systems and plans
4. Finding explanations for reported events
5. Generating verbal explanations
6. Understanding causal talk
7. Formulating theories of causal thinking

AXIOMS OF CAUSAL COUNTERFACTUALS
$Y_x(u) = y$: Y would be y, had X been x (in state $U = u$)
1. Definiteness: $\exists x \in X$ s.t. $X_y(u) = x$
2. Uniqueness: $(X_y(u) = x) \;\&\; (X_y(u) = x') \Rightarrow x = x'$
3. Effectiveness: $X_{xw}(u) = x$
4. Composition: $W_x(u) = w \Rightarrow Y_{xw}(u) = Y_x(u)$
5. Reversibility: $(Y_{xw}(u) = y) \;\&\; (W_{xy}(u) = w) \Rightarrow Y_x(u) = y$

RULES OF CAUSAL CALCULUS
Rule 1: Ignoring observations
$P(y \mid do\{x\}, z, w) = P(y \mid do\{x\}, w)$ if $(Y \perp\!\!\!\perp Z \mid X, W)$ in $G_{\overline{X}}$
Rule 2: Action/observation exchange
$P(y \mid do\{x\}, do\{z\}, w) = P(y \mid do\{x\}, z, w)$ if $(Y \perp\!\!\!\perp Z \mid X, W)$ in $G_{\overline{X}\,\underline{Z}}$
Rule 3: Ignoring actions
$P(y \mid do\{x\}, do\{z\}, w) = P(y \mid do\{x\}, w)$ if $(Y \perp\!\!\!\perp Z \mid X, W)$ in $G_{\overline{X}\,\overline{Z(W)}}$
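The structural definition of counterfactuals above can be sketched by brute-force enumeration over u. The model below is an assumption chosen for illustration (two independent binary background variables and three made-up equations for X, W, Y); `solve` plays the role of the mutilated model, and the loop checks the effectiveness and composition axioms for every u.

```python
from fractions import Fraction
from itertools import product

# Hypothetical P(u): two independent binary background variables (illustrative numbers).
P_U1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
P_U2 = {0: Fraction(3, 4), 1: Fraction(1, 4)}

def p_u(u):
    return P_U1[u[0]] * P_U2[u[1]]

def solve(u, do=None):
    """Solve the (possibly mutilated) model: equations of variables in `do` are replaced."""
    do = do or {}
    u1, u2 = u
    v = {}
    v["X"] = do.get("X", u1)
    v["W"] = do.get("W", v["X"] ^ u2)
    v["Y"] = do.get("Y", v["W"] | (v["X"] & u2))
    return v

def prob(event):
    """P(event) = sum of P(u) over all u in which the event holds."""
    return sum(p_u(u) for u in product((0, 1), repeat=2) if event(u))

# Joint counterfactual probability P(Y_{X=1} = 1, Y_{X=0} = 0).
joint = prob(lambda u: solve(u, {"X": 1})["Y"] == 1
             and solve(u, {"X": 0})["Y"] == 0)

def axioms_hold():
    """Check effectiveness and composition in every state u."""
    for u in product((0, 1), repeat=2):
        for x in (0, 1):
            y_x = solve(u, {"X": x})["Y"]
            w = solve(u, {"X": x})["W"]
            both = solve(u, {"X": x, "W": w})
            if both["X"] != x:        # effectiveness: X_{xw}(u) = x
                return False
            if both["Y"] != y_x:      # composition: W_x(u) = w implies Y_{xw}(u) = Y_x(u)
                return False
    return True
```

Here the "super-distribution" is never stored explicitly: every counterfactual probability is derived from the equations and P(u).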
DERIVATION IN CAUSAL CALCULUS
[Diagram: Genotype (Unobserved), Smoking, Tar, Cancer]
$P(c \mid do\{s\}) = \sum_t P(c \mid do\{s\}, t)\, P(t \mid do\{s\})$ (Probability Axioms)
$= \sum_t P(c \mid do\{s\}, do\{t\})\, P(t \mid do\{s\})$ (Rule 2)
$= \sum_t P(c \mid do\{s\}, do\{t\})\, P(t \mid s)$ (Rule 2)
$= \sum_t P(c \mid do\{t\})\, P(t \mid s)$ (Rule 3)
$= \sum_{s'} \sum_t P(c \mid do\{t\}, s')\, P(s' \mid do\{t\})\, P(t \mid s)$ (Probability Axioms)
$= \sum_{s'} \sum_t P(c \mid t, s')\, P(s' \mid do\{t\})\, P(t \mid s)$ (Rule 2)
$= \sum_{s'} \sum_t P(c \mid t, s')\, P(s')\, P(t \mid s)$ (Rule 3)

THE BACK-DOOR CRITERION
Graphical test of identification:
$P(y \mid do(x))$ is identifiable in G if there is a set Z of variables such that Z d-separates X from Y in $G_{\underline{X}}$.
[Figures: graph G and subgraph $G_{\underline{X}}$, nodes numbered 1 to 6]
Moreover, $P(y \mid do(x)) = \sum_z P(y \mid x, z)\, P(z)$ (adjusting for Z)

RECENT RESULTS ON IDENTIFICATION
do-calculus is complete.
Complete graphical criterion for identifying causal effects (Shpitser and Pearl, 2006).
Complete graphical criterion for empirical testability of counterfactuals (Shpitser and Pearl, 2007).

DETERMINING THE CAUSES OF EFFECTS (The Attribution Problem)
Your Honor! My client (Mr. A) died BECAUSE he used that drug.
Court to decide if it is MORE PROBABLE THAN NOT that A would be alive BUT FOR the drug!
$PN = P(?\ \mid \text{A is dead, took the drug}) > 0.50$
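Returning to the smoking derivation, its final expression can be checked numerically. The sketch below assumes a hypothetical parameterization of the Genotype, Smoking, Tar, Cancer graph (all probabilities are invented for illustration) and compares the final formula, computed from the observational joint alone, with the interventional quantity computed from the full model, including the unobserved genotype.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical parameters (assumptions for illustration): U = genotype (unobserved).
p_u = {0: F(1, 2), 1: F(1, 2)}
p_s1_given_u = {0: F(1, 5), 1: F(4, 5)}            # P(S = 1 | u)
p_t1_given_s = {0: F(1, 10), 1: F(9, 10)}          # P(T = 1 | s)
p_c1_given_tu = {(0, 0): F(1, 10), (0, 1): F(3, 10),
                 (1, 0): F(1, 2), (1, 1): F(9, 10)}  # P(C = 1 | t, u)

def bern(p1, value):
    return p1 if value == 1 else 1 - p1

# Observational joint P(s, t, c), with the genotype summed out.
joint = {}
for u, s, t, c in product((0, 1), repeat=4):
    p = (p_u[u] * bern(p_s1_given_u[u], s) * bern(p_t1_given_s[s], t)
         * bern(p_c1_given_tu[(t, u)], c))
    joint[(s, t, c)] = joint.get((s, t, c), 0) + p

def prob(pred):
    return sum(p for key, p in joint.items() if pred(*key))

def derived_formula(s):
    """sum_t P(t | s) sum_{s'} P(c=1 | t, s') P(s'), using observational data only."""
    total = 0
    for t in (0, 1):
        p_t_s = prob(lambda a, b, c: (a, b) == (s, t)) / prob(lambda a, b, c: a == s)
        inner = 0
        for s2 in (0, 1):
            p_c_ts = (prob(lambda a, b, c: (a, b, c) == (s2, t, 1))
                      / prob(lambda a, b, c: (a, b) == (s2, t)))
            inner += p_c_ts * prob(lambda a, b, c: a == s2)
        total += p_t_s * inner
    return total

def truth(s):
    """P(c=1 | do(s)) computed from the full model, genotype included."""
    return sum(p_u[u] * bern(p_t1_given_s[s], t) * p_c1_given_tu[(t, u)]
               for u in (0, 1) for t in (0, 1))

def naive(s):
    """Plain conditioning P(c=1 | s), for comparison."""
    return prob(lambda a, b, c: a == s and c == 1) / prob(lambda a, b, c: a == s)
```

With exact rational arithmetic the derived formula matches the interventional quantity for both values of s, while plain conditioning does not.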
THE PROBLEM
Semantical Problem:
1. What is the meaning of PN(x, y): "Probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur."
Answer: Computable from M
$PN(x, y) = P(Y_{x'} = y' \mid x, y)$
Analytical Problem:
2. Under what condition can PN(x, y) be learned from statistical data, i.e., observational, experimental and combined.

TYPICAL THEOREMS (Tian and Pearl, 2000)
Bounds given combined nonexperimental and experimental data:
$\max\left\{0,\ \frac{P(y) - P(y_{x'})}{P(x, y)}\right\} \le PN \le \min\left\{1,\ \frac{P(y'_{x'}) - P(x', y')}{P(x, y)}\right\}$
Identifiability under monotonicity (combined data):
$PN = \frac{P(y \mid x) - P(y \mid x')}{P(y \mid x)} + \frac{P(y \mid x') - P(y_{x'})}{P(x, y)}$
(corrected Excess-Risk-Ratio)

CAN FREQUENCY DATA DECIDE LEGAL RESPONSIBILITY?

                    Experimental        Nonexperimental
                    do(x)    do(x')     x        x'
   Deaths (y)       16       14         2        28
   Survivals (y')   984      986        998      972
   Total            1,000    1,000      1,000    1,000

Nonexperimental data: drug usage predicts longer life.
Experimental data: drug has negligible effect on survival.
Plaintiff: Mr. A is special.
1. He actually died.
2. He used the drug by choice.
Court to decide (given both data): Is it more probable than not that A would be alive but for the drug?
$PN \triangleq P(Y_{x'} = y' \mid x, y) > 0.50$

SOLUTION TO THE ATTRIBUTION PROBLEM
WITH PROBABILITY ONE: $1 \le P(y'_{x'} \mid x, y) \le 1$
Combined data tell more than each study alone.

EFFECT DECOMPOSITION
What is the semantics of direct and indirect effects?
What are their policy-making implications?
Can we estimate them from data? Experimental data?
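As a worked check of the attribution example, the Tian and Pearl bounds can be evaluated directly on the frequency table. This is a sketch under one stated assumption: the two nonexperimental columns are pooled into a single sample of 2,000 subjects, so that joint probabilities such as P(x, y) are read off as counts divided by 2,000.

```python
from fractions import Fraction as F

# Experimental arm: P(y_{x'}) is the death rate under do(x') (14 of 1,000).
p_y_do_xprime = F(14, 1000)

# Nonexperimental counts, pooled into one sample (assumption: P(x) = 1/2).
counts = {("x", "y"): 2, ("x", "y'"): 998, ("x'", "y"): 28, ("x'", "y'"): 972}
total = sum(counts.values())

p_xy = F(counts[("x", "y")], total)                          # P(x, y)
p_xprime_yprime = F(counts[("x'", "y'")], total)             # P(x', y')
p_y = F(counts[("x", "y")] + counts[("x'", "y")], total)     # P(y)

# Lower bound: max{0, [P(y) - P(y_{x'})] / P(x, y)}
lower = max(F(0), (p_y - p_y_do_xprime) / p_xy)

# Upper bound: min{1, [P(y'_{x'}) - P(x', y')] / P(x, y)}
upper = min(F(1), ((1 - p_y_do_xprime) - p_xprime_yprime) / p_xy)
```

Both bounds evaluate to 1, so PN is pinned to one under these assumptions, which neither the experimental nor the nonexperimental table yields on its own.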
WHY DECOMPOSE EFFECTS?
1. Direct (or indirect) effect may be more transportable.
2. Indirect effects may be prevented or controlled.
   [Diagram: Pill, Pregnancy, Thrombosis]
3. Direct (or indirect) effect may be forbidden.
   [Diagram: Gender, Qualification, Hiring]

SEMANTICS BECOMES NONTRIVIAL IN NONLINEAR MODELS
(even when the model is completely specified)
$z = f(x, \varepsilon_1)$
$y = g(x, z, \varepsilon_2)$
$TE \triangleq \frac{\partial}{\partial x} E(Y \mid do(x))$
$DE \triangleq \frac{\partial}{\partial x} E(Y \mid do(x), do(z))$ (Dependent on z?)
$IE \triangleq\ ????$ (Void of operational meaning?)

THE OPERATIONAL MEANING OF DIRECT EFFECTS
$z = f(x, \varepsilon_1)$
$y = g(x, z, \varepsilon_2)$
Natural Direct Effect of X on Y: The expected change in Y per unit change of X, when we keep Z constant at whatever value it attains before the change.
$E[Y_{x_1 Z_{x_0}} - Y_{x_0}]$
In linear models, NDE = Controlled Direct Effect.

THE OPERATIONAL MEANING OF INDIRECT EFFECTS
$z = f(x, \varepsilon_1)$
$y = g(x, z, \varepsilon_2)$
Natural Indirect Effect of X on Y: The expected change in Y when we keep X constant, say at $x_0$, and let Z change to whatever value it would have under a unit change in X.
$E[Y_{x_0 Z_{x_1}} - Y_{x_0}]$
In linear models, NIE = TE - DE.

POLICY IMPLICATIONS OF INDIRECT EFFECTS
What is the indirect effect of X on Y?
The effect of Gender on Hiring if sex discrimination is eliminated.
[Diagram: GENDER, QUALIFICATION, HIRING; the direct Gender-to-Hiring input to f is marked IGNORE]

SEMANTICS AND IDENTIFICATION OF NESTED COUNTERFACTUALS
Consider the quantity $Q \triangleq E_u[Y_{x Z_{x^*}(u)}(u)]$.
Given $\langle M, P(u) \rangle$, Q is well defined:
Given u, $Z_{x^*}(u)$ is the solution for Z in $M_{x^*}$, call it z.
$Y_{x Z_{x^*}(u)}(u)$ is the solution for Y in $M_{xz}$.
Can Q be estimated from data (experimental, nonexperimental)?
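The nested counterfactual Q can be computed by enumeration once f, g and the noise distribution are given. The sketch below uses a hypothetical nonlinear model (the particular f, g and noise probabilities are assumptions chosen for illustration) and evaluates TE, the natural direct effect and the natural indirect effect for the change from x0 = 0 to x1 = 1.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical noise distribution: independent binary e1, e2 (illustrative numbers).
P_E1 = {0: F(1, 2), 1: F(1, 2)}
P_E2 = {0: F(3, 4), 1: F(1, 4)}

def f(x, e1):
    # z = f(x, e1): assumed mediator equation.
    return max(x, e1)

def g(x, z, e2):
    # y = g(x, z, e2): assumed outcome equation with an x*z interaction (nonlinear).
    return z + x * z + e2

def expect(h):
    """Expectation of h(e1, e2) over the background distribution."""
    return sum(P_E1[e1] * P_E2[e2] * h(e1, e2)
               for e1, e2 in product((0, 1), repeat=2))

def nested(x, x_star):
    """Q = E_u[ Y_{x, Z_{x*}(u)}(u) ]: Z is solved under x*, then Y under x and that z."""
    return expect(lambda e1, e2: g(x, f(x_star, e1), e2))

x0, x1 = 0, 1
te = nested(x1, x1) - nested(x0, x0)    # total effect: E[Y_{x1}] - E[Y_{x0}]
nde = nested(x1, x0) - nested(x0, x0)   # natural direct effect
nie = nested(x0, x1) - nested(x0, x0)   # natural indirect effect
```

In this sketch the total effect is 3/2 while the natural direct and indirect effects are 1/2 each, so the additive decomposition that holds in linear models fails once X and Z interact.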
GENERAL PATH-SPECIFIC EFFECTS (Def.)
[Figure: path diagram with node W; $x^*$, $z^* = Z_{x^*}(u)$]
Form a new model, $M^*_g$, specific to active subgraph g:
$f_i^*(pa_i, u; g) = f_i(pa_i(g), pa_i^*(\bar{g}), u)$
Definition: g-specific effect
$E_g(x, x^*; Y)_M = TE(x, x^*; Y)_{M^*_g}$
Nonidentifiable even in Markovian models.

EFFECT DECOMPOSITION SUMMARY
Graphical conditions for estimability from experimental / nonexperimental data.
Graphical conditions hold in Markovian models.
Useful in answering new type of policy questions involving mechanism blocking instead of variable fixing.

CONCLUSIONS
Structural-model semantics, enriched with logic and graphs, provides:
Complete formal basis for causal reasoning
Powerful and friendly causal calculus
Lays the foundations for asking more difficult questions: What is an action? What is free will? Should robots be programmed to have this illusion?