CAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank

These slides were developed by Christel Vermeersch and modified by Phillippe Leite for the purpose of this workshop.

Policy questions are causal in nature

Cause-effect relationships are a part of what policy makers do:
Does school decentralization improve school quality?
Does one more year of education cause higher income?
Do conditional cash transfers cause better health outcomes in children?
How do we improve student learning?

Standard Statistical Analysis

Tools: Likelihood and other estimation techniques.
Aim: To infer parameters of a distribution from samples drawn from that distribution.
Uses: With the help of such parameters, one can:
1. Infer association among variables,
2. Estimate the likelihood of past and future events,
3. Update the likelihood of events in light of new evidence or new measurement.
Condition: For this to work well, experimental conditions must remain the same.

But our policy questions were:
If I decentralize schools, will quality improve?
If I find a way to make a child go to school longer, will she earn more money?
If I start giving cash to families, will their children become healthier?
If I train teachers, will students learn more?

Causal Analysis

For causal questions, we need to infer aspects of the data generation process. We need to be able to deduce:
1. the likelihood of events under static conditions (as in Standard Statistical Analysis), and also
2. the dynamics of events under changing conditions.

The dynamics of events under changing conditions includes:
1. Predicting the effects of interventions.
2. Predicting the effects of spontaneous changes.
3. Identifying causes of reported events.

Causation vs. Correlation

Standard statistical analysis/probability theory:
The word "cause" is not in its vocabulary.
All it allows us to say is that two events are mutually correlated, or dependent.

This is not enough for policy makers.
They look at rationales for policy decisions: if we do XXX, then will we get YYY?
We need a vocabulary for causality.

THE RUBIN CAUSAL MODEL
Vocabulary for Causality

Population & Outcome Variable

Define the population by U. Each unit in U is denoted by u.
The outcome of interest is Y, also called the response variable.
For each $u \in U$, there is an associated value Y(u).

Causes/Treatment

Causes are those things that could be treatments in hypothetical experiments. - Rubin
For simplicity, we assume that there are just two possible states:
Unit u is exposed to treatment, and
Unit u is exposed to comparison.

The Treatment Variable

Let D be a variable that indicates the state to which each unit in U is exposed:

\[ D = \begin{cases} 1 & \text{if unit } u \text{ is exposed to treatment} \\ 0 & \text{if unit } u \text{ is exposed to comparison} \end{cases} \]

Where does D come from?
In a controlled study: constructed by the experimenter.
In an uncontrolled study: determined by factors beyond the experimenter's control.

Linking Y and D

Y = response variable, D = treatment variable.
The response Y is potentially affected by whether u receives treatment or not. Thus, we need two response variables:
$Y_1(u)$ is the outcome if unit u is exposed to treatment.
$Y_0(u)$ is the outcome if unit u is exposed to comparison.

The effect of treatment on outcome

Treatment variable D: D = 1 if unit u is exposed to treatment, D = 0 if unit u is exposed to comparison.
Response variable Y:
$Y_1(u)$ is the outcome if unit u is exposed to treatment.
$Y_0(u)$ is the outcome if unit u is exposed to comparison.

For any unit u, treatment causes the effect

\[ \delta_u = Y_1(u) - Y_0(u) \]

But there is a problem.

Fundamental problem of causal inference

For a given unit u, we observe either $Y_1(u)$ or $Y_0(u)$, never both, so it is impossible to observe the effect of treatment on u by itself!
We do not observe the counterfactual: if we give u treatment, then we cannot observe what would have happened to u in the absence of treatment.
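
The fundamental problem can be made concrete with a small sketch. The population below is entirely invented for illustration (the names `population`, `unit_effect`, `observed_outcome` and `observed_data` are hypothetical); it lists both potential outcomes only because the numbers are made up, and then shows what is left once the counterfactual is removed.

```python
# Hypothetical toy population. Both potential outcomes are listed only
# because the numbers are invented; real data never contain both.
population = [
    {"unit": "a", "y0": 10.0, "y1": 14.0, "d": 1},
    {"unit": "b", "y0": 6.0, "y1": 7.0, "d": 0},
    {"unit": "c", "y0": 12.0, "y1": 18.0, "d": 1},
    {"unit": "d", "y0": 8.0, "y1": 9.0, "d": 0},
]


def unit_effect(row):
    """delta_u = Y_1(u) - Y_0(u); computable only in a simulation."""
    return row["y1"] - row["y0"]


def observed_outcome(row):
    """The outcome the analyst sees: Y_1(u) if D = 1, otherwise Y_0(u)."""
    return row["y1"] if row["d"] == 1 else row["y0"]


def observed_data(rows):
    """Strip the counterfactual, leaving what a real dataset would hold."""
    return [
        {"unit": r["unit"], "d": r["d"], "y": observed_outcome(r)}
        for r in rows
    ]


effects = [unit_effect(r) for r in population]
data = observed_data(population)

for row in data:
    print(row)
```

Each printed row holds one outcome per unit, so `unit_effect` cannot be evaluated from `data` alone.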

So what do we do?

Instead of measuring the treatment effect on unit u, $\delta_u = Y_1(u) - Y_0(u)$, we identify the average treatment effect for the population U (or for sub-populations):

\[ ATE_U = E_U[Y_1(u) - Y_0(u)] = E_U[Y_1(u)] - E_U[Y_0(u)] = \bar{Y}_1 - \bar{Y}_0 \qquad (1) \]

Estimating the ATE

So we replace the impossible-to-observe treatment effect of D on a specific unit u with the possible-to-estimate average treatment effect of D over a population U of such units.
Although $E_U(Y_1)$ and $E_U(Y_0)$ cannot both be calculated, they can be estimated.
Most econometrics methods attempt to construct from observational data consistent estimators of:

\[ E_U(Y_1) = \bar{Y}_1 \quad \text{and} \quad E_U(Y_0) = \bar{Y}_0 \]

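A simple estimator of $ATE_U$

So we are trying to estimate:

\[ ATE_U = E_U(Y_1) - E_U(Y_0) = \bar{Y}_1 - \bar{Y}_0 \qquad (1) \]

Consider the following simple estimator:

\[ \hat{\delta} = [\hat{\bar{Y}}_1 \mid D=1] - [\hat{\bar{Y}}_0 \mid D=0] \qquad (2) \]

Note: Equation (1) is defined for the whole population. Equation (2) is an estimator to be computed on a sample drawn from that population.

An important lemma

Lemma: If we assume that

\[ [\bar{Y}_1 \mid D=1] = [\bar{Y}_1 \mid D=0] \quad \text{and} \quad [\bar{Y}_0 \mid D=1] = [\bar{Y}_0 \mid D=0], \]

then

\[ \hat{\delta} = [\hat{\bar{Y}}_1 \mid D=1] - [\hat{\bar{Y}}_0 \mid D=0] \]

is a consistent estimator of $\bar{Y}_1 - \bar{Y}_0$.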
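
Why the lemma holds can be sketched in three lines. The symbol $\pi$, the share of units in U with D = 1, is introduced here for illustration and is not part of the slide's notation; the last line also assumes the sample is a random draw from U, so that group sample means converge in probability to the corresponding group population means.

```latex
\begin{align*}
\bar{Y}_1 &= \pi\,[\bar{Y}_1 \mid D=1] + (1-\pi)\,[\bar{Y}_1 \mid D=0]
           = [\bar{Y}_1 \mid D=1], \\
\bar{Y}_0 &= \pi\,[\bar{Y}_0 \mid D=1] + (1-\pi)\,[\bar{Y}_0 \mid D=0]
           = [\bar{Y}_0 \mid D=0], \\
\hat{\delta} &= [\hat{\bar{Y}}_1 \mid D=1] - [\hat{\bar{Y}}_0 \mid D=0]
  \;\xrightarrow{\;p\;}\; [\bar{Y}_1 \mid D=1] - [\bar{Y}_0 \mid D=0]
  = \bar{Y}_1 - \bar{Y}_0 .
\end{align*}
```

The first two lines use the two equalities assumed in the lemma; without them the weighted averages do not collapse to a single group mean.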

Fundamental Conditions

Thus, a sufficient condition for the simple estimator to consistently estimate the true ATE is that:

$[\bar{Y}_1 \mid D=1] = [\bar{Y}_1 \mid D=0]$
The average outcome under treatment $\bar{Y}_1$ is the same for the treatment (D=1) and the comparison (D=0) groups,

and

$[\bar{Y}_0 \mid D=1] = [\bar{Y}_0 \mid D=0]$
The average outcome under comparison $\bar{Y}_0$ is the same for the treatment (D=1) and the comparison (D=0) groups.

When will those conditions be satisfied?

It is sufficient that treatment assignment D be uncorrelated with the potential outcome distributions of $Y_0$ and $Y_1$.
Intuitively, there can be no correlation between
(1) whether someone gets the treatment and
(2) how much that person potentially benefits from the treatment.
The easiest way to achieve this uncorrelatedness is through random assignment of treatment.
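
A minimal simulation sketch of these conditions follows. Everything in it is an assumption made for illustration: the normal distributions, the sample size, the seed, and the helper names `simulate_population` and `simple_estimator`. It compares the simple estimator under random assignment with the same estimator when units with a high $Y_0$ select into treatment.

```python
import random
from statistics import mean


def simulate_population(n, rng):
    """Draw hypothetical potential outcomes (Y_0, Y_1) for n units."""
    units = []
    for _ in range(n):
        y0 = rng.gauss(10.0, 2.0)      # outcome under comparison
        y1 = y0 + rng.gauss(2.0, 1.0)  # outcome under treatment
        units.append((y0, y1))
    return units


def simple_estimator(units, assignment):
    """Mean observed Y_1 among D = 1 minus mean observed Y_0 among D = 0."""
    treated = [y1 for (y0, y1), d in zip(units, assignment) if d == 1]
    comparison = [y0 for (y0, y1), d in zip(units, assignment) if d == 0]
    return mean(treated) - mean(comparison)


rng = random.Random(0)
units = simulate_population(20_000, rng)
true_ate = mean(y1 - y0 for y0, y1 in units)

# Random assignment: D is independent of both potential outcomes.
random_d = [1 if rng.random() < 0.5 else 0 for _ in units]

# Self-selection: units with a high baseline Y_0 take the treatment.
selected_d = [1 if y0 > 10.0 else 0 for y0, _ in units]

estimate_random = simple_estimator(units, random_d)
estimate_selected = simple_estimator(units, selected_d)

print(f"true ATE            : {true_ate:.2f}")
print(f"random assignment   : {estimate_random:.2f}")
print(f"self-selection on Y0: {estimate_selected:.2f}")
```

Under random assignment the estimate should land close to the true ATE, while under self-selection it is pushed upward because the treatment group starts from a higher baseline.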

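Another way of looking at it

With some algebra, it can be shown that:

\[ \hat{\delta} = \delta + \left( [\bar{Y}_0 \mid D=1] - [\bar{Y}_0 \mid D=0] \right) + (1-\pi)\left( \delta_{\{D=1\}} - \delta_{\{D=0\}} \right) \]

simple estimator = true impact + baseline difference + heterogeneous response to treatment

Here $\pi$ is the share of the population that receives treatment, and $\delta_{\{D=1\}}$ and $\delta_{\{D=0\}}$ are the average treatment effects within the treatment and comparison groups.

Another way of looking at it (in words)

There are two sources of bias that need to be eliminated from estimates of causal effects:
Baseline difference/selection bias
Heterogeneous response to the treatment
Most of the methods available only deal with selection bias.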
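
The decomposition into true impact, baseline difference and heterogeneous response can be checked numerically on a made-up population in which both potential outcomes are known. This is only a sketch: the four units and all variable names are hypothetical, and `pi` denotes the share of units with D = 1.

```python
from statistics import mean

# Hypothetical population with both potential outcomes known by construction.
# Each tuple is (Y_0, Y_1, D).
population = [
    (10.0, 14.0, 1),
    (6.0, 7.0, 0),
    (12.0, 18.0, 1),
    (8.0, 9.0, 0),
]

treated = [(y0, y1) for y0, y1, d in population if d == 1]
comparison = [(y0, y1) for y0, y1, d in population if d == 0]

pi = len(treated) / len(population)              # share treated
ate = mean(y1 - y0 for y0, y1, _ in population)  # true impact
effect_treated = mean(y1 - y0 for y0, y1 in treated)
effect_comparison = mean(y1 - y0 for y0, y1 in comparison)

# Simple estimator: treated mean of Y_1 minus comparison mean of Y_0.
simple = mean(y1 for _, y1 in treated) - mean(y0 for y0, _ in comparison)

# The two bias terms of the decomposition.
baseline_difference = (
    mean(y0 for y0, _ in treated) - mean(y0 for y0, _ in comparison)
)
heterogeneous_response = (1 - pi) * (effect_treated - effect_comparison)

print("simple estimator      :", simple)
print("true impact           :", ate)
print("baseline difference   :", baseline_difference)
print("heterogeneous response:", heterogeneous_response)
```

In this invented population the simple estimator is 9 while the true impact is 3; the gap of 6 splits into a baseline difference of 4 and a heterogeneous-response term of 2.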

Treatment on the Treated

The Average Treatment Effect is not always the parameter of interest. Often, it is the average treatment effect for the treated that is of substantive interest:

\[ TOT = E[Y_1(u) - Y_0(u) \mid D=1] = E[Y_1(u) \mid D=1] - E[Y_0(u) \mid D=1] \]

If we need to estimate Treatment on the Treated, then the simple estimator

\[ \hat{\delta} = [\hat{\bar{Y}}_1 \mid D=1] - [\hat{\bar{Y}}_0 \mid D=0] \qquad (2) \]

consistently estimates Treatment on the Treated if

\[ [\bar{Y}_0 \mid D=1] = [\bar{Y}_0 \mid D=0], \]

that is, if there is no baseline difference between the treatment and comparison groups.
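
The following sketch illustrates the distinction between ATE and Treatment on the Treated. It assumes a hypothetical population in which the gain from treatment is independent of $Y_0$ and units with an above-average gain select into treatment; the distributions, seed and variable names are all invented for illustration. With no baseline difference, the simple estimator should track TOT rather than ATE.

```python
import random
from statistics import mean

rng = random.Random(1)

# Hypothetical population: the gain from treatment is independent of Y_0.
units = []
for _ in range(20_000):
    y0 = rng.gauss(10.0, 2.0)
    gain = rng.gauss(2.0, 1.0)
    d = 1 if gain > 2.0 else 0  # selection on gains, not on baseline
    units.append((y0, y0 + gain, d))

ate = mean(y1 - y0 for y0, y1, _ in units)
tot = mean(y1 - y0 for y0, y1, d in units if d == 1)

baseline_difference = (
    mean(y0 for y0, _, d in units if d == 1)
    - mean(y0 for y0, _, d in units if d == 0)
)
simple = (
    mean(y1 for _, y1, d in units if d == 1)
    - mean(y0 for y0, _, d in units if d == 0)
)

print(f"ATE                 : {ate:.2f}")
print(f"TOT                 : {tot:.2f}")
print(f"baseline difference : {baseline_difference:.2f}")
print(f"simple estimator    : {simple:.2f}")
```

By construction the simple estimator equals TOT plus the baseline difference, so when that difference is near zero the estimator is close to TOT even though it is far from ATE.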

References

Judea Pearl (2000): Causality: Models, Reasoning and Inference, Cambridge University Press. (Book) Chapters 1, 5 and 7.
Trygve Haavelmo (1944): The probability approach in econometrics, Econometrica 12, pp. iii-vi+1-115.
Arthur Goldberger (1972): Structural Equations Methods in the Social Sciences, Econometrica 40, pp. 979-1002.
Donald B. Rubin (1974): Estimating causal effects of treatments in randomized and nonrandomized experiments, Journal of Educational Psychology 66, pp. 688-701.
Paul W. Holland (1986): Statistics and Causal Inference, Journal of the American Statistical Association 81, pp. 945-70, with discussion.

Thank You

Q & A