MATCHING TECHNIQUES. Technical Track Session VI. Emanuela Galasso. The World Bank

These slides were developed by Christel Vermeersch and modified by Emanuela Galasso for the purpose of this workshop.

When can we use matching?
What if the assignment to the treatment is done not randomly, but on the basis of observables? This is when matching methods come in! Matching methods allow you to construct comparison groups when the assignment to the treatment is done on the basis of observable variables.

When can we use matching?
Intuition: the comparison group needs to be as similar as possible to the treatment group, in terms of the observables before the start of the treatment. The method assumes there are no remaining unobservable differences between treatment and comparison groups.

Key Question
What is the effect of treatment on the treated when the assignment to the treatment is based on observable variables?

Unconfoundedness & Selection on observables
Let X denote a matrix in which each row is a vector of pre-treatment observable variables for individual i.
Unconfoundedness: assignment to treatment is unconfounded given pre-treatment variables X if
$$(Y_1, Y_0) \perp D \mid X$$
Unconfoundedness is equivalent to saying that:
(1) within each cell defined by X, treatment is random;
(2) the selection into treatment depends only on the observables X.

Average effects of treatment on the treated
Assuming unconfoundedness given X
Intuition:
- Estimate the treatment effect within each cell defined by X
- Take the average over the different cells
Math: in your handouts, Annex 1

Strategy for estimating average effect of treatment on the treated
Selection on observables
Unconfoundedness suggests the following strategy for the estimation of the average treatment effect on the treated, δ:
- Stratify the data into cells defined by each particular value of X
- Within each cell (i.e. conditioning on X) compute the difference between the average outcomes of the treated and the controls
- Average these differences with respect to the distribution of X in the population of treated units.
Is this strategy feasible?

Is our strategy feasible? The Dimensionality Problem
This may not be feasible when
- The sample is small
- The set of covariates is large
- Many of the covariates have many values or are continuous
This is what we call the dimensionality problem.
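The three-step strategy above (stratify, difference within cells, average over the treated) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `att_by_cells`, the `(x, d, y)` record layout, and the data are hypothetical and not part of the original slides. Cells that contain treated units but no controls are reported rather than used, which previews the feasibility problem.

```python
from collections import defaultdict

def att_by_cells(records):
    """Exact-matching estimate of the effect of treatment on the treated.

    records: iterable of (x, d, y) where x is a hashable cell label,
    d is 1 for treated and 0 for controls, and y is the outcome.
    Returns (estimate, list of treated-only cells that were dropped).
    """
    treated = defaultdict(list)
    controls = defaultdict(list)
    for x, d, y in records:
        (treated if d == 1 else controls)[x].append(y)

    total_gain, n_treated_used, dropped = 0.0, 0, []
    for x, y_treated in treated.items():
        if x not in controls:
            dropped.append(x)  # no comparison units in this cell
            continue
        mean_t = sum(y_treated) / len(y_treated)
        mean_c = sum(controls[x]) / len(controls[x])
        # Weight each cell's difference by its number of treated units.
        total_gain += (mean_t - mean_c) * len(y_treated)
        n_treated_used += len(y_treated)
    if n_treated_used == 0:
        raise ValueError("no cell contains both treated and control units")
    return total_gain / n_treated_used, dropped

# Hypothetical data: one covariate with three cells.
data = [
    ("young", 1, 10.0), ("young", 1, 12.0), ("young", 0, 8.0), ("young", 0, 10.0),
    ("old", 1, 20.0), ("old", 0, 17.0),
    ("rural", 1, 5.0),  # treated-only cell: cannot be matched
]
att, dropped_cells = att_by_cells(data)
print(att, dropped_cells)
```

Weighting each cell by its count of treated units is what makes the average refer to the distribution of X among the treated.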

The Dimensionality Problem
Examples
- How many cells do we have with 2 binary X variables?
- And with 3 binary X variables?
- And with K binary X variables?
- How about if we have 2 variables that take on 7 values each?
As the number of cells grows, we'll get lack of common support:
- cells containing only treated observations
- cells containing only controls

An Alternative to solve the Dimensionality Problem
The propensity score allows us to convert the multidimensional setup of matching into a one-dimensional setup. In that way, it reduces the dimensionality problem.
Rosenbaum and Rubin (1983) propose an equivalent and feasible estimation strategy based on the concept of Propensity Score.
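To make the cell counts in the dimensionality examples concrete: the number of cells is the product of the number of values each covariate can take. The helper name `number_of_cells` and the choice K = 10 are illustrative assumptions only.

```python
from math import prod

def number_of_cells(values_per_covariate):
    """Number of cells when stratifying on every combination of covariate values."""
    return prod(values_per_covariate)

two_binary = number_of_cells([2, 2])        # 2 binary covariates
three_binary = number_of_cells([2, 2, 2])   # 3 binary covariates
k_binary = number_of_cells([2] * 10)        # K binary covariates, here K = 10
two_sevens = number_of_cells([7, 7])        # 2 covariates with 7 values each
print(two_binary, three_binary, k_binary, two_sevens)
```

With K binary covariates the count is 2^K, so the number of cells quickly exceeds the number of observations in a typical sample.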

Matching based on the Propensity Score
Definition: The propensity score is the conditional probability of receiving the treatment given the pre-treatment variables:
$$p(X) \equiv \Pr\{D = 1 \mid X\} = E\{D \mid X\}$$
Lemma 1: If p(X) is the propensity score, then
$$D \perp X \mid p(X)$$
Given the propensity score, the pre-treatment variables are balanced between beneficiaries and non-beneficiaries.
Lemma 2:
$$(Y_1, Y_0) \perp D \mid X \;\Rightarrow\; (Y_1, Y_0) \perp D \mid p(X)$$
Suppose that assignment to treatment is unconfounded given the pre-treatment variables X. Then assignment to treatment is unconfounded given the propensity score p(X).

Does the propensity score approach solve the dimensionality problem?
YES! The balancing property of the propensity score (Lemma 1) ensures that:
- Observations with the same propensity score have the same distribution of observable covariates independently of treatment status; and
- for a given propensity score, assignment to treatment is random and therefore treatment and control units are observationally identical on average.

Implementation of the estimation strategy
This suggests the following strategy for the estimation of the average treatment effect δ:
Step 1: Estimate a logit (or probit) model of program participation. Predicted values are the propensity scores. E.g. with a logit function, see Annex 3. This step is necessary because the true propensity score is unknown and therefore has to be estimated.

When is propensity score matching appropriate?
Idea behind propensity score matching: estimation of treatment effects requires a careful matching of treated and controls. If treated and controls are very different in terms of observables, this matching is not sufficiently close and reliable, or it may even be impossible.
The comparison of the estimated propensity scores across treated and controls provides a useful diagnostic tool to evaluate how similar treated and controls are, and therefore how reliable the estimation strategy is.

So you want the propensity score to be the same for treatments and controls
- The range of variation of propensity scores should be the same for treated and controls. Count how many controls have a propensity score lower than the minimum or higher than the maximum of the propensity scores of the treated, and vice versa.
- The frequency of propensity scores should be the same for treated and controls. Draw histograms of the estimated propensity scores for the treated and controls. The bins correspond to the blocks constructed for the estimation of propensity scores.

The issue of common support
[Figure: density of scores for non-participants and density of scores for participants, plotted against the propensity score from 0 to 1, with the region of common support marked.]
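The range check described above can be sketched as follows. The function name `count_outside_range` and the score values are hypothetical; the scores would in practice come from the estimated participation model.

```python
def count_outside_range(scores, other_scores):
    """Count scores below the minimum or above the maximum of other_scores."""
    lo, hi = min(other_scores), max(other_scores)
    return sum(1 for p in scores if p < lo or p > hi)

# Hypothetical estimated propensity scores.
treated = [0.35, 0.50, 0.65, 0.80, 0.95]
controls = [0.05, 0.10, 0.30, 0.45, 0.60, 0.85]

controls_off_support = count_outside_range(controls, treated)
treated_off_support = count_outside_range(treated, controls)

# Region where both groups have scores (simple min-max overlap).
support = (max(min(treated), min(controls)), min(max(treated), max(controls)))
print(controls_off_support, treated_off_support, support)
```

A min-max comparison is only one way to define the overlap region; comparing histograms, as the slide suggests, also reveals gaps inside the range.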

Example: Common support issues
[Figure A1: Propensity Scores for EiC Phase 1 and non-EiC schools. Panels by treated status (0, 1); x-axis: linear prediction, from -5 to 5; y-axis: density. Source: Machin, McNally, Meghir, Excellence in Cities: Evaluation of an education policy in disadvantaged areas.]

Implementation of the estimation strategy
Remember we're discussing a strategy for the estimation of the average treatment effect on the treated, called δ.
Step 1: Estimate the propensity score (see Annex 3).
Step 2: Restrict the analysis to the region of common support (lack of common support is a key source of bias in observational studies).

Step 3: Estimate the average treatment effect given the propensity score
- For each participant find a sample of non-participants that have similar propensity scores.
- Compare the outcome indicator for each participant and its comparison group.
- Calculate the mean of these individual gains to obtain the average overall gain:
$$ATT = \frac{1}{P} \sum_{j=1}^{P} \left( Y_{j1} - \sum_{i=1}^{NP} W_{ij} Y_{ij0} \right)$$
where P is the number of participants, NP is the number of non-participants, and W_{ij} is the weight given to non-participant i in the comparison group of participant j.

Step 3: Estimate the average treatment effect given the propensity score (continued)
"Similar" can be defined in many ways. These different weights correspond to different ways of doing matching:
- Stratification on the Score
- Nearest neighbor matching on the Score
- Radius matching on the Score
- Kernel matching on the Score
- Weighting on the basis of the Score
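As one illustration of Step 3, nearest neighbor matching corresponds to giving weight 1 to the single non-participant with the closest score and weight 0 to all others. The sketch below assumes matching with replacement and uses hypothetical names and data; it is not the implementation used by any particular package.

```python
def att_nearest_neighbor(treated, controls):
    """Average gain for participants using one-to-one nearest neighbor matching.

    treated, controls: lists of (propensity_score, outcome) pairs.
    Controls can be reused (matching with replacement).
    """
    gains = []
    for p_j, y_j1 in treated:
        # Weight 1 on the closest non-participant, 0 on all others.
        _, y_match = min(controls, key=lambda c: abs(c[0] - p_j))
        gains.append(y_j1 - y_match)
    return sum(gains) / len(gains)

# Hypothetical (score, outcome) pairs.
treated = [(0.30, 12.0), (0.60, 15.0), (0.80, 20.0)]
controls = [(0.25, 10.0), (0.55, 11.0), (0.90, 16.0), (0.10, 7.0)]
att = att_nearest_neighbor(treated, controls)
print(att)
```

Radius and kernel matching differ only in the weights: several non-participants receive positive weight, typically declining with the distance between scores.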

To summarize:
- Matching is the observational analogue of an experiment in which placement is independent of outcomes.
- The key difference is that a pure experiment does not require the untestable assumption of independence conditional on observables.
- PSM requires good data.
- Often combined with difference-in-difference methods (control for selection based on time-invariant unobserved characteristics).

References
Dehejia, R.H. and S. Wahba (1999), Causal Effects in Non-experimental Studies: Reevaluating the Evaluation of Training Programs, Journal of the American Statistical Association, 94, 448, 1053-1062.
Dehejia, R.H. and S. Wahba (1996), Causal Effects in Non-experimental Studies: Reevaluating the Evaluation of Training Programs, Harvard University, Mimeo.
Hahn, J. (1998), On the role of the propensity score in efficient semiparametric estimation of average treatment effects, Econometrica, 66, 2, 315-331.
Heckman, J.J., H. Ichimura and P. Todd (1998), Matching as an econometric evaluation estimator, Review of Economic Studies, 65, 261-294.
Hirano, K., G.W. Imbens and G. Ridder (2000), Efficient Estimation of Average Treatment Effects using the Estimated Propensity Score, mimeo.
Rosenbaum, P.R. and D.B. Rubin (1983), The Central Role of the Propensity Score in Observational Studies for Causal Effects, Biometrika, 70, 1, 41-55.
Vinha, K. (2006), A primer on Propensity Score Matching Estimators, Documento CEDE 2006-13, Universidad de los Andes.

Thank You
Q & A

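Annex 1: Average effects of treatment on the treated assuming unconfoundedness given X
If we are willing to assume unconfoundedness:
$$E_i\{Y_0(u_i) \mid D_i = 0, X\} = E_i\{Y_0(u_i) \mid D_i = 1, X\} = E_i\{Y_0(u_i) \mid X\}$$
$$E_i\{Y_1(u_i) \mid D_i = 0, X\} = E_i\{Y_1(u_i) \mid D_i = 1, X\} = E_i\{Y_1(u_i) \mid X\}$$
Using these expressions, we can define for each cell defined by X:
δ_X = average treatment effect on the treated in cell defined by X
$$\begin{aligned}
\delta_X &\equiv E_i\{\delta_i \mid D_i = 1, X\} \\
&= E_i\{Y_1(u_i) - Y_0(u_i) \mid D_i = 1, X\} \\
&= E_i\{Y_1(u_i) \mid D_i = 1, X\} - E_i\{Y_0(u_i) \mid D_i = 1, X\} \\
&= E_i\{Y_1(u_i) \mid D_i = 1, X\} - E_i\{Y_0(u_i) \mid D_i = 0, X\}
\end{aligned}$$
- $E_i\{Y_1(u_i) \mid D_i = 1, X\}$: can measure sample analog
- $E_i\{Y_0(u_i) \mid D_i = 1, X\}$: can NOT measure sample analog
- $E_i\{Y_0(u_i) \mid D_i = 0, X\}$: can measure sample analog

Annex 1: Average effects of treatment on the treated assuming unconfoundedness given X (continued)
Now what is the relation between "average treatment effect on the treated" and "average treatment effect on the treated within cell defined by X"?
$$\begin{aligned}
\delta &\equiv \text{average treatment effect on the treated} = E_i\{\delta_i \mid D_i = 1\} \\
&= E_X\{E_i\{\delta_i \mid D_i = 1, X\} \mid D_i = 1\} \quad \text{(by the law of iterated expectations)} \\
&= E_X\{\delta_X \mid D_i = 1\} \\
&= E_X\{\text{average treatment effect on the treated within cell defined by } X \mid D_i = 1\}
\end{aligned}$$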

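Annex 2: Average effects of treatment and the propensity score
So let's match treatments and controls on the basis of the propensity score p(X) instead of X.
$$E_i\{Y_0(u_i) \mid D_i = 0, p(X_i)\} = E_i\{Y_0(u_i) \mid D_i = 1, p(X_i)\} = E_i\{Y_0(u_i) \mid p(X_i)\}$$
$$E_i\{Y_1(u_i) \mid D_i = 0, p(X_i)\} = E_i\{Y_1(u_i) \mid D_i = 1, p(X_i)\} = E_i\{Y_1(u_i) \mid p(X_i)\}$$
Using these expressions, we can define for each cell defined by p(X):
δ_{p(X)} = average treatment effect on the treated in cell defined by p(X)
$$\begin{aligned}
\delta_{p(X)} &\equiv E_i\{\delta_i \mid D_i = 1, p(X_i)\} \\
&= E_i\{Y_1(u_i) - Y_0(u_i) \mid D_i = 1, p(X_i)\} \\
&= E_i\{Y_1(u_i) \mid D_i = 1, p(X_i)\} - E_i\{Y_0(u_i) \mid D_i = 1, p(X_i)\} \\
&= E_i\{Y_1(u_i) \mid D_i = 1, p(X_i)\} - E_i\{Y_0(u_i) \mid D_i = 0, p(X_i)\}
\end{aligned}$$
- $E_i\{Y_1(u_i) \mid D_i = 1, p(X_i)\}$: can measure sample analog
- $E_i\{Y_0(u_i) \mid D_i = 1, p(X_i)\}$: can NOT measure sample analog
- $E_i\{Y_0(u_i) \mid D_i = 0, p(X_i)\}$: can measure sample analog

Annex 2: Average effects of treatment and the propensity score (continued)
Now what is the relation between "average treatment effect on the treated" and "average treatment effect on the treated within cell defined by p(X)"?
$$\begin{aligned}
\delta &\equiv \text{average treatment effect on the treated} = E_i\{\delta_i \mid D_i = 1\} \\
&= E_{p(X)}\{E_i\{\delta_i \mid D_i = 1, p(X_i)\} \mid D_i = 1\} \quad \text{(by the law of iterated expectations)} \\
&= E_{p(X)}\{\delta_{p(X)} \mid D_i = 1\} \\
&= E_{p(X)}\{\text{treatment effect on the treated within cell defined by } p(X) \mid D_i = 1\}
\end{aligned}$$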

Annex 3: Estimation of the propensity score
Any standard probability model can be used to estimate the propensity score, e.g. a logit model:
$$\Pr\{D_i = 1 \mid X_i\} = \frac{e^{\lambda h(X_i)}}{1 + e^{\lambda h(X_i)}}$$
where h(X_i) is a function of covariates with linear and higher order terms, and λ is the vector of coefficients to be estimated.

Estimation of the propensity score
Which higher order terms do you include in h(X_i)? This is determined solely by the need to obtain an estimate of the propensity score that satisfies the balancing property.
The specification of h(X_i) is
(1) more parsimonious than the full set of interactions between observables X
(2) though not too parsimonious: it still needs to satisfy the balancing property.
Note: the estimation of the propensity scores does not need a behavioral interpretation.
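A minimal sketch of estimating the score with a logit model, using only the Python standard library. Everything here is an assumption made for illustration: the specification `h` (constant plus one linear term), the coefficient name `lam`, the plain gradient-ascent fit, and the data. In applied work a statistical package would normally be used instead.

```python
import math

def h(x):
    # Hypothetical specification: constant and linear term only.
    # Higher order terms (e.g. x * x) could be appended here.
    return [1.0, x]

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(xs, ds, steps=5000, rate=1.0):
    """Maximize the average logit log-likelihood by gradient ascent."""
    feats = [h(x) for x in xs]
    lam = [0.0] * len(feats[0])
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(lam)
        for f, d in zip(feats, ds):
            p = logistic(sum(l * v for l, v in zip(lam, f)))
            for k, v in enumerate(f):
                grad[k] += (d - p) * v / n
        lam = [l + rate * g for l, g in zip(lam, grad)]
    return lam

def propensity_score(x, lam):
    """Predicted probability of treatment for covariate value x."""
    return logistic(sum(l * v for l, v in zip(lam, h(x))))

# Hypothetical data: one covariate, treatment more likely at higher x.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
ds = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
lam = fit_logit(xs, ds)
scores = [propensity_score(x, lam) for x in xs]
print(lam, scores)
```

The predicted values in `scores` play the role of the estimated propensity scores; whether the specification is adequate is judged by the balancing property, not by the fit of the model.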

An algorithm for estimating the propensity score
1. Start with a parsimonious logit or probit function to estimate the score.
2. Sort the data according to the estimated propensity score (from lowest to highest).
3. Stratify all observations in blocks such that in each block the estimated propensity scores for the treated and the controls are not statistically different:
   a) start with five blocks of equal score range {0-0.2, ..., 0.8-1}
   b) test whether the means of the scores for the treated and the controls are statistically different in each block
   c) if yes, increase the number of blocks and test again
   d) if no, go to next step.

An algorithm for estimating the propensity score (continued)
4. Test that the balancing property holds in all blocks for all covariates:
   a) for each covariate, test whether the means (and possibly higher order moments) for the treated and for the controls are statistically different in all blocks;
   b) if one covariate is not balanced in one block, split the block and test again within each finer block;
   c) if one covariate is not balanced in all blocks, modify the logit estimation of the propensity score adding more interaction and higher order terms and then test again.
Note: In all this procedure the outcome has no role.
Use the Stata programs pscore.ado, psmatch2.ado, match.ado (from Stata type findit name ado).
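The block-wise balance test in step 4 can be sketched as below for a single covariate. The names, the data, the use of a Welch t-statistic, and the critical value 1.96 are all assumptions for illustration; with blocks this small a normal critical value is only a rough guide.

```python
import math
from statistics import mean, variance

def t_stat(a, b):
    """Welch t-statistic for the difference in means of two samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0.0:
        return 0.0 if diff == 0.0 else math.inf
    return diff / se

def unbalanced_blocks(units, n_blocks=5, critical=1.96):
    """units: list of (score, treated_flag, covariate_value).

    Returns indices of equal-width score blocks in which the covariate
    means of treated and controls differ by more than the critical value.
    """
    blocks = [([], []) for _ in range(n_blocks)]
    for score, d, x in units:
        b = min(int(score * n_blocks), n_blocks - 1)  # score 1.0 goes in last block
        blocks[b][0 if d == 1 else 1].append(x)
    flagged = []
    for b, (x_treated, x_controls) in enumerate(blocks):
        if len(x_treated) < 2 or len(x_controls) < 2:
            continue  # too few units to test in this block
        if abs(t_stat(x_treated, x_controls)) > critical:
            flagged.append(b)
    return flagged

# Hypothetical data: balanced in block 0, clearly unbalanced in block 2.
units = [
    (0.05, 1, 1.0), (0.10, 1, 1.2), (0.15, 1, 0.9),
    (0.05, 0, 1.1), (0.10, 0, 1.0), (0.15, 0, 0.95),
    (0.45, 1, 5.0), (0.50, 1, 5.2), (0.55, 1, 5.1),
    (0.45, 0, 1.0), (0.50, 0, 1.1), (0.55, 0, 0.9),
]
flagged = unbalanced_blocks(units)
print(flagged)
```

A flagged block would be split and retested, and a covariate flagged in every block would point to adding interaction or higher order terms to the score model, as described in steps 4b and 4c. The outcome variable never enters this check.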