MATCHING TECHNIQUES
Technical Track Session VI
Emanuela Galasso
The World Bank

These slides were developed by Christel Vermeersch and modified by Emanuela Galasso for the purpose of this workshop.

When can we use matching?

What if the assignment to the treatment is done not randomly, but on the basis of observables?

This is when matching methods come in! Matching methods allow you to construct comparison groups when the assignment to the treatment is done on the basis of observable variables.
When can we use matching?

Intuition: the comparison group needs to be as similar as possible to the treatment group, in terms of the observables before the start of the treatment.

The method assumes there are no remaining unobservable differences between treatment and comparison groups.

Key Question

What is the effect of treatment on the treated when the assignment to the treatment is based on observable variables?
Unconfoundedness & Selection on observables

Let X denote a matrix in which each row is a vector of pre-treatment observable variables for individual i.

Unconfoundedness: Assignment to treatment is unconfounded given pre-treatment variables X if
$$(Y_1, Y_0) \perp D \mid X$$

Unconfoundedness is equivalent to saying that:
(1) within each cell defined by X, treatment is random;
(2) the selection into treatment depends only on the observables X.

Average effects of treatment on the treated
Assuming unconfoundedness given X

Intuition
- Estimate the treatment effect within each cell defined by X
- Take the average over the different cells

Math
In your handouts: Annex 1
Strategy for estimating average effect of treatment on the treated
Selection on observables

Unconfoundedness suggests the following strategy for the estimation of the average treatment effect δ:
- Stratify the data into cells defined by each particular value of X
- Within each cell (i.e. conditioning on X) compute the difference between the average outcomes of the treated and the controls
- Average these differences with respect to the distribution of X in the population of treated units.

Is this strategy feasible?

Is our strategy feasible? The Dimensionality Problem

This may not be feasible when:
- The sample is small
- The set of covariates is large
- Many of the covariates have many values or are continuous

This is what we call the dimensionality problem.
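The three-step strategy above (stratify, difference within cells, average over the treated) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `att_by_cells`, the `(x, d, y)` row layout and the toy data are hypothetical and not part of the original slides. Cells that contain treated units but no controls are skipped and reported, which is exactly where the dimensionality problem shows up.

```python
from collections import defaultdict
from statistics import mean

def att_by_cells(rows):
    """Stratification estimate of the average treatment effect on the treated.

    rows: iterable of (x, d, y), where x is a hashable cell label (a particular
    value of X), d is 1 for treated and 0 for control, and y is the outcome.
    Returns (att, skipped_cells).
    """
    treated = defaultdict(list)
    controls = defaultdict(list)
    for x, d, y in rows:
        (treated if d == 1 else controls)[x].append(y)

    weighted_sum = 0.0
    n_treated_used = 0
    skipped = []
    for x, y_treated in treated.items():
        if x not in controls:
            skipped.append(x)  # no comparison units in this cell
            continue
        # Within-cell difference in average outcomes, treated minus controls
        diff = mean(y_treated) - mean(controls[x])
        # Weight by the number of treated units in the cell
        weighted_sum += len(y_treated) * diff
        n_treated_used += len(y_treated)

    if n_treated_used == 0:
        raise ValueError("no cell contains both treated and control units")
    return weighted_sum / n_treated_used, skipped

# Hypothetical toy data: cells are (location, poverty status)
rows = [
    (("urban", "poor"), 1, 10.0), (("urban", "poor"), 1, 12.0),
    (("urban", "poor"), 0, 8.0), (("urban", "poor"), 0, 6.0),
    (("rural", "poor"), 1, 5.0), (("rural", "poor"), 0, 4.0),
    (("rural", "rich"), 1, 9.0),  # treated only: no control in this cell
]
att, skipped = att_by_cells(rows)
```

In this toy example the treated unit in the ("rural", "rich") cell has no comparison unit, so the estimate only covers the cells where both groups are present.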
The Dimensionality Problem

Examples
- How many cells do we have with 2 binary X variables?
- And with 3 binary X variables?
- And with K binary X variables?
- How about if we have 2 variables that take on 7 values each?

As the number of cells grows, we'll get lack of common support:
- cells containing only treated observations
- cells containing only controls

An Alternative to solve the Dimensionality Problem

The propensity score allows us to convert the multidimensional setup of matching into a one-dimensional setup. In that way, it allows us to reduce the dimensionality problem.

Rosenbaum and Rubin

Rosenbaum and Rubin (1983) propose an equivalent and feasible estimation strategy based on the concept of Propensity Score.
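The cell-counting questions above have a simple answer: with full stratification, the number of cells is the product of the number of values each covariate can take. A minimal sketch (the helper name `number_of_cells` is hypothetical, and K = 10 is chosen only for illustration):

```python
from math import prod

def number_of_cells(levels_per_variable):
    """Number of cells when fully stratifying on discrete covariates."""
    return prod(levels_per_variable)

two_binary = number_of_cells([2, 2])         # 2 binary X variables
three_binary = number_of_cells([2, 2, 2])    # 3 binary X variables
k_binary = number_of_cells([2] * 10)         # K binary X variables, here K = 10, i.e. 2**K
two_seven_valued = number_of_cells([7, 7])   # 2 variables with 7 values each
```

With K binary variables the count is 2^K, so the number of cells doubles with every covariate added while the sample size stays fixed.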
Matching based on the Propensity Score

Definition
The propensity score is the conditional probability of receiving the treatment given the pre-treatment variables:
$$p(X) = \Pr\{D = 1 \mid X\} = E\{D \mid X\}$$

Lemma 1
If p(X) is the propensity score, then
$$D \perp X \mid p(X)$$
Given the propensity score, the pre-treatment variables are balanced between beneficiaries and non-beneficiaries.

Lemma 2
Suppose that assignment to treatment is unconfounded given the pre-treatment variables X. Then assignment to treatment is unconfounded given the propensity score p(X):
$$(Y_1, Y_0) \perp D \mid X \;\Rightarrow\; (Y_1, Y_0) \perp D \mid p(X)$$

Does the propensity score approach solve the dimensionality problem?

YES! The balancing property of the propensity score (Lemma 1) ensures that:
- Observations with the same propensity score have the same distribution of observable covariates independently of treatment status; and
- for a given propensity score, assignment to treatment is random and therefore treatment and control units are observationally identical on average.
Implementation of the estimation strategy

This suggests the following strategy for the estimation of the average treatment effect δ.

Step 1
Estimate a logit (or probit) model of program participation. Predicted values are the propensity scores. E.g. with a logit function, see Annex 3.
This step is necessary because the true propensity score is unknown and therefore the propensity score has to be estimated.

When is propensity score matching appropriate?

Idea behind propensity score matching: estimation of treatment effects requires a careful matching of treated and controls.

If treated and controls are very different in terms of observables, this matching is not sufficiently close and reliable, or it may even be impossible.

The comparison of the estimated propensity scores across treated and controls provides a useful diagnostic tool to evaluate how similar treated and controls are, and therefore how reliable the estimation strategy is.
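As a rough illustration of Step 1, the sketch below fits a logit model of participation by plain gradient ascent on the log-likelihood and uses the fitted probabilities as propensity scores. It is a sketch under stated assumptions, not the method used in the slides: the function names, the single covariate, the toy data, the learning rate and the number of iterations are all hypothetical, and in practice a statistical package would be used instead.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logit(X, d, learning_rate=0.1, n_iter=5000):
    """Fit Pr{D = 1 | X} with a logit model (intercept added automatically).

    X: list of covariate rows, d: list of 0/1 treatment indicators.
    Returns the coefficient list [intercept, b1, ..., bk].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, di in zip(X, d):
            row = [1.0] + list(xi)
            p = sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j, v in enumerate(row):
                grad[j] += (di - p) * v / n  # gradient of the average log-likelihood
        beta = [b + learning_rate * g for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Predicted participation probabilities, one per observation."""
    return [sigmoid(sum(b * v for b, v in zip(beta, [1.0] + list(xi)))) for xi in X]

# Hypothetical toy data: one pre-treatment covariate, treated units tend to have higher values
X = [[0], [1], [1], [2], [2], [3], [3], [4], [4], [5]]
d = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
beta = fit_logit(X, d)
scores = propensity_scores(X, beta)
```

The estimated scores can then be compared across treated and controls, which is the diagnostic described above.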
So you want the propensity score to be the same for treatments and controls

- The range of variation of propensity scores should be the same for treated and controls.
  Count how many controls have a propensity score lower than the minimum or higher than the maximum of the propensity scores of the treated, and vice versa.
- The frequency of propensity scores is the same for treated and controls.
  Draw histograms of the estimated propensity scores for the treated and controls. The bins correspond to the blocks constructed for the estimation of propensity scores.

The issue of common support

[Figure: density of scores for non-participants and density of scores for participants, plotted against the propensity score (0 to 1), with the region of common support marked.]
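The counting check described above is easy to automate. The following sketch assumes the propensity scores have already been estimated; the function name `common_support` and the toy scores are hypothetical.

```python
def common_support(scores, d):
    """Compare the range of propensity scores for treated and controls.

    Returns (lo, hi, controls_off, treated_off, keep), where [lo, hi] is the
    overlap of the two ranges, controls_off counts controls outside the range
    of the treated, treated_off counts treated outside the range of the
    controls, and keep flags observations inside the overlap.
    """
    treated = [s for s, di in zip(scores, d) if di == 1]
    controls = [s for s, di in zip(scores, d) if di == 0]
    lo = max(min(treated), min(controls))
    hi = min(max(treated), max(controls))
    controls_off = sum(1 for s in controls if s < min(treated) or s > max(treated))
    treated_off = sum(1 for s in treated if s < min(controls) or s > max(controls))
    keep = [lo <= s <= hi for s in scores]
    return lo, hi, controls_off, treated_off, keep

# Hypothetical estimated scores: first four are controls, last four are treated
scores = [0.05, 0.20, 0.35, 0.50, 0.30, 0.55, 0.70, 0.95]
d = [0, 0, 0, 0, 1, 1, 1, 1]
lo, hi, controls_off, treated_off, keep = common_support(scores, d)
```

Here only three observations fall inside the overlap of the two ranges, which would be a warning sign about how reliable the matching can be.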
Example: Common support issues

[Figure: density of the linear prediction, in two panels by treatment status (0 and 1).]

Figure A1: Propensity Scores For EiC Phase 1 and non-EiC schools.
Source: Machin, McNally, Meghir, Excellence in Cities: Evaluation of an education policy in disadvantaged areas.

Implementation of the estimation strategy

Remember we're discussing a strategy for the estimation of the average treatment effect on the treated, called δ.

Step 1
Estimate the propensity score (see Annex 3)

Step 2
Restrict the analysis to the region of common support (key source of bias in observational studies)
Step 3: Estimate the average treatment effect given the propensity score

- For each participant find a sample of non-participants that have similar propensity scores.
- Compare the outcome indicator for each participant and its comparison group.
- Calculate the mean of these individual gains to obtain the average overall gain:
$$ATT = \sum_{j=1}^{P} \left( Y_{j1} - \sum_{i=1}^{NP} W_{ij} Y_{ij0} \right) / P$$
where P is the number of participants, NP the number of non-participants, and W_{ij} the weight given to non-participant i in the comparison group of participant j.

Step 3: Estimate the average treatment effect given the propensity score

"Similar" can be defined in many ways. These different weights correspond to different ways of doing matching:
- Stratification on the Score
- Nearest neighbor matching on the Score
- Radius matching on the Score
- Kernel matching on the Score
- Weighting on the basis of the Score
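To make the role of the weights in the ATT formula concrete, the sketch below computes the average gain for two of the weighting schemes listed above: nearest neighbor matching (with replacement) and radius matching. All function names, the radius value and the toy data are hypothetical, and ties, standard errors and the common support restriction are deliberately left out.

```python
def nearest_neighbor_weights(s_j, control_scores):
    """All weight on the control whose score is closest to s_j."""
    nearest = min(range(len(control_scores)), key=lambda i: abs(control_scores[i] - s_j))
    return [1.0 if i == nearest else 0.0 for i in range(len(control_scores))]

def radius_weights(radius):
    """Equal weights on all controls whose score lies within `radius` of s_j."""
    def weights(s_j, control_scores):
        inside = [abs(s - s_j) <= radius for s in control_scores]
        n_inside = sum(inside)
        if n_inside == 0:
            return None  # no comparison unit within the radius
        return [1.0 / n_inside if hit else 0.0 for hit in inside]
    return weights

def att_matching(scores, d, y, weight_fn):
    """Mean over participants of own outcome minus weighted control outcomes."""
    control_scores = [s for s, di in zip(scores, d) if di == 0]
    control_y = [yi for yi, di in zip(y, d) if di == 0]
    gains = []
    for s_j, d_j, y_j1 in zip(scores, d, y):
        if d_j != 1:
            continue
        w = weight_fn(s_j, control_scores)
        if w is None:
            continue  # unmatched participants are dropped
        counterfactual = sum(w_ij * y_i0 for w_ij, y_i0 in zip(w, control_y))
        gains.append(y_j1 - counterfactual)
    if not gains:
        raise ValueError("no participant could be matched")
    return sum(gains) / len(gains)

# Hypothetical toy data: three participants, three non-participants
scores = [0.30, 0.55, 0.70, 0.28, 0.50, 0.75]
d = [1, 1, 1, 0, 0, 0]
y = [10.0, 14.0, 20.0, 7.0, 12.0, 15.0]
att_nn = att_matching(scores, d, y, nearest_neighbor_weights)
att_radius = att_matching(scores, d, y, radius_weights(0.25))
```

The two estimates differ on the same data, which illustrates that the choice of weights is a substantive decision rather than a technical detail.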
To summarize:
- Matching is the observational analogue of an experiment in which placement is independent of outcomes
- The key difference is that a pure experiment does not require the untestable assumption of independence conditional on observables.
- PSM requires good data
- Often combined with difference-in-difference methods (control for selection based on time-invariant unobserved characteristics)

References

Dehejia, R.H. and S. Wahba (1999), Causal Effects in Non-experimental Studies: Reevaluating the Evaluation of Training Programs, Journal of the American Statistical Association, 94, 448, 1053-1062.
Dehejia, R.H. and S. Wahba (1996), Causal Effects in Non-experimental Studies: Reevaluating the Evaluation of Training Programs, Harvard University, Mimeo.
Hahn, Jinyong (1998), On the role of the propensity score in efficient semiparametric estimation of average treatment effects, Econometrica, 66, 2, 315-331.
Heckman, James J., H. Ichimura, and P. Todd (1998), Matching as an econometric evaluation estimator, Review of Economic Studies, 65, 261-294.
Hirano, K., G.W. Imbens and G. Ridder (2000), Efficient Estimation of Average Treatment Effects using the Estimated Propensity Score, mimeo.
Rosenbaum, P.R. and D.B. Rubin (1983), The Central Role of the Propensity Score in Observational Studies for Causal Effects, Biometrika 70, 1, 41-55.
Vinha, K. (2006), A primer on Propensity Score Matching Estimators, Documento CEDE 2006-13, Universidad de los Andes.
Thank You

Q & A
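Annex 1: Average effects of treatment on the treated assuming unconfoundedness given X

If we are willing to assume unconfoundedness:
$$E_i\{Y_0(u_i) \mid D_i = 0, X_i\} = E_i\{Y_0(u_i) \mid D_i = 1, X_i\} = E_i\{Y_0(u_i) \mid X_i\}$$
$$E_i\{Y_1(u_i) \mid D_i = 0, X_i\} = E_i\{Y_1(u_i) \mid D_i = 1, X_i\} = E_i\{Y_1(u_i) \mid X_i\}$$

Using these expressions, we can define for each cell defined by X:
$\delta_X$ = average treatment effect on the treated in the cell defined by X
$$\delta_X \equiv E_i\{\delta_i \mid D_i = 1, X_i\}$$
$$= E_i\{Y_1(u_i) - Y_0(u_i) \mid D_i = 1, X_i\}$$
$$= E_i\{Y_1(u_i) \mid D_i = 1, X_i\} - E_i\{Y_0(u_i) \mid D_i = 1, X_i\}$$
(first term: can measure sample analog; second term: can NOT measure sample analog)
$$= E_i\{Y_1(u_i) \mid D_i = 1, X_i\} - E_i\{Y_0(u_i) \mid D_i = 0, X_i\}$$
(both terms: can measure sample analog)

Annex 1: Average effects of treatment on the treated assuming unconfoundedness given X

Now what is the relation between "average treatment effect on the treated" ... and ... "average treatment effect on the treated within cell defined by X"?

$\delta$ = average treatment effect on the treated
$$\delta \equiv E_i\{\delta_i \mid D_i = 1\}$$
by the law of iterated expectations
$$= E_X\{E_i\{\delta_i \mid D_i = 1, X_i\} \mid D_i = 1\}$$
$$= E_X\{\delta_X \mid D_i = 1\}$$
$$= E_X\{\text{average treatment effect on the treated within cell defined by } X \mid D_i = 1\}$$
where the outer expectation is taken over the distribution of X in the population of treated units.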
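Annex 2: Average effects of treatment and the propensity score

So let's match treatments and controls on the basis of the propensity score p(X) instead of X.
$$E_i\{Y_0(u_i) \mid D_i = 0, p(X_i)\} = E_i\{Y_0(u_i) \mid D_i = 1, p(X_i)\} = E_i\{Y_0(u_i) \mid p(X_i)\}$$
$$E_i\{Y_1(u_i) \mid D_i = 0, p(X_i)\} = E_i\{Y_1(u_i) \mid D_i = 1, p(X_i)\} = E_i\{Y_1(u_i) \mid p(X_i)\}$$

Using these expressions, we can define for each cell defined by p(X):
$\delta_{p(X)}$ = average treatment effect on the treated in the cell defined by p(X)
$$\delta_{p(X)} \equiv E_i\{\delta_i \mid D_i = 1, p(X_i)\}$$
$$= E_i\{Y_1(u_i) - Y_0(u_i) \mid D_i = 1, p(X_i)\}$$
$$= E_i\{Y_1(u_i) \mid D_i = 1, p(X_i)\} - E_i\{Y_0(u_i) \mid D_i = 1, p(X_i)\}$$
(first term: can measure sample analog; second term: can NOT measure sample analog)
$$= E_i\{Y_1(u_i) \mid D_i = 1, p(X_i)\} - E_i\{Y_0(u_i) \mid D_i = 0, p(X_i)\}$$
(both terms: can measure sample analog)

Annex 2: Average effects of treatment and the propensity score

Now what is the relation between "average treatment effect on the treated" ... and ... "average treatment effect on the treated within cell defined by p(X)"?

$\delta$ = average treatment effect on the treated
$$\delta \equiv E_i\{\delta_i \mid D_i = 1\}$$
by the law of iterated expectations
$$= E_{p(X)}\{E_i\{\delta_i \mid D_i = 1, p(X_i)\} \mid D_i = 1\}$$
$$= E_{p(X)}\{\delta_{p(X)} \mid D_i = 1\}$$
$$= E_{p(X)}\{\text{treatment effect on the treated within cell defined by } p(X) \mid D_i = 1\}$$
where the outer expectation is taken over the distribution of p(X) in the population of treated units.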
Annex 3: Estimation of the propensity score

Any standard probability model can be used to estimate the propensity score, e.g. a logit model:
$$\Pr\{D_i = 1 \mid X_i\} = \frac{e^{\lambda h(X_i)}}{1 + e^{\lambda h(X_i)}}$$
where h(X_i) is a function of covariates with linear and higher order terms, and λ is the vector of coefficients to be estimated.

Estimation of the propensity score

Which higher order terms do you include in h(X_i)?

This is determined solely by the need to obtain an estimate of the propensity score that satisfies the balancing property.

The specification of h(X_i) is
(1) more parsimonious than the full set of interactions between observables X,
(2) though not too parsimonious: it still needs to satisfy the balancing property.

Note: the estimation of the propensity scores does not need a behavioral interpretation.
An algorithm for estimating the propensity score

1. Start with a parsimonious logit or probit function to estimate the score.
2. Sort the data according to the estimated propensity score (from lowest to highest).
3. Stratify all observations in blocks such that in each block the estimated propensity scores for the treated and the controls are not statistically different:
   a) start with five blocks of equal score range {0-0.2, ..., 0.8-1}
   b) test whether the means of the scores for the treated and the controls are statistically different in each block
   c) if yes, increase the number of blocks and test again
   d) if no, go to next step.

An algorithm for estimating the propensity score (continued)

4. Test that the balancing property holds in all blocks for all covariates:
   a) for each covariate, test whether the means (and possibly higher order moments) for the treated and for the controls are statistically different in all blocks;
   b) if one covariate is not balanced in one block, split the block and test again within each finer block;
   c) if one covariate is not balanced in all blocks, modify the logit estimation of the propensity score adding more interaction and higher order terms and then test again.

Note: In all this procedure the outcome has no role.

Use the STATA programs pscore.ado, psmatch2.ado, match.ado (from STATA, type findit followed by the name of the ado).
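Steps 3 and 4a of the algorithm can be sketched as a single helper that splits observations into blocks of equal score range and flags the blocks where treated and controls differ in the mean of some variable. This is a simplified illustration, not the procedure implemented in the STATA programs: the function names are hypothetical, the comparison uses a Welch t-statistic with a rough critical value of 2 (an assumption), blocks with fewer than two treated or two controls are simply skipped, and the splitting of flagged blocks is left to the reader.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch t-statistic for the difference in means of two samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if mean(a) == mean(b) else math.inf
    return (mean(a) - mean(b)) / se

def assign_blocks(scores, n_blocks=5):
    """Block index for each score, using blocks of equal score range on [0, 1]."""
    return [min(int(s * n_blocks), n_blocks - 1) for s in scores]

def unbalanced_blocks(values, scores, d, n_blocks=5, t_crit=2.0):
    """Blocks where the mean of `values` differs between treated and controls.

    Pass values=scores for step 3, or a covariate for step 4a.
    """
    blocks = assign_blocks(scores, n_blocks)
    flagged = []
    for b in range(n_blocks):
        treated = [v for v, blk, di in zip(values, blocks, d) if blk == b and di == 1]
        controls = [v for v, blk, di in zip(values, blocks, d) if blk == b and di == 0]
        if len(treated) < 2 or len(controls) < 2:
            continue  # too few observations to test in this block
        if abs(welch_t(treated, controls)) > t_crit:
            flagged.append(b)
    return flagged

# Hypothetical estimated scores: balanced in block 0, unbalanced in block 2
scores = [0.10, 0.12, 0.11, 0.13, 0.55, 0.58, 0.41, 0.43]
d = [1, 1, 0, 0, 1, 1, 0, 0]
flagged = unbalanced_blocks(scores, scores, d)
```

A flagged block would then be split into finer blocks and tested again, as in steps 3c and 4b; note that the outcome variable never enters this check.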