Using Mixed Integer Programming for Matching in Observational Studies

1 Using Mixed Integer Programming for Matching in Observational Studies. José R. Zubizarreta, Department of Statistics, The Wharton School, University of Pennsylvania

2 Key takeaway points. Optimal matching method: get the balance you want; know when it is infeasible; eliminate guesswork. Directly balance several statistics beyond means. Keep the adjustments simple enough: people can talk about them; sensitivity analysis to unobserved biases. José R. Zubizarreta (Statistics, Wharton) Mixed Integer Programming for Matching 05/25/12 1 / 17

3 Outline: The 2010 Chilean earthquake; Optimal matching via mixed integer programming; Applications; Summary and remarks

5 The 2010 Chilean earthquake. 4th strongest earthquake in the world in the last 50 years. Photo: Sebastián Martínez/AP Photo

6 Effect of the earthquake. Effect of the earthquake on posttraumatic stress? The post-earthquake survey (EPT): re-interviewed 22,456 households from CASEN 2009; detailed measurements of the same individuals before and after. (Timeline figure: CASEN, Earthquake, EPT; months Nov to Jun, 2009 to 2010.)

7 Intensity of the earthquake. Peak ground acceleration (PGA) in the communes of the EPT

8 Matched design. Matched respondents with PGA < to those with PGA. We matched exactly for sex, age and ethnic groups; with fine balance for self-rated health and quality of the housing; balancing the entire empirical distributions of income; and mean balancing the 46 covariates in the study.

10 Notation. Let $\mathcal{T} = \{t_1, \dots, t_T\}$ be the set of treated units, and $\mathcal{C} = \{c_1, \dots, c_C\}$ the set of potential controls, with $T \le C$. Define $\mathcal{P} = \{p_1, \dots, p_P\}$ as the set of observed covariates. Each treated unit $t \in \mathcal{T}$ has a vector of observed covariates $x_{t,\cdot} = \{x_{t,p_1}, \dots, x_{t,p_P}\}$, and each control $c \in \mathcal{C}$ has a similar vector $x_{c,\cdot} = \{x_{c,p_1}, \dots, x_{c,p_P}\}$. Based on these covariates there is a distance $0 \le \delta_{t,c} < \infty$ between treated and control units. Decision variable:
$$a_{t,c} = \begin{cases} 1 & \text{if treated } t \text{ is assigned to control } c \\ 0 & \text{otherwise} \end{cases}$$

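11 The assignment algorithm
$$
\begin{aligned}
\underset{a}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c} a_{t,c} \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T} \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C} \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C}
\end{aligned}
$$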
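
To make the assignment problem concrete, here is a minimal Python sketch for pair matching (m = 1) on a toy instance. It is not the talk's implementation: the function name `optimal_pair_match`, the ages, and the absolute-difference distance are all illustrative assumptions, and exhaustive search stands in for a real assignment solver, which would be needed for anything beyond a handful of units.

```python
from itertools import permutations

def optimal_pair_match(treated, controls):
    """Return (total_distance, match) for pair matching with m = 1.

    match[t] is the index of the control assigned to treated unit t.
    The distance delta_{t,c} is assumed to be |x_t - x_c| on one covariate.
    """
    best_cost, best_match = float("inf"), None
    # Each permutation picks one distinct control per treated unit, so
    # "every treated unit gets m = 1 control" and "every control is used
    # at most once" both hold by construction.
    for picked in permutations(range(len(controls)), len(treated)):
        cost = sum(abs(treated[t] - controls[c]) for t, c in enumerate(picked))
        if cost < best_cost:
            best_cost, best_match = cost, picked
    return best_cost, best_match

# Hypothetical ages for two treated units and four potential controls.
treated = [30, 52]
controls = [25, 33, 50, 70]
cost, match = optimal_pair_match(treated, controls)
print(cost, match)  # 5 (1, 2)
```

Here the treated unit aged 30 is paired with the control aged 33 and the unit aged 52 with the control aged 50, for a total distance of 5.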

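12 A MIP with direct balance via the objective function
$$
\begin{aligned}
\underset{a}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c} a_{t,c} + \sum_{j \in \mathcal{J}} \omega_j \mu_j(a) \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T} \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C} \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C}
\end{aligned}
$$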

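13 A MIP with direct balance via the constraints
$$
\begin{aligned}
\underset{a}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c} a_{t,c} \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T} \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C} \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C} \\
& \nu_j(a) \le \varepsilon_j, \quad j \in \mathcal{J}
\end{aligned}
$$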

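15 Balancing the means of the covariates (1)
$$
\begin{aligned}
\underset{a}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c} a_{t,c} + \sum_{j \in \mathcal{J}} \omega_j \left| \frac{\sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} x_{c,j} a_{t,c}}{mT} - \bar{x}_{\mathcal{T},j} \right| \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T} \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C} \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C}
\end{aligned}
$$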

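16 Balancing the means of the covariates (1)
$$
\begin{aligned}
\underset{a, z}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c} a_{t,c} + \sum_{j \in \mathcal{J}} \omega_j z_j \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T} \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C} \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C} \\
& z_j \ge \frac{\sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} x_{c,j} a_{t,c}}{mT} - \bar{x}_{\mathcal{T},j}, \quad j \in \mathcal{J} \\
& z_j \ge -\frac{\sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} x_{c,j} a_{t,c}}{mT} + \bar{x}_{\mathcal{T},j}, \quad j \in \mathcal{J}
\end{aligned}
$$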
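
A short worked step may help explain why the auxiliary variables $z_j$ reproduce the absolute value in the objective. The shorthand $u_j(a)$ and the assumption $\omega_j > 0$ are introduced here for illustration and are not stated on the slide.

```latex
% Shorthand (assumed notation): the signed mean imbalance for covariate j
u_j(a) = \frac{\sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} x_{c,j} a_{t,c}}{mT} - \bar{x}_{\mathcal{T},j}

% The two linear constraints are together equivalent to one bound
z_j \ge u_j(a) \ \text{and}\ z_j \ge -u_j(a)
\iff z_j \ge |u_j(a)|

% If \omega_j > 0, minimizing \omega_j z_j pushes z_j down to that bound,
% so at an optimum (a^*, z^*):
z_j^* = |u_j(a^*)|
```

Under that assumption the program in $(a, z)$ has the same optimal matches as the program with the absolute value, while every constraint and the objective stay linear.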

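17 Balancing the means of the covariates (1)
$$
\begin{aligned}
\underset{a}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c} a_{t,c} \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T} \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C} \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C} \\
& \frac{\sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} x_{c,j} a_{t,c}}{mT} - \bar{x}_{\mathcal{T},j} \le \varepsilon_j, \quad j \in \mathcal{J} \\
& -\frac{\sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} x_{c,j} a_{t,c}}{mT} + \bar{x}_{\mathcal{T},j} \le \varepsilon_j, \quad j \in \mathcal{J}
\end{aligned}
$$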
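
The constrained formulation can be illustrated on a toy instance. The Python sketch below is an assumption-laden stand-in, not the mipmatch implementation: it enumerates every pair matching (m = 1) instead of calling a MIP solver, and the function name `match_with_mean_balance`, the ages, and the incomes are hypothetical. It shows the two possible outcomes of a balance constraint: either the best match that meets the tolerance, or a report that no match meets it.

```python
from itertools import permutations

def match_with_mean_balance(treated, controls, eps):
    """Pair matching (m = 1) with a mean-balance constraint on income.

    treated, controls: lists of (age, income) tuples.
    Minimizes the total |age_t - age_c| subject to
    |mean matched-control income - mean treated income| <= eps.
    Returns (cost, match), or None when the constraint is infeasible.
    """
    n_t = len(treated)
    treated_mean = sum(income for _, income in treated) / n_t
    best = None
    for picked in permutations(range(len(controls)), n_t):
        matched_mean = sum(controls[c][1] for c in picked) / n_t
        if abs(matched_mean - treated_mean) > eps:
            continue  # this match violates the balance constraint
        cost = sum(abs(treated[t][0] - controls[c][0])
                   for t, c in enumerate(picked))
        if best is None or cost < best[0]:
            best = (cost, picked)
    return best

# Hypothetical (age, income) data.
treated = [(30, 10.0), (52, 20.0)]
controls = [(25, 14.0), (33, 30.0), (50, 28.0), (70, 18.0)]

loose = match_with_mean_balance(treated, controls, eps=100.0)
tight = match_with_mean_balance(treated, controls, eps=2.0)
too_tight = match_with_mean_balance(treated, controls, eps=0.5)
print(loose)      # (5, (1, 2)): closest on age, income means differ by 14
print(tight)      # (23, (0, 3)): income means within 2, larger age distance
print(too_tight)  # None: no match balances income within 0.5
```

Tightening the tolerance trades covariate distance for balance, and a tolerance that cannot be met is reported as infeasible rather than silently relaxed.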

18 Balancing the means of the covariates (2). Figure: absolute standardized differences in means for each covariate, shown before matching, after matching with the assignment algorithm, and after matching with mipmatch. Covariates: age_09, sex_09, n_per_hh_09, mar_coh_09, div_wid_09, single_09, rural_09, hlth_prb_d_09, psy_prb_09, hptlizd_09, dis_ss_lo_09, dis_md_sv_09, dis_no_09, dis_ndat_09, fonasa_09, isapre_09, others_09, no_ins_09, dnk_ins_09, yrs_edu_09, employ_09, unempl_09, inacti_09, w_i_09, pc_a_i_09, pc_t_i_09, poor_09, hs_no_oc_09, hs_md_oc_09, hs_cr_oc_09, hs_own_09, hs_rent_09, hs_ced_09, hs_irr_09.
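
The balance measure in this figure can be computed in a few lines. The Python sketch below uses one common convention, which is an assumption here because the slides do not state the formula: the absolute difference in means divided by the square root of the average of the two group variances. The function name `abs_std_diff` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def abs_std_diff(treated, controls):
    """Absolute standardized difference in means for one covariate.

    Assumed convention: |mean_t - mean_c| / sqrt((var_t + var_c) / 2),
    using sample variances.
    """
    pooled_sd = sqrt((variance(treated) + variance(controls)) / 2)
    return abs(mean(treated) - mean(controls)) / pooled_sd

# Hypothetical covariate values for the two groups.
treated_age = [1, 2, 3, 4, 5]
control_age = [2, 3, 4, 5, 6]
print(round(abs_std_diff(treated_age, control_age), 3))  # 0.632
```

Some authors keep the pre-matching standard deviation in the denominator when reporting balance after matching, so that the before and after values share the same scale.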

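23 Balancing Kolmogorov-Smirnov statistics (1)
$$\omega_j \mu_j(a) = \omega_j \sup_{x_{c,p} \in G(x_{\mathcal{T},p})} \left| F_{\mathcal{T}}(x_{c,p}) - F_{\mathcal{C}}(x_{c,p}, a) \right|, \quad j \in \mathcal{J}$$
The term is linearized as $\omega_j z_j$, with the matched-control distribution evaluated at the grid points $x_{g;p} \in G(x_{\mathcal{T},p})$ through the counts $\sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} 1_{\{x_{g-1;p} \le x_{c,p} < x_{g;p}\}} a_{t,c} / (mT)$.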
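
For reference, the two-sample Kolmogorov-Smirnov statistic being balanced is the largest gap between two empirical distribution functions. The Python sketch below computes it directly for two samples. The function names `ecdf` and `ks_statistic` and the income values are hypothetical, and the evaluation grid here is every observed value in either sample, which is an assumption that differs from the slide's grid $G(x_{\mathcal{T},p})$.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical distribution function of a sample."""
    ordered = sorted(sample)
    n = len(ordered)
    # bisect_right counts the observations that are <= x.
    return lambda x: bisect_right(ordered, x) / n

def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two ECDFs.

    Both ECDFs are step functions that only jump at observed values,
    so checking those values is enough to find the supremum.
    """
    f_a, f_b = ecdf(sample_a), ecdf(sample_b)
    grid = sorted(set(sample_a) | set(sample_b))
    return max(abs(f_a(x) - f_b(x)) for x in grid)

# Hypothetical incomes for exposed units and their matched controls.
exposed = [1, 2, 3, 4]
matched_controls = [3, 4, 5, 6]
print(ks_statistic(exposed, matched_controls))  # 0.5
```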

24 Balancing Kolmogorov-Smirnov statistics (2). Figure: empirical distribution functions, ECDF(x), of x = household per capita income (thousand pesos). Curves: controls; exposed, before matching; exposed, after matching, assignment algorithm; exposed, after matching, mipmatch.

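27 Fine and near-fine balance for several covariates (1)
Fine balance:
$$\sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} a_{t,c} 1_{\{x_{c,p} = b\}} = m \sum_{t \in \mathcal{T}} 1_{\{x_{t,p} = b\}}, \quad \forall b \in \mathcal{B}$$
Near-fine balance:
$$\left| \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} a_{t,c} 1_{\{x_{c,p} = b\}} - m \sum_{t \in \mathcal{T}} 1_{\{x_{t,p} = b\}} \right| \le \xi, \quad \forall b \in \mathcal{B}$$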
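
These two conditions are easy to check on a finished match. The Python sketch below counts categories among the treated units and the matched controls and compares them. The function names and the category labels are hypothetical, and the code only verifies balance for a given match, it does not search for one.

```python
from collections import Counter

def balance_deviations(treated_cats, matched_control_cats, m=1):
    """Per-category |matched-control count - m * treated count|."""
    t_counts = Counter(treated_cats)
    c_counts = Counter(matched_control_cats)
    categories = set(t_counts) | set(c_counts)
    # Counter returns 0 for categories that are absent from one group.
    return {b: abs(c_counts[b] - m * t_counts[b]) for b in categories}

def is_fine_balanced(treated_cats, matched_control_cats, m=1):
    """Fine balance: every category count matches exactly."""
    devs = balance_deviations(treated_cats, matched_control_cats, m)
    return all(d == 0 for d in devs.values())

def is_near_fine_balanced(treated_cats, matched_control_cats, xi, m=1):
    """Near-fine balance: every category deviates by at most xi."""
    devs = balance_deviations(treated_cats, matched_control_cats, m)
    return all(d <= xi for d in devs.values())

# Hypothetical self-rated health categories.
treated = ["poor", "good", "good", "fair"]
controls_a = ["good", "fair", "good", "poor"]   # same counts, different order
controls_b = ["good", "good", "good", "poor"]   # one extra "good", no "fair"

print(is_fine_balanced(treated, controls_a))             # True
print(is_fine_balanced(treated, controls_b))             # False
print(is_near_fine_balanced(treated, controls_b, xi=1))  # True
```

Fine balance constrains only the marginal counts, so the matched pairs themselves need not agree on the category.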

28 Fine and near-fine balance for several covariates (2). Table: Fine balance for self-rated health (columns: Exposed, Controls; rows: Poor, Good, Fair). Table: Fine balance for material quality of the housing (columns: Exposed, Controls; rows: Acceptable, Unacceptable, Beyond repair).

29 Density plot of PTS scores. Figure, two panels: "Estimated Density of Pair Differences" (y-axis: density estimate) and "Boxplot of Pair Differences"; x-axis in both: exposed-minus-control pair differences in PTS scores.

31 Summary and remarks. Explicitly optimize or constrain the criteria used to assess the quality of the match: meet the criteria; know that the criteria are infeasible. Directly balance means, variances and skewness, correlations, quantiles, and the Kolmogorov-Smirnov statistic, while matching with exact, near-exact, fine and near-fine balance for more than one covariate. A systematic method for improving covariate balance.

32 Extensions: Optimal subset matching; Building a stronger instrumental variable; Enhancing regression discontinuity designs; R package mipmatch
