Outline. Flexible, optimal matching for observational studies. Optimal matching of two groups Comparing nuclear plants: an illustration

Size: px

Start display at page:

Download "Outline. Flexible, optimal matching for observational studies. Optimal matching of two groups Comparing nuclear plants: an illustration"

Annice Adams
6 years ago
Views:

1 Outle Flexible, optimal matchg for observational studies Optimal matchg of two groups Comparg nuclear plants: an illustration Ben Hansen University of Michigan Generalizations of pair matchg 15 June 2006 The R implementation Existg site Existg site Example: 1:2 matchg by a

2 Existg site Example: 1:2 matchg by a Existg site Example: 1:2 matchg by a Existg site Example: 1:2 matchg by a Existg site Example: 1:2 matchg by a

3 Existg site Example: 1:2 matchg by a Existg site Example: 1:2 matchg by a New and refurbished nuclear plants: discrepancies capacity and year of construction Exist- s g H I J K L M N O P Q R S T U V W X Y Z A B C D E F G Existg site Optimal vs. Greedy matchg By evaluatg potential matches all together rather than sequentially, optimal matchg (blue les) reduces the sum of distances from 82 to 63.

4 Introducg restrictions on who can be matched to whom: calipers In the nuclear plants example, suppose we choose to sist upon a caliper of three years the date of construction. This would forbid five potential matches, dicated below red. Introducg restrictions on who can be matched to whom: calipers With optmatch, matches are forbidden by placg s the distance matrix. Exist- s g H I J K L M N O P Q R S T U V W X Y Z A B C D E F G Exist- s g H I J K L M N O P Q R S T U V W X Y Z A Inf Inf B Inf C D E F 20 Inf 28 Inf G Outle Example # 2: Gender equity study for research scientists 1 Optimal matchg of two groups Comparg nuclear plants: an illustration Generalizations of pair matchg The R implementation and men scientists are to be matched on grant fundg. 1 Discussed Hansen and Klopfer (2005), Hansen (2004)

5 Full Matchg 2 the Gender Equity Sample Similar to matchg with replacement, but creates disjot matched sets better for tests & CIs. In contrast to pair matchg, it fds matches for everyone with a suitable counterpart. In contrast to multiple controls matchg, it doesn t force poor matches. In optmatch, can be combed with structural restrictions. 2 (Rosenbaum, 1991; Hansen and Klopfer, 2005) Full Matchg 2 the Gender Equity Sample Similar to matchg with replacement, but creates disjot matched sets better for tests & CIs. In contrast to pair matchg, it fds matches for everyone with a suitable counterpart. In contrast to multiple controls matchg, it doesn t force poor matches. In optmatch, can be combed with structural restrictions. 2 (Rosenbaum, 1991; Hansen and Klopfer, 2005) Full Matchg 2 the Gender Equity Sample Similar to matchg with replacement, but creates disjot matched sets better for tests & CIs. In contrast to pair matchg, it fds matches for everyone with a suitable counterpart. In contrast to multiple controls matchg, it doesn t force poor matches. In optmatch, can be combed with structural restrictions. 2 (Rosenbaum, 1991; Hansen and Klopfer, 2005) Full Matchg 2 the Gender Equity Sample Similar to matchg with replacement, but creates disjot matched sets better for tests & CIs. In contrast to pair matchg, it fds matches for everyone with a suitable counterpart. In contrast to multiple controls matchg, it doesn t force poor matches. In optmatch, can be combed with structural restrictions. 2 (Rosenbaum, 1991; Hansen and Klopfer, 2005)

6 Outle Optimal matchg of two groups Comparg nuclear plants: an illustration Generalizations of pair matchg Under the hood Full matchg via network flows 3 T C +U 0. 0 C +U [0,1] [0,1]. [0,1] [0,1] Sk The R implementation [0, U 1] C nu [0, Ũ 1] 0 Overflow Arguments to fullmatch() 3 (Hansen and Klopfer, 2005, Fig. 2). Time complexity of the algorithm is O(n 3 log(n max(dist))). Arguments to fullmatch() distance The argument demandg most attention from the user, b/c it defes good matches and because very large distance matrices can tax R s memory limits. A new helper function, makedist(), eases both of these efforts. m.controls, max.controls In propensity matchg, can be important for efficiency see Hansen (2004), Augursky and Kluve (2004). omit.fraction for use matched samplg (as opposed to matchg all or most of a sample). Not needed for gettg rid of subjects without suitable potential matches. If you re not specifically out to reduce the size of the control group, can be ignored. distance The argument demandg most attention from the user, b/c it defes good matches and because very large distance matrices can tax R s memory limits. A new helper function, makedist(), eases both of these efforts. m.controls, max.controls In propensity matchg, can be important for efficiency see Hansen (2004), Augursky and Kluve (2004). omit.fraction for use matched samplg (as opposed to matchg all or most of a sample). Not needed for gettg rid of subjects without suitable potential matches. If you re not specifically out to reduce the size of the control group, can be ignored.

7 Arguments to fullmatch() Summary distance The argument demandg most attention from the user, b/c it defes good matches and because very large distance matrices can tax R s memory limits. A new helper function, makedist(), eases both of these efforts. m.controls, max.controls In propensity matchg, can be important for efficiency see Hansen (2004), Augursky and Kluve (2004). omit.fraction for use matched samplg (as opposed to matchg all or most of a sample). Not needed for gettg rid of subjects without suitable potential matches. If you re not specifically out to reduce the size of the control group, can be ignored. With optmatch, R offers the most comprehensive optimal matchg implementation for statistics. fullmatch() solves optimally such traditional problems as matched samplg, pair matchg, and matchg with k controls. fullmatch() can also solve matchg problems flexibly, and far more effectively, by way of full matchg, with or without structural restrictions (Hansen and Klopfer, 2005; Hansen, 2004). The effort required to code optimal and full matchg algorithms seems to have dissuaded their widespread use. Now that I ve made that effort, perhaps this situation can change! :) Summary Summary With optmatch, R offers the most comprehensive optimal matchg implementation for statistics. fullmatch() solves optimally such traditional problems as matched samplg, pair matchg, and matchg with k controls. fullmatch() can also solve matchg problems flexibly, and far more effectively, by way of full matchg, with or without structural restrictions (Hansen and Klopfer, 2005; Hansen, 2004). The effort required to code optimal and full matchg algorithms seems to have dissuaded their widespread use. Now that I ve made that effort, perhaps this situation can change! :) With optmatch, R offers the most comprehensive optimal matchg implementation for statistics. fullmatch() solves optimally such traditional problems as matched samplg, pair matchg, and matchg with k controls. fullmatch() can also solve matchg problems flexibly, and far more effectively, by way of full matchg, with or without structural restrictions (Hansen and Klopfer, 2005; Hansen, 2004). The effort required to code optimal and full matchg algorithms seems to have dissuaded their widespread use. Now that I ve made that effort, perhaps this situation can change! :)

8 Summary With optmatch, R offers the most comprehensive optimal matchg implementation for statistics. fullmatch() solves optimally such traditional problems as matched samplg, pair matchg, and matchg with k controls. fullmatch() can also solve matchg problems flexibly, and far more effectively, by way of full matchg, with or without structural restrictions (Hansen and Klopfer, 2005; Hansen, 2004). The effort required to code optimal and full matchg algorithms seems to have dissuaded their widespread use. Now that I ve made that effort, perhaps this situation can change! :) Example with propensity scores and stratification prior to matchg >nuclear$pscore <- glm(pr~.-cost, + family=bomial(),data=nuclear)$lear.predictors > pscorediffs <- function(trtvar,data) { + pscr <- data[names(trtvar), pscore ] + abs(outer(pscr[trtvar],pscr[!trtvar], - )) + } > psd2 <- makedist(pr~pt, nuclear, pscorediffs) > fullmatch(psd2) > fullmatch(psd2, m.controls=1, max.controls=3) > fullmatch(psd2, m=1, max=c( 0 =3, 1 =2)) Jake Bowers and my RItools package provides diagnostics... Example with propensity scores and stratification prior to matchg >nuclear$pscore <- glm(pr~.-cost, + family=bomial(),data=nuclear)$lear.predictors > pscorediffs <- function(trtvar,data) { + pscr <- data[names(trtvar), pscore ] + abs(outer(pscr[trtvar],pscr[!trtvar], - )) + } > psd2 <- makedist(pr~pt, nuclear, pscorediffs) > fullmatch(psd2) > fullmatch(psd2, m.controls=1, max.controls=3) > fullmatch(psd2, m=1, max=c( 0 =3, 1 =2)) Jake Bowers and my RItools package provides diagnostics... Example with propensity scores and stratification prior to matchg >nuclear$pscore <- glm(pr~.-cost, + family=bomial(),data=nuclear)$lear.predictors > pscorediffs <- function(trtvar,data) { + pscr <- data[names(trtvar), pscore ] + abs(outer(pscr[trtvar],pscr[!trtvar], - )) + } > psd2 <- makedist(pr~pt, nuclear, pscorediffs) > fullmatch(psd2) > fullmatch(psd2, m.controls=1, max.controls=3) > fullmatch(psd2, m=1, max=c( 0 =3, 1 =2)) Jake Bowers and my RItools package provides diagnostics...

9 Example with propensity scores and stratification prior to matchg >nuclear$pscore <- glm(pr~.-cost, + family=bomial(),data=nuclear)$lear.predictors > pscorediffs <- function(trtvar,data) { + pscr <- data[names(trtvar), pscore ] + abs(outer(pscr[trtvar],pscr[!trtvar], - )) + } > psd2 <- makedist(pr~pt, nuclear, pscorediffs) > fullmatch(psd2) > fullmatch(psd2, m.controls=1, max.controls=3) > fullmatch(psd2, m=1, max=c( 0 =3, 1 =2)) Jake Bowers and my RItools package provides diagnostics... Agresti, A. (2002), Categorical data analysis, John Wiley & Sons. Augursky, B. and Kluve, J. (2004), Assessg the performance of matchg algorithms when selection to treatment is strong, Tech. rep., RWI-Essen. Cox, D. R. and Snell, E. J. (1989), Analysis of bary data, Chapman & Hall Ltd. Hansen, B. B. (2004), Full matchg an observational study of coachg for the SAT, Journal of the American Statistical Association, 99, (2005), OPTMATCH, an add-on package for R. Hansen, B. B. and Klopfer, S. O. (2005), Optimal full matchg and related designs via network flows, Tech. Rep. 416, Statistics Department, University of Michigan. Raudenbush, S. W. and Bryk, A. S. (2002), Hierarchical Lear Models: Applications and Data Analysis Methods, Sage Publications Inc. Rosenbaum, P. R. (1991), A Characterization of Optimal Designs for Observational Studies, Journal of the Royal Statistical Society, 53, (2002a), Attributg effects to treatment matched observational studies, Journal of the American Statistical Association, 97, (2002b), Covariance adjustment randomized experiments and observational studies, Statistical Science, 17, (2002c), Observational Studies, Sprger-Verlag, 2nd ed. Rub, D. B. (1979), Usg Multivariate Matched Samplg and Regression Adjustment to Control Bias Observational Studies, Journal of the American Statistical Association, 74, Smith, H. (1997), Matchg with Multiple Controls to Estimate Treatment Effects Observational Studies, Sociological Methodology, 27, Modes of estimation for treatment effects Preferred Type of outcome mode of ference Categorical Contuous Randomization Agresti (2002), Rosenbaum (2002c), Categorical Data Observational Studies; Analysis; Rosenbaum Rosenbaum (2002b), Covariance (2002a), Atributg adjustment.... effects to treatment... Conditional a Agresti (2002); Cox and ordary OLS b is fe; see Snell (1989), Analysis of also Rub (1979), Usg bary data multivariate matched.... Bayes, esp. Agresti (2002) Smith (1997), Matchg hierarchical with multiple controls... ; lear models Raudenbush and Bryk c (2002), Hierarchical lear models a Uses a fixed effect for each matched set. b i.e., OLS with a fixed effect for each matched set plus treatment effect(s) c Uses a random effect for each matched set.

Flexible, Optimal Matching for Comparative Studies Using the optmatch package

Flexible, Optimal Matching for Comparative Studies Using the optmatch package Ben Hansen Statistics Department University of Michigan ben.b.hansen@umich.edu http://www.stat.lsa.umich.edu/~bbh 9 August