Background of Matching


Matching is a method that has gained increasing popularity for the assessment of causal effects. It has been used widely in medicine, economics, political science, sociology, law, and, of course, statistics. [1] Matching can be applied to experimental data, but it is usually used in observational studies, where the treatment variable is not randomly assigned by the investigator or where the random assignment goes awry. [2] There is a wide variety of matching procedures, and there is no consensus on how matching ought to be done or on how to measure its success. [1]

When using matching methods to estimate causal effects, a central issue is deciding how best to perform the matching. Two common approaches are propensity score matching and multivariate matching based on the Mahalanobis distance. Matching methods based on the propensity score, the Mahalanobis distance, or a combination of the two have appealing theoretical properties when the covariates have ellipsoidal distributions, e.g., distributions such as the normal or t. If the covariates are so distributed, these methods are equal percent bias reducing (EPBR): matching reduces bias in all linear combinations of the covariates. [3] If the EPBR property does not hold, then in general matching will increase the bias of some linear functions of the covariates, even if all univariate means are closer in the matched data than in the unmatched data. [3] Unfortunately, the EPBR property rarely holds. [3]

Rubin Causal Model

A causal effect is the difference between an observed outcome and its counterfactual. The Rubin causal model conceptualizes causal inference in terms of potential outcomes under treatment and control, only one of which is observed for each unit. Let Y_i1 denote the potential outcome for unit i if the unit receives treatment, and let Y_i0 denote the potential outcome for unit i in the control regime.
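The potential-outcomes bookkeeping can be sketched in a few lines of base R. All numbers here are invented purely for illustration; in practice only one potential outcome per unit is ever observed.

```r
# Toy potential outcomes for four units (made-up values)
Y1 <- c(9, 7, 8, 6)            # outcomes if treated
Y0 <- c(5, 6, 4, 6)            # outcomes if untreated
T  <- c(1, 0, 1, 0)            # treatment indicator
Y  <- T * Y1 + (1 - T) * Y0    # observed outcome reveals only one of the two
tau <- Y1 - Y0                 # unit-level treatment effects (unobservable)
mean(tau)                      # average treatment effect in this toy data: 2.25
```

Note that Y equals Y1 where T = 1 and Y0 where T = 0; the unit-level tau can never be computed from the observed Y alone, which is exactly the missing data problem described next.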
The treatment effect for observation i is defined as

  τ_i = Y_i1 − Y_i0.

Causal inference is a missing data problem because Y_i1 and Y_i0 are never both observed. Let T_i be a treatment indicator equal to 1 when i is in the treatment regime and 0 otherwise. The observed outcome for observation i is then

  Y_i = T_i Y_i1 + (1 − T_i) Y_i0.

In principle, if assignment to treatment is randomized, causal inference is straightforward, because the two groups are drawn from the same population by construction and treatment assignment is independent of all baseline variables. As the sample size grows, observed and unobserved baseline variables become balanced across treatment and control groups with arbitrarily high probability, because treatment assignment is independent of Y_i0 and Y_i1, i.e., {Y_i0, Y_i1} ⊥ T_i. For j = 0, 1,

  E(Y_ij | T_i = 1) = E(Y_ij | T_i = 0) = E(Y_i | T_i = j).

Therefore, the average treatment effect (ATE) can be estimated by

  τ = E(Y_i1 | T_i = 1) − E(Y_i0 | T_i = 0) = E(Y_i | T_i = 1) − E(Y_i | T_i = 0).

In an observational setting, covariates are almost never balanced across treatment and control groups, because the two groups are not ordinarily drawn from the same population. Thus, a common quantity of interest is the average treatment effect for the treated (ATT):

  τ|(T = 1) = E(Y_i1 | T_i = 1) − E(Y_i0 | T_i = 1).

However, this cannot be estimated directly, since Y_i0 is not observed for the treated. Progress can be made by assuming that selection into treatment depends on observable covariates X. Following Rosenbaum and Rubin (1983), one can assume that, conditional on X, treatment assignment is unconfounded, {Y_0, Y_1} ⊥ T | X, and

that there is overlap: 0 < Pr(T = 1 | X) < 1. Together, unconfoundedness and overlap constitute a property known as strong ignorability of treatment assignment, which is necessary for identifying the average treatment effect. Heckman, Ichimura, Smith, and Todd (1998) show that for the ATT, the unconfoundedness assumption can be weakened to mean independence: E(Y_ij | T_i, X_i) = E(Y_ij | X_i). The overlap assumption for the ATT requires only that the support of X for the treated be a subset of the support of X for the control observations. Then, following Rubin (1974, 1977), we obtain

  E(Y_ij | X_i, T_i = 1) = E(Y_ij | X_i, T_i = 0) = E(Y_i | X_i, T_i = j).

By conditioning on the observed covariates X_i, the treatment and control groups are exchangeable. The average treatment effect for the treated is estimated as

  τ|(T = 1) = E{ E(Y_i | X_i, T_i = 1) − E(Y_i | X_i, T_i = 0) | T_i = 1 },

where the outer expectation is taken over the distribution of X_i | (T_i = 1), the distribution of baseline variables in the treated group.

The most straightforward and nonparametric way to condition on X is to match exactly on the covariates. This approach fails in finite samples if the dimensionality of X is large or if X contains continuous covariates. Thus, in general, alternative methods must be used.

Mahalanobis Distance Matching

The most common conventional method for matching (without the propensity score) uses the Mahalanobis distance: the distance between two N-dimensional points, scaled by the statistical variation in each component of the point. If X_i and X_j are two points from the same distribution with covariance matrix C, the Mahalanobis distance can be expressed as

  D(X_i, X_j) = {(X_i − X_j)' C⁻¹ (X_i − X_j)}^(1/2).

When the covariance matrix is the identity matrix, the Mahalanobis distance specializes to the Euclidean distance.
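The formula above is a one-liner in base R. The two points and the covariance matrix below are invented for illustration:

```r
# Mahalanobis distance between two 2-dimensional points (toy numbers)
Xi <- c(1, 2)
Xj <- c(3, 5)
C  <- matrix(c(4, 0, 0, 9), nrow = 2)   # assumed covariance matrix
d  <- Xi - Xj
D  <- sqrt(t(d) %*% solve(C) %*% d)     # {(Xi - Xj)' C^-1 (Xi - Xj)}^(1/2)
drop(D)                                 # sqrt(4/4 + 9/9) = sqrt(2)

# With C = identity the distance reduces to the Euclidean distance:
drop(sqrt(t(d) %*% solve(diag(2)) %*% d))   # sqrt(13)
```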
Mahalanobis distance matching pairs observations on this distance and was introduced as a method for bias reduction in observational studies.

Propensity Score Matching

An alternative way to condition on X is to match on the probability of assignment to treatment given the covariates, known as the propensity score. [1] As the sample grows large, matching on the propensity score produces balance on the vector of covariates X: [1] as the sample size increases, the distribution of covariates in the treated and control groups becomes balanced when units are matched on the propensity score. This method matches each treated unit to the nearest control unit on the unidimensional metric of the propensity score. [1] If the propensity score is estimated by logistic regression, matching should be done on the linear predictor μ = Xβ. [1] Matching on the linear predictor avoids the compression of propensity scores near zero and one. [1] The linear predictor is also often more nearly normally distributed, which matters, given the EPBR results, if the propensity score is matched on along with other covariates. [1]

It has proven useful to combine propensity score matching with the Mahalanobis distance. This is effective because propensity score matching is particularly good at minimizing the discrepancy

along the propensity score, while Mahalanobis distance matching is particularly good at minimizing the distance between individual coordinates of X. [1]

There are several methods of propensity score matching, including exact matching, nearest neighbor matching, optimal matching, full matching, genetic matching, and coarsened exact matching.

Exact Matching Method

The simplest way to obtain good matches is one-to-one exact matching, [2] which pairs each treated unit with a control unit whose values of X_i are identical; that is, each treated unit is matched to all possible control units with exactly the same values on all the covariates. [2] This method forms subclasses within which all units have the same covariate values. [2] However, it can be rather difficult to implement: with many covariates and a finite number of potential matches, sufficient exact matches often cannot be found. [2]

Nearest Neighbor Matching Method

Nearest neighbor matching selects the r best control matches for each individual in the treatment group. [2] Matching is done using a distance measure specified by logistic regression. [2] Matches are chosen for each treated unit one at a time, from the largest propensity score to the smallest. [2] At each matching step, we choose the control unit that is not yet matched and is closest to the treated unit on the distance measure (the propensity score). [2] Nearest neighbor matching is greedy matching: the closest control match for each treated unit is chosen one at a time, without trying to minimize a global distance measure. [2]

Optimal Matching Method

In contrast to greedy matching, optimal matching finds the matched sample with the smallest average distance across all matched pairs. [2] Optimal matching is thought to do a better job of minimizing the distance within each pair, and to be helpful when there are not many appropriate control matches for the treated units. [2]
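The contrast between greedy and optimal matching is easy to see in code. Here is a base R sketch of greedy 1:1 nearest-neighbor matching without replacement; the propensity scores are invented for illustration:

```r
# Toy propensity scores; treated units are matched from largest score to smallest
p_treat   <- c(0.80, 0.60, 0.55)
p_control <- c(0.82, 0.58, 0.40, 0.30)

matched   <- integer(length(p_treat))
available <- rep(TRUE, length(p_control))
for (i in order(p_treat, decreasing = TRUE)) {
  dist <- abs(p_control - p_treat[i])   # distance on the propensity score
  dist[!available] <- Inf               # each control can be used only once
  matched[i] <- which.min(dist)         # greedy: best remaining control, no lookahead
  available[matched[i]] <- FALSE
}
matched   # c(1, 2, 3): treated unit k is paired with control unit matched[k]
```

Because each choice is made without regard to later pairs, the total matched distance is not necessarily minimal; that is exactly the gap optimal matching closes.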
Full Matching Method

Full matching is a type of subclassification that forms the subclasses in an optimal way. [2] A fully matched sample is composed of matched sets, each containing one treated unit and one or more controls. [2] Full matching is optimal in the sense of minimizing a weighted average of the estimated distance between each treated subject and each control subject within each subclass. [2]

Genetic Matching Method

The idea behind genetic matching is to use a genetic search algorithm to find a set of weights for each covariate such that a version of optimal balance is achieved after matching. [2] The algorithm maximizes the balance of observed baseline covariates across matched treated and control units. [4] If a matching method is not EPBR, then in general it will increase the bias for some linear function of the covariates, even if all univariate means are closer in the matched data than in the unmatched data. [4] Genetic matching has been shown to have better properties than the usual alternative matching methods both when the EPBR property holds and when it does not. Even when EPBR holds and the mapping from X to Y is linear, genetic matching has better efficiency, i.e., lower mean squared error (MSE), in finite samples. When the EPBR property does not hold, as it generally does not, genetic matching retains its appealing properties, and the differences in performance between genetic matching and the other matching methods can become substantial in terms of both bias and MSE reduction. In short, at the expense of computer time, genetic matching dominates the other matching methods in terms of MSE when the assumptions required for EPBR hold and, even more so, when they do not. [4]

Coarsened Exact Matching (CEM)

CEM is a Monotonic Imbalance Bounding (MIB) matching method, meaning that the balance between the treated and control groups is chosen by the user beforehand rather than discovered through the usual process of checking after the fact and repeatedly re-estimating. [2] As a result, the arduous process of balance checking, tweaking, and repeatedly rerunning the matching procedure is eliminated, as is the uncertainty about whether the matching procedure will improve balance at all. You get what you want rather than getting what you get. [4] The basic idea of CEM is to coarsen each variable by recoding, so that substantively indistinguishable values are grouped and assigned the same numerical value (groups may be the same size or different sizes, depending on the substance of the problem). The exact matching algorithm is then applied to the coarsened data to determine the matches and to prune unmatched units. Finally, the coarsened data are discarded and the original (uncoarsened) values of the matched data are retained. [4]

MatchIt Function for R

MatchIt is designed for causal inference with a dichotomous treatment variable and a set of pretreatment control variables. [2] The MatchIt package may be used in R to conduct the matching methods described above. The main command, matchit(), implements the matching procedure. The general syntax is:

> m.out <- matchit(treat ~ x1 + x2, data = mydata)

where treat is the dichotomous treatment variable and x1 and x2 are pretreatment covariates, all of which are contained in the data frame mydata. Further examples will be shown using a subset of the job training program data analyzed in Lalonde (1986).
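Before turning to the MatchIt syntax for each method, the coarsen-then-exact-match idea behind CEM described above can be sketched in base R. The ages, bins, and treatment labels here are invented for illustration:

```r
# Toy data: five units, coarsened into substantive age bins
age   <- c(23, 27, 34, 36, 52)
treat <- c(1, 0, 1, 0, 1)
age_c <- cut(age, breaks = c(18, 30, 45, 60))   # the coarsening step
strata <- split(seq_along(age), age_c)          # exact match on coarsened values
# prune strata lacking both a treated and a control unit
keep <- Filter(function(ix) length(unique(treat[ix])) == 2, strata)
sort(unlist(keep))   # units 1-4 are matched; unit 5 (age 52) is pruned
```

The original (uncoarsened) ages of the retained units would then be used in the analysis, as the text describes.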
Mahalanobis matching is implemented in MatchIt with distance = "mahalanobis", using the syntax:

> m.out <- matchit(treat ~ educ + black + hispan + age, data = lalonde, mahvars = c("age", "educ"), distance = "mahalanobis", replace = FALSE)

The mahvars option gives the variables on which to perform Mahalanobis-metric matching within each caliper (default = NULL). Variables should be entered as a vector of variable names, e.g., mahvars = c("X1", "X2"). If mahvars is specified without caliper, the caliper is set to 0.25.

Exact matching is implemented in MatchIt with method = "exact", using the syntax:

> m.out <- matchit(treat ~ educ + black + hispan, data = lalonde, method = "exact")

The default nearest neighbor matching method in MatchIt is greedy matching, implemented with method = "nearest" and the syntax:

> m.out <- matchit(treat ~ re74 + re75 + educ + black + hispan + age, data = lalonde, method = "nearest")

Optimal matching is performed with MatchIt by setting method = "optimal", which automatically loads an add-on package called optmatch. A 2:1 optimal ratio match based on the propensity score from the logistic regression is conducted with:

> m.out <- matchit(treat ~ re74 + re75 + age + educ + black + hispan, data = lalonde, method = "optimal", ratio = 2)
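To see what "optimal" is minimizing, here is a brute-force miniature in base R: with two treated and three control units (toy scores), we can enumerate every 1:1 assignment and take the one with the smallest total distance.

```r
p_treat   <- c(0.70, 0.50)
p_control <- c(0.65, 0.45, 0.20)

# all ways to assign two distinct controls to the two treated units
assignments <- list(c(1, 2), c(1, 3), c(2, 1), c(2, 3), c(3, 1), c(3, 2))
total <- sapply(assignments, function(a) sum(abs(p_treat - p_control[a])))
assignments[[which.min(total)]]   # c(1, 2), with total distance 0.10
```

Real optimal matching (the optmatch package) solves this as a network flow problem rather than by enumeration, but the objective is the same global minimum.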

Full matching can be performed with MatchIt by setting method = "full". As with optimal matching, the optmatch package is used. The appropriate syntax is:

> m.out <- matchit(treat ~ age + educ + black + hispan + married + nodegree + re74 + re75, data = lalonde, method = "full")

Genetic matching can be performed with MatchIt by setting method = "genetic", which automatically loads the Matching package. The appropriate syntax is:

> m.out <- matchit(treat ~ age + educ + black + hispan + married + nodegree + re74 + re75, data = lalonde, method = "genetic")

CEM can be performed with MatchIt by setting method = "cem", which automatically loads the cem package. The appropriate syntax is:

> m.out <- matchit(treat ~ age + educ + black + hispan + married + nodegree + re74 + re75, data = lalonde, method = "cem")

Checking Balance Using MatchIt

The goal of matching is to create a data set that looks closer to one that would result from a perfectly blocked (and possibly randomized) experiment. When we get close, we break the link between the treatment variable and the pretreatment controls, which makes the parametric form of the analysis model less relevant, or irrelevant entirely. To break this link, we need the distribution of covariates to be the same within the matched treated and control groups. [2]

A crucial part of any matching procedure is therefore to assess how close the (empirical) covariate distributions are in the two groups, which is known as balance. Because the outcome variable is not used in the matching procedure, any number of matching methods can be tried and evaluated, and the one that leads to the best balance can be chosen. MatchIt provides a number of ways to assess the balance of covariates after matching, including numerical summaries such as the Mean Diff.
(difference in means) or the difference in means divided by the treated group standard deviation, and summaries based on quantile-quantile (Q-Q) plots that compare the empirical distributions of each covariate. The widely used procedure of performing t-tests of the difference in means is highly misleading and should never be used to assess balance. These balance diagnostics should be performed on all variables in X, even if some are excluded from one of the matching procedures. [2]

The summary() command

The summary() command gives measures of the balance between the treated and control groups in the full (original) data set, and then in the matched data set. If the matching worked well, the measures of balance should be smaller in the matched data set (smaller values indicate better balance). The summary() output for subclassification is the same as that for other types of matching, except that the balance statistics are shown separately for each subclass, and the overall balance in the matched samples is calculated by aggregating across the subclasses, with each subclass weighted by the number of units in the

subclass. For exact matching, the covariate values within each subclass are guaranteed to be the same, so the measures of balance are not output for exact matching; only the sample sizes in each subclass are shown.

The summary() command provides means, the original control group standard deviation (where applicable), mean differences, standardized mean differences, and (median, mean, and maximum) quantile-quantile (Q-Q) plot differences. In addition, summary() reports the matched call; how many units were matched, unmatched, or discarded due to the discard option (described below); and the percent improvement in balance for each of the balance measures, defined as 100((|a| − |b|)/|a|), where a is the balance before matching and b is the balance after matching. For each set of units (original and matched data sets, with weights used as appropriate in the matched data sets), the following statistics are provided:

Means Treated and Means Control show the weighted means in the treated and control groups.
SD Control is the standard deviation calculated in the control group (where applicable).
Mean Diff is the difference in means between the groups.

The final three columns of the summary output give summary statistics of a Q-Q plot: the median, mean, and maximum distance between the two empirical quantile functions (treated and control groups). Values greater than 0 indicate deviations between the groups in some part of the empirical distributions. The plots of the two empirical quantile functions themselves, described below, can provide further insight into which part of the covariate distribution differs between the two groups.

Additional options: Three options to the summary() command can also help with assessing balance and respecifying the propensity score model, as necessary. First, the interactions = TRUE option shows the balance of all squares and interactions of the covariates used in the matching procedure.
Large differences in higher-order interactions are usually a good indication that the propensity score model (the distance measure) needs to be respecified. Similarly, the addlvariables option provides balance measures for additional variables not included in the original matching procedure. If a variable (or interaction of variables) not included in the original propensity score model has large imbalances in the matched groups, including that variable in the next model specification may improve the resulting balance on that variable. Because the outcome variable is not used in the matching procedure, a variety of matching methods can be tried, and the one that leads to the best resulting balance chosen. Finally, the standardize = TRUE option prints standardized versions of the balance measures, where the mean difference is standardized by (divided by) the standard deviation in the original treated group.

The plot() command

We can also examine balance graphically using the plot() command, which provides three types of plots: jitter plots of the distance measure, Q-Q plots of each covariate, and histograms of the distance measure. For subclassification, separate Q-Q plots can be printed for each subclass. The jitter plot for subclassification is the same as that for other types of matching, with the addition of vertical lines indicating the subclass cutpoints. With the histogram option, four histograms are provided: the original treated and control groups and the matched treated and control groups. For the Q-Q plots and the histograms, the weights that result after matching are used to create the plots.

NOTE: There are many other options for all of the MatchIt methods. They can be found starting at section 4.1.0.2.2 of the MatchIt documentation at http://gking.harvard.edu/matchit/docs/matchit.pdf.
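Two of summary()'s balance measures are easy to recompute by hand in base R. The mean differences below are the re74 values from the Lalonde output shown in the examples that follow; the eQQ values are toy data:

```r
# Percent balance improvement, 100 * (|a| - |b|) / |a|
a <- -3523.663   # re74 mean difference before matching
b <- -122.899    # re74 mean difference after matching
round(100 * (abs(a) - abs(b)) / abs(a), 2)   # 96.51

# eQQ statistics: gaps between the two empirical quantile functions
yt <- c(2, 4, 6, 9)     # a covariate in the treated group (toy values)
yc <- c(1, 4, 7, 12)    # the same covariate in the control group
gaps <- abs(sort(yt) - sort(yc))   # pointwise quantile gaps (equal group sizes)
c(eQQ.Med = median(gaps), eQQ.Mean = mean(gaps), eQQ.Max = max(gaps))
```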

Some Examples

Lalonde data

The National Supported Work Demonstration (NSW) was a temporary employment program designed to help disadvantaged workers lacking basic job skills move into the labor market by giving them work experience and counseling in a sheltered environment. Unlike other federally sponsored employment and training programs, the NSW program assigned qualified applicants to training positions randomly. Those assigned to the treatment group received all the benefits of the NSW program, while those assigned to the control group were left to fend for themselves. [6]

The data frame has 614 observations (185 treated, 429 control), with 10 variables measured for each individual:

"treat" is the treatment assignment (1 = treated, 0 = control), from a field experiment in which individuals were randomly assigned to participate in a job training program.
"age" is age in years.
"educ" is education in number of years of schooling.
"black" is an indicator for African-American (1 = African-American, 0 = not).
"hispan" is an indicator for being of Hispanic origin (1 = Hispanic, 0 = not).
"married" is an indicator for married (1 = married, 0 = not married).
"nodegree" is an indicator for whether the individual has a high school degree (1 = no degree, 0 = degree).
"re74" is income in 1974, in U.S. dollars.
"re75" is income in 1975, in U.S. dollars.
"re78" is income in 1978, in U.S. dollars.

The outcome of interest is real earnings in 1978.

(1) Propensity score matching: When sufficient exact matches cannot be found, which becomes increasingly common as the number of covariates increases, we need a way to identify matches that are "close." In this situation, matching on the estimated propensity score is a useful alternative. The propensity score is the probability that a unit receives treatment, given the covariates.
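Before turning to MatchIt, note that the propensity score itself is just a fitted probability from a treatment-assignment model. A base R sketch with a tiny invented data set (not the Lalonde sample):

```r
# Hypothetical mini data set: treatment, age, education (made-up values)
df <- data.frame(
  treat = c(1, 1, 1, 1, 0, 0, 0, 0),
  age   = c(22, 35, 45, 28, 25, 40, 30, 50),
  educ  = c(10, 16, 12, 14, 12, 10, 16, 14)
)
fit    <- glm(treat ~ age + educ, data = df, family = binomial)
pscore <- fitted(fit)    # estimated Pr(treat = 1 | age, educ) for each unit
lp     <- predict(fit)   # linear predictor X %*% beta
round(pscore, 2)
```

Matching on lp rather than pscore avoids the compression of scores near 0 and 1, as noted in the propensity score section above.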
To conduct propensity score matching, with pretreatment covariates composed of age, years of education, high school degree, and real earnings in 1974 and 1975 (a first look, before any transformations of the income variables):

> data(lalonde)
> attach(lalonde)
> m.out <- matchit(treat ~ age + educ + nodegree + re74 + re75, data = lalonde)
> summary(m.out)

Call:
matchit(formula = treat ~ age + educ + nodegree + re74 + re75, data = lalonde)

Summary of balance for all data:
           Means Treated  Means Control  SD Control  Mean Diff   eQQ Med  eQQ Mean   eQQ Max
distance           0.365          0.274       0.134      0.091     0.089     0.092     0.173
age               25.816         28.030      10.787     -2.214     1.000     3.265    10.000
educ              10.346         10.235       2.855      0.111     1.000     0.703     4.000
nodegree           0.708          0.597       0.491      0.111     0.000     0.114     1.000
re74            2095.574       5619.237    6788.751  -3523.663  2425.572  3620.924  9216.500
re75            1532.055       2466.484    3291.996   -934.429   981.097  1060.658  6795.010

Summary of balance for matched data:
           Means Treated  Means Control  SD Control  Mean Diff   eQQ Med  eQQ Mean   eQQ Max
distance           0.365          0.360       0.109      0.005     0.002     0.005      0.03
age               25.816         24.784       9.648      1.032     3.000     2.957      8.00
educ              10.346         10.168       2.617      0.178     0.000     0.535      4.00
nodegree           0.708          0.746       0.437     -0.038     0.000     0.038      1.00
re74            2095.574       2218.473    4371.621   -122.899   104.593   445.272   9177.75
re75            1532.055       1428.977    2297.037    103.078   172.531   409.070  13737.89

Percent Balance Improvement:
           Mean Diff.  eQQ Med  eQQ Mean  eQQ Max

distance        94.87    98.24    94.385    82.50
age             53.37  -200.00     9.437    20.00
educ           -61.41   100.00    23.846     0.00
nodegree        66.03     0.00    66.667     0.00
re74            96.51    95.69    87.703     0.42
re75            88.97    82.41    61.432  -102.18

Sample sizes:
           Control  Treated
All            429      185
Matched        185      185
Unmatched      244        0
Discarded        0        0

This output gives simple statistics for the propensity score and the covariates used in the propensity score specification, for the full and matched samples, including balance statistics used to assess whether there was a reduction in bias in the covariates. We see that 185 control units were matched to the 185 treated units (a "1-1" match). The average propensity scores in the matched treated and control groups are much more similar than in the original groups, with both groups having propensity score means of roughly 0.36 in the matched samples. All six variables (propensity score, age, education, degree, 1974 income, and 1975 income) show reductions in bias due to the matching. For example, job training participants on average earned roughly $3,523 less in 1974 and $934 less in 1975 than non-participants; in the matched sample, the earnings differences are only $122 in 1974 and $103 in 1975. This one-to-one matching algorithm has thus chosen 185 control individuals who look very similar to the treated group on the covariates used in the matching process.

The summary command will additionally report (a) the original call of the MatchIt object, (b) whether there are any "problematic covariates" that may still be imbalanced in the assignment model, and (c) how many units were discarded due to the discard option (described below). In this case there were no discarded units and no problematic covariates.

To perform t-tests and plot difference scores from the resulting matching, we create a new data frame (i.e., table) that combines the paired subjects from the two groups into a single row or data point.
First, we'll create two new data frames for the two groups:

> t.group = match.data(m.out, group="treat")
> c.group = match.data(m.out, group="control")

MatchIt returns a matrix (m.out$match.matrix) that maps each treated unit to its matched control; we merge this onto the t.group data frame:

> t.group = merge(t.group, m.out$match.matrix, by="row.names")

Finally, we merge the two groups into a single data frame, one matched pair per row:

> a.matched = merge(t.group, c.group, by.x="1", by.y="row.names")
> attach(a.matched)

Now we wish to compare 1978 earnings for the two groups:

> aa78 = cbind(re78.x, re78.y)
> granova.ds(aa78, ptcex=c(.7,1), colors=c(1, 2, 1, 1, 2, "green3"),
+   main="Real Earnings 1978: Treatment vs. Control Groups")

Summary Stats
n             185.000
mean(x)      6349.144
mean(y)      6022.822
mean(d=x-y)   326.321
SD(D)       10304.876
ES(D)           0.032
r(x,y)         -0.039
r(x+y,d)        0.210
LL 95%CI    -1168.437
UL 95%CI     1821.080
t(d-bar)        0.431
df.t          184.000
pval.t          0.667

Looking at the distributions of the income variables, it is clear that they are strongly positively skewed. In this case, taking the log transformation of the income variables is appropriate.

After log transformations of the income variables:

> re74.log = log(re74 + 1)
> re75.log = log(re75 + 1)
> re7475.log = log(re74 + re75 + 1)
> re78.log = log(re78 + 1)
> data = cbind(lalonde, re7475.log, re78.log, re74.log, re75.log)
> m.out = matchit(treat ~ age + educ + nodegree + re7475.log, data=data)
> summary(m.out)

(The +1 inside each log handles the zero incomes; the single-year logs are carried along in the data frame, though only re7475.log enters the model.)

Call:
matchit(formula = treat ~ age + educ + nodegree + re7475.log, data = data)

Summary of balance for all data:
           Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance           0.437         0.243      0.166     0.194   0.159    0.195   0.354
age               25.816        28.030     10.787    -2.214   1.000    3.265  10.000
educ              10.346        10.235      2.855     0.111   1.000    0.703   4.000
nodegree           0.708         0.597      0.491     0.111   0.000    0.114   1.000
re7475.log         3.517         7.155      3.353    -3.638   1.737    3.630   8.887

Summary of balance for matched data:
           Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance           0.437         0.365      0.183     0.072   0.036    0.073   0.207
age               25.816        25.043     10.426     0.773   3.000    3.238  10.000
educ              10.346        10.422      2.649    -0.076   0.000    0.432   3.000
nodegree           0.708         0.670      0.471     0.038   0.000    0.038   1.000
re7475.log         3.517         4.816      3.844    -1.299   0.151    1.420   6.960

Percent Balance Improvement:
           Mean Diff. eQQ Med eQQ Mean eQQ Max
distance        62.74   77.05   62.513   41.45
age             65.09 -200.00    0.828    0.00
educ            31.52  100.00   38.462   25.00
nodegree        66.03    0.00   66.667    0.00
re7475.log      64.30   91.33   60.874   21.69

Sample sizes:
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0
Discarded       0       0

Rebuilding the matched-pairs data frame as before, now comparing log 1978 earnings:

> t.group = match.data(m.out, group="treat")
> c.group = match.data(m.out, group="control")
> t.group = merge(t.group, m.out$match.matrix, by="row.names")
> a.matched = merge(t.group, c.group, by.x="1", by.y="row.names")
> attach(a.matched)
> aa78 = cbind(re78.log.x, re78.log.y)
> granova.ds(aa78, ptcex=c(.7,1), colors=c(1, 2, 1, 1, 2, "green3"),
+   main="Real Earnings 1978: Treatment vs. Control Groups")

Summary Stats
n           185.000
mean(x)       6.507
mean(y)       6.274
mean(d=x-y)   0.233
SD(D)         5.410
ES(D)         0.043
r(x,y)        0.011
r(x+y,d)     -0.020
LL 95%CI     -0.551
UL 95%CI      1.018
t(d-bar)      0.587
df.t        184.000
pval.t        0.558

After taking the log transformation, the t-statistic increased from .43 to .59. However, due to the 0 values of the re78 income variable, which are clearly evident in the graphic, it is still hard to see an effect. Balance between the treatment and control groups remains good, though, with mean propensity scores differing by 0.072 after matching compared with 0.194 before matching. Next we will remove those with 0 values for the re78 income variable.
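The quantities granova.ds prints (mean(d), SD(D), ES(D), t(d-bar), and the 95% CI) are ordinary dependent-sample statistics and can be checked by hand. A minimal sketch in Python (illustrative only, not the granova code; granova.ds uses the exact t quantile for the CI, while this sketch uses 1.96, which is close for n > 100):

```python
import math

def paired_stats(x, y):
    """Dependent-sample summary statistics of the kind granova.ds reports:
    mean difference, SD of the differences, effect size (d-bar / SD(D)),
    t statistic, and an approximate 95% confidence interval."""
    n = len(x)
    d = [a - b for a, b in zip(x, y)]          # pairwise differences
    dbar = sum(d) / n
    sd = math.sqrt(sum((v - dbar) ** 2 for v in d) / (n - 1))
    se = sd / math.sqrt(n)                     # standard error of d-bar
    return {"n": n, "mean_d": dbar, "sd_d": sd, "es": dbar / sd,
            "t": dbar / se, "ci95": (dbar - 1.96 * se, dbar + 1.96 * se)}

print(round(paired_stats([3, 5, 7], [1, 4, 2])["t"], 2))   # about 2.22
```

When the CI for the mean difference spans zero, as in the outputs above, the paired t-test fails to reject a zero treatment effect.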

Sample sizes before and after removing the 0's:

            Before            After
          Control Treated   Control Treated
All           429     185       331     140

So we lost about 1/3 of the controls and 1/4 of the treated. To remove those with a 0 value, I simply selected the rows without one and recomputed the log variables on the reduced data:

> x = lalonde[lalonde[,"re78"] != 0,]
> re7475.log = log(x$re74 + x$re75 + 1)
> re78.log = log(x$re78 + 1)
> re74.log = log(x$re74 + 1)
> re75.log = log(x$re75 + 1)
> lalonde.data = cbind(x, re7475.log, re78.log, re74.log, re75.log)

Re-running the matching and the granova.ds function on the data without 0 values for the re78 income variable, we get:

> m.out = matchit(treat ~ age + educ + nodegree + re7475.log, data=lalonde.data)
> summary(m.out)

Call:
matchit(formula = treat ~ age + educ + nodegree + re7475.log, data = lalonde.data)

Summary of balance for all data:
           Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance           0.476         0.437      0.095     0.039   0.042    0.041   0.080
age               26.036        24.726      7.249     1.310   1.000    1.564   7.000
educ              10.386        10.048      1.644     0.338   0.000    0.507   2.000
nodegree           0.693         0.833      0.374    -0.140   0.000    0.136   1.000
re7475.log         3.706         3.019      4.190     0.687   0.000    0.726   7.438

Summary of balance for matched data:
           Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance           0.476         0.452      0.091     0.024   0.021    0.024   0.064
age               26.036        25.757      7.293     0.279   0.000    0.650   8.000
educ              10.386        10.086      1.753     0.300   0.000    0.443   2.000
nodegree           0.693         0.807      0.396    -0.114   0.000    0.114   1.000
re7475.log         3.706         3.551      4.320     0.156   0.000    0.228   6.255

Percent Balance Improvement:
           Mean Diff. eQQ Med eQQ Mean eQQ Max
distance        39.33   49.15    41.10   19.44
age             78.73  100.00    58.45  -14.29
educ            11.27    0.00    12.68    0.00
nodegree        18.64    0.00    15.79    0.00
re7475.log      77.34    0.00    68.55   15.91

Sample sizes:
          Control Treated
All           168     140
Matched       140     140
Unmatched      28       0
Discarded       0       0

Once again, matching has improved balance between the treatment and control groups. Rebuilding the matched pairs on the reduced data:

> t.group = match.data(m.out, group="treat")
> c.group = match.data(m.out, group="control")
> t.group = merge(t.group, m.out$match.matrix, by="row.names")
> a.matched = merge(t.group, c.group, by.x="1", by.y="row.names")
> attach(a.matched)
> aa78 = cbind(re78.log.x, re78.log.y)
> granova.ds(aa78, ptcex=c(.7,1), colors=c(1, 2, 1, 1, 2, "green3"),
+   main="Real Earnings 1978: Treatment vs. Control Groups")

Summary Stats
n           140.000
mean(x)       8.598
mean(y)       8.550
mean(d=x-y)   0.049
SD(D)         1.303
ES(D)         0.037
r(x,y)        0.180
r(x+y,d)      0.038
LL 95%CI     -0.169
UL 95%CI      0.266
t(d-bar)      0.443
df.t        139.000
pval.t        0.659

While the graphic still does not show a significant difference between the treatment and control groups, the picture is clearer now that the 0's are removed. The graphic shows the value of the log transformation for understanding what is going on: for a small subset of the 140 pairs, the treatment does show evidence of having worked better than the control.

Next: Mahalanobis Matching

Let's say you did not want to match on the propensity score. In fact, propensity score matching was not always around; an older alternative is Mahalanobis distance matching, which measures the distance between units on the covariates themselves, scaled by the covariance matrix of the covariates so that correlated variables and variables on different scales are weighted comparably. Using the same dataset as before, with the log transformations of the income variables and the 0 values of re78 removed, we get:

> m.out2 = matchit(treat ~ age + educ + nodegree + re7475.log, data=lalonde.data,
+   mahvars=c("age","educ","nodegree","re7475.log"), caliper=.25,
+   replace=FALSE, distance="mahalanobis")
> summary(m.out2)

Call:
matchit(formula = treat ~ age + educ + nodegree + re7475.log,
    data = lalonde.data, distance = "mahalanobis", mahvars = c("age",
    "educ", "nodegree", "re7475.log"), caliper = 0.25, replace = FALSE)

Summary of balance for all data:
           Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance           4.098         3.946      3.398     0.152   0.373    0.648  12.870
age               26.036        26.988      9.859    -0.952   1.500    2.293   8.000
educ              10.386        10.366      2.713     0.020   0.000    0.529   4.000
nodegree           0.693         0.592      0.492     0.101   0.000    0.100   1.000
re7475.log         3.706         7.488      3.138    -3.782   2.098    3.767   8.989

Summary of balance for matched data:
           Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance           4.098         3.854      2.407     0.244   0.206    0.277   0.731
age               26.036        26.279      9.981    -0.243   2.000    2.600   8.000
educ              10.386        10.079      2.320     0.307   0.000    0.364   1.000
nodegree           0.693         0.714      0.453    -0.021   0.000    0.021   1.000
re7475.log         3.706         6.221      3.917    -2.515   0.796    2.532   8.288

Percent Balance Improvement:
           Mean Diff. eQQ Med eQQ Mean eQQ Max
distance       -60.75   44.80    57.29  94.319
age             74.50  -33.33   -13.40   0.000
educ         -1423.88    0.00    31.08  75.000
nodegree        78.72    0.00    78.57   0.000
re7475.log      33.50   62.08    32.79   7.797

Sample sizes:
          Control Treated
All           331     140
Matched       140     140
Unmatched     191       0
Discarded       0       0

Matching on the Mahalanobis distance produced results similar to matching on the propensity score.
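The Mahalanobis distance between two covariate vectors u and v is sqrt((u - v)' S^-1 (u - v)), where S is the covariance matrix of the covariates; the S^-1 term is what makes correlated variables and variables on different scales comparable. A small illustration with numpy (Python, toy data, not MatchIt's internals):

```python
import numpy as np

def mahalanobis(u, v, vi):
    """Mahalanobis distance between covariate vectors u and v,
    given vi = inverse of the covariance matrix of the covariates."""
    d = np.asarray(u) - np.asarray(v)
    return float(np.sqrt(d @ vi @ d))

# Toy covariates: age and log-income, correlated and on different scales
X = np.array([[25.0, 3.5], [28.0, 7.2], [24.0, 3.0], [30.0, 8.0], [26.0, 4.1]])
VI = np.linalg.inv(np.cov(X, rowvar=False))

# Distance from the first unit to each other unit; a caliper (as in the
# matchit call above) would additionally restrict which candidates are
# eligible before the nearest one is chosen.
dists = [mahalanobis(X[0], X[j], VI) for j in range(1, len(X))]
print([round(d, 3) for d in dists])
```

With the identity matrix in place of S^-1 this reduces to ordinary Euclidean distance, which is why unscaled matching over-weights large-variance covariates.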
You can achieve better or worse matching by adjusting the caliper and the variables on which Mahalanobis-metric matching is performed within each caliper.

> t.group = match.data(m.out2, group="treat")
> c.group = match.data(m.out2, group="control")
> t.group = merge(t.group, m.out2$match.matrix, by="row.names")

> a.matched2 = merge(t.group, c.group, by.x="1", by.y="row.names")
> attach(a.matched2)
> aa78.2 = cbind(re78.log.x, re78.log.y)
> granova.ds(aa78.2, ptcex=c(.7,1), colors=c(1, 2, 1, 1, 2, "green3"),
+   main="Real Earnings 1978: Treatment vs. Control Groups")

Summary Stats
n           140.000
mean(x)       8.598
mean(y)       8.534
mean(d=x-y)   0.064
SD(D)         1.540
ES(D)         0.042
r(x,y)        0.159
r(x+y,d)     -0.238
LL 95%CI     -0.193
UL 95%CI      0.322
t(d-bar)      0.495
df.t        139.000
pval.t        0.621

Again the confidence interval spans zero, indicating no significant difference between the treatment and control groups. However, it is again evident that the treatment shows evidence of having worked better than the control for a subset of people, particularly those at the far right of the graphic.

Lastly, Genetic Matching
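Genetic matching, introduced next, generalizes the Mahalanobis idea: it searches for a weight for each covariate in a weighted distance metric, scoring every candidate weighting by the covariate balance the resulting matches achieve. A toy Python stand-in for that idea (hypothetical names; a crude random search replaces the actual evolutionary algorithm, and matching is done with replacement for simplicity):

```python
import random

def weighted_dist(u, v, w):
    """Weighted Euclidean distance on the covariates, the simplified
    form of the generalized metric that genetic matching tunes."""
    return sum(wi * (a - b) ** 2 for wi, a, b in zip(w, u, v)) ** 0.5

def imbalance(treated, controls, w):
    """Balance criterion: match each treated unit to its closest control
    under weights w (with replacement), then return the worst absolute
    per-covariate mean difference across the matched groups."""
    matched = [min(controls, key=lambda c: weighted_dist(t, c, w)) for t in treated]
    k = len(w)
    return max(abs(sum(t[i] for t in treated) / len(treated)
                   - sum(m[i] for m in matched) / len(matched)) for i in range(k))

def random_search(treated, controls, k, iters=200, seed=0):
    """Toy stand-in for the genetic search: try random weight vectors
    and keep the one with the smallest imbalance."""
    rng = random.Random(seed)
    best_w, best_val = [1.0] * k, imbalance(treated, controls, [1.0] * k)
    for _ in range(iters):
        w = [rng.uniform(0.1, 10.0) for _ in range(k)]
        val = imbalance(treated, controls, w)
        if val < best_val:
            best_w, best_val = w, val
    return best_w, best_val

treated = [(1.0, 0.0), (0.0, 1.0)]
controls = [(0.9, 0.1), (0.1, 0.9), (5.0, 5.0)]
w, v = random_search(treated, controls, k=2, iters=100)
print(round(v, 3))   # best achieved imbalance (0.0 in this toy example)
```

The real GenMatch algorithm evolves a population of weight vectors and optimizes a stricter criterion (p-values from balance tests across all covariates), but the structure, distance metric in, balance score out, is the same.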

Genetic matching automates the process of finding a good matching solution. The idea is to use a genetic search algorithm to find a set of weights for the covariates such that a version of optimal balance is achieved after matching. Due to the complexity of the algorithm, this type of matching can take a particularly long time to run, especially on large datasets with many covariates.

> m.out3 = matchit(treat ~ age + educ + nodegree + re7475.log, data=lalonde.data,
+   method="genetic", replace=FALSE)

NOTE: replace=FALSE ensures 1-1 matching without replacement.

> summary(m.out3)

Call:
matchit(formula = treat ~ age + educ + nodegree + re7475.log,
    data = lalonde.data, method = "genetic", replace = FALSE)

Summary of balance for all data:
           Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance           0.476         0.437      0.095     0.039   0.042    0.041   0.080
age               26.036        24.726      7.249     1.310   1.000    1.564   7.000
educ              10.386        10.048      1.644     0.338   0.000    0.507   2.000
nodegree           0.693         0.833      0.374    -0.140   0.000    0.136   1.000
re7475.log         3.706         3.019      4.190     0.687   0.000    0.726   7.438

Summary of balance for matched data:
           Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance           0.476         0.453      0.095     0.023    0.02    0.023   0.064
age               26.036        25.893      7.357     0.143    0.00    0.543   8.000
educ              10.386        10.286      1.633     0.100    0.00    0.443   3.000
nodegree           0.693         0.800      0.401    -0.107    0.00    0.107   1.000
re7475.log         3.706         3.623      4.346     0.083    0.00    0.180   5.957

Percent Balance Improvement:
           Mean Diff. eQQ Med eQQ Mean eQQ Max
distance        42.30   51.85    42.93   20.09
age             89.09  100.00    65.30  -14.29
educ            70.42    0.00    12.68  -50.00
nodegree        23.73    0.00    21.05    0.00
re7475.log      87.93    0.00    75.27   19.92

Sample sizes:
          Control Treated
All           168     140
Matched       140     140
Unmatched      28       0
Discarded       0       0

Balance is nearly perfect between the two groups after running the genetic algorithm. Rebuilding the matched pairs and comparing log 1978 earnings as before:

> t.group = match.data(m.out3, group="treat")
> c.group = match.data(m.out3, group="control")
> t.group = merge(t.group, m.out3$match.matrix, by="row.names")
> a.matched3 = merge(t.group, c.group, by.x="1", by.y="row.names")
> attach(a.matched3)
> aa78.3 = cbind(re78.log.x, re78.log.y)
> granova.ds(aa78.3, ptcex=c(.7,1), colors=c(1, 2, 1, 1, 2, "green3"),
+   main="Real Earnings 1978: Treatment vs. Control Groups")

Summary Stats
n           140.000
mean(x)       8.598
mean(y)       8.583
mean(d=x-y)   0.016
SD(D)         1.512
ES(D)         0.010
r(x,y)       -0.122
r(x+y,d)      0.053
LL 95%CI     -0.237
UL 95%CI      0.268
t(d-bar)      0.122
df.t        139.000
pval.t        0.903

Again, there is no significant overall treatment effect, although a select few pairs still show evidence that the treatment worked better than the control.

Conclusion: There are numerous matching methods out there. Whether you choose the simplest method or a complex algorithmic one, you may well end up with the same results, as we have seen here. If your interest is mainly balance, then different matching methods can give dissimilar results; but if your goal in the end is to examine treatment effects, it is probable that your results will be the same if you choose one

method over another. It is suggested that you explore the research question at hand and determine which matching method suits your analytic needs.

****************************************************** end LaLonde analyses

Mini Analysis of the Berkeley Birthweight Data

Question: How do babies born to smokers compare, with respect to birth weight, with babies born to non-smokers?

Interactions such as ed*race*weight and age*weight, suggested by the rpart diagram, were considered, but they proved to have z-values close to 0 and were removed from the model. The model chosen is:

> model1 = glm(smoke ~ parity + age + weight + ed + ded + factor(racer) + factor(dracer),
+   data=birthwt, family=binomial(link="logit"))
> summary(model1)

Call:
glm(formula = smoke ~ parity + age + weight + ed + ded + factor(racer) +
    factor(dracer), family = binomial(link = "logit"), data = birthwt)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8788  -1.0013  -0.8369   1.2470   2.1988

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)       1.703709   0.576463   2.955  0.00312 **
parity           -0.193996   0.172715  -1.123  0.26135
age              -0.019631   0.013278  -1.479  0.13927
weight           -0.006964   0.003595  -1.937  0.05271 .
ed               -0.170832   0.060724  -2.813  0.00490 **
ded              -0.025189   0.054637  -0.461  0.64478
factor(racer)6   -2.743519   1.076358  -2.549  0.01081 *
factor(racer)7   -1.257298   0.687650  -1.828  0.06749 .
factor(racer)8   -0.012439   0.816905  -0.015  0.98785
factor(racer)9   -3.405379   1.139065  -2.990  0.00279 **
factor(racer)10  -0.302936   0.864045  -0.351  0.72589
factor(dracer)6   1.815196   1.041998   1.742  0.08150 .
factor(dracer)7   1.079468   0.692548   1.559  0.11907
factor(dracer)8  -1.009959   0.858052  -1.177  0.23918
factor(dracer)9   1.010475   0.594531   1.700  0.08920 .
factor(dracer)10 -0.525445   0.970320  -0.542  0.58815
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1273.2 on 953 degrees of freedom
Residual deviance: 1219.1 on 938 degrees of freedom
AIC: 1251.1

Number of Fisher Scoring iterations: 5

Using the genetic algorithm:

> m.out.gen = matchit(smoke ~ parity + age + weight + ed + ded + factor(racer) +
+   factor(dracer), data=birthwt, method="genetic", replace=FALSE)

NOTE: replace=FALSE ensures 1-1 matching without replacement.

> summary(m.out.gen)

Call:
matchit(formula = smoke ~ parity + age + weight + ed + ded + factor(racer) +
    factor(dracer), data = birthwt, method = "genetic", replace = FALSE)

Summary of balance for all data:
                 Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance                 0.419         0.367      0.113     0.052   0.046    0.053   0.173
parity                   0.233         0.253      0.435    -0.020   0.000    0.019   1.000
age                     26.875        27.691      5.785    -0.815   1.000    0.805   3.000
weight                 127.008       129.007     20.472    -1.999   2.000    2.138  15.000
ed                       2.780         3.176      1.448    -0.396   0.000    0.409   2.000
ded                      3.051         3.320      1.617    -0.268   0.000    0.322   2.000
factor(racer)1           0.780         0.703      0.458     0.078   0.000    0.079   1.000
factor(racer)6           0.016         0.034      0.182    -0.018   0.000    0.016   1.000
factor(racer)7           0.173         0.179      0.384    -0.006   0.000    0.005   1.000
factor(racer)8           0.019         0.044      0.206    -0.025   0.000    0.024   1.000
factor(racer)9           0.003         0.027      0.163    -0.025   0.000    0.024   1.000
factor(racer)10          0.008         0.012      0.109    -0.004   0.000    0.003   1.000
factor(dracer)6          0.024         0.027      0.163    -0.003   0.000    0.003   1.000
factor(dracer)7          0.176         0.181      0.386    -0.005   0.000    0.005   1.000
factor(dracer)8          0.016         0.043      0.202    -0.026   0.000    0.027   1.000
factor(dracer)9          0.024         0.027      0.163    -0.003   0.000    0.003   1.000
factor(dracer)10         0.005         0.012      0.109    -0.007   0.000    0.005   1.000

Summary of balance for matched data:
                 Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance                 0.419         0.410      0.092     0.009   0.007    0.010   0.101
parity                   0.233         0.228      0.420     0.005   0.000    0.005   1.000
age                     26.875        26.919      5.410    -0.043   0.000    0.184   2.000
weight                 127.008       127.832     18.388    -0.824   2.000    1.870  15.000
ed                       2.780         2.802      1.370    -0.022   0.000    0.119   2.000
ded                      3.051         3.051      1.560     0.000   0.000    0.070   2.000
factor(racer)1           0.780         0.762      0.427     0.019   0.000    0.019   1.000
factor(racer)6           0.016         0.022      0.146    -0.005   0.000    0.005   1.000
factor(racer)7           0.173         0.184      0.388    -0.011   0.000    0.011   1.000
factor(racer)8           0.019         0.019      0.137     0.000   0.000    0.000   0.000
factor(racer)9           0.003         0.003      0.052     0.000   0.000    0.000   0.000
factor(racer)10          0.008         0.011      0.104    -0.003   0.000    0.003   1.000
factor(dracer)6          0.024         0.024      0.154     0.000   0.000    0.000   0.000
factor(dracer)7          0.176         0.176      0.381     0.000   0.000    0.000   0.000
factor(dracer)8          0.016         0.014      0.116     0.003   0.000    0.003   1.000
factor(dracer)9          0.024         0.024      0.154     0.000   0.000    0.000   0.000
factor(dracer)10         0.005         0.005      0.074     0.000   0.000    0.000   0.000

Percent Balance Improvement:
                 Mean Diff. eQQ Med eQQ Mean eQQ Max
distance              83.63    84.8    80.58   41.72
parity                72.80     0.0    71.43    0.00
age                   94.68   100.0    77.10   33.33
weight                58.78     0.0    12.55    0.00
ed                    94.52     0.0    70.86    0.00
ded                  100.00     0.0    78.15    0.00
factor(racer)1        75.66     0.0    75.86    0.00
factor(racer)6        69.77     0.0    66.67    0.00
factor(racer)7       -79.31     0.0  -100.00    0.00
factor(racer)8       100.00     0.0   100.00  100.00
factor(racer)9       100.00     0.0   100.00  100.00
factor(racer)10       29.35     0.0     0.00    0.00
factor(dracer)6      100.00     0.0   100.00  100.00
factor(dracer)7      100.00     0.0   100.00  100.00
factor(dracer)8       89.76     0.0    90.00    0.00
factor(dracer)9      100.00     0.0   100.00  100.00
factor(dracer)10     100.00     0.0   100.00  100.00

Sample sizes:
          Control Treated
All           585     369
Matched       369     369
Unmatched     216       0
Discarded       0       0

Balance after matching with the genetic algorithm is clearly better than before; nearly perfect balance is achieved. Also note that all 369 treated units were matched to a control unit, with none left unmatched.

> t.group = match.data(m.out.gen, group="treat")
> c.group = match.data(m.out.gen, group="control")
> t.group = merge(t.group, m.out.gen$match.matrix, by="row.names")
> a.genmatched = merge(t.group, c.group, by.x="1", by.y="row.names")
> attach(a.genmatched)
> gen.bwt = cbind(bwtt.y, bwtt.x)
> granova.ds(gen.bwt/16, ptcex=c(.7,1), colors=c(1, 2, 1, 1, 2, "green3"),
+   main="Infant Birthweight: Smokers vs. Non-Smokers",
+   xlab="Non-Smokers", ylab="Smokers")

(Dividing gen.bwt by 16 converts birthweight from ounces to pounds.)

Summary Stats
n           369.000
mean(x)       7.776
mean(y)       7.175
mean(d=x-y)   0.601
SD(D)         1.391
ES(D)         0.432
r(x,y)        0.108
r(x+y,d)     -0.031
LL 95%CI      0.458
UL 95%CI      0.743
t(d-bar)      8.298
df.t        368.000
pval.t        0.000
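A scale-free way to read balance tables like the one above is the standardized mean difference: the raw mean difference divided by a standard deviation, here the control-group SD that matchit's summary prints alongside it. A short sketch (Python, for illustration; `std_diff` is a hypothetical name, and conventions for the denominator vary, with pooled or treated-group SDs also in use):

```python
import math

def std_diff(x_t, x_c):
    """Standardized mean difference: (mean_t - mean_c) / sd_c,
    using the control-group sample standard deviation."""
    mt = sum(x_t) / len(x_t)
    mc = sum(x_c) / len(x_c)
    sc = math.sqrt(sum((v - mc) ** 2 for v in x_c) / (len(x_c) - 1))
    return (mt - mc) / sc

# Formed directly from the printed summary for 'weight':
# mean diff / SD(control), before and after matching
before = -1.999 / 20.472
after = -0.824 / 18.388
print(round(before, 3), round(after, 3))   # roughly -0.098 and -0.045
```

A common rule of thumb treats absolute standardized differences under about 0.1 as acceptable balance; by that criterion the matched birthweight covariates above all pass.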

Using the matching method, it is clear that there is a significant effect of smoking on birthweight after adjustment for the covariates. The confidence interval does NOT span zero (0.458, 0.743) and the t-statistic is relatively large (8.3). Mothers who did not smoke during pregnancy had babies with higher birthweights than mothers who did smoke, after accounting for the selected covariate differences. These results are consistent with the circ.psa graphic (below), which summarizes outcomes from a propensity score analysis based on strata, from Dr. Pruzek's analysis (using the same model), including the confidence interval, t-statistic, and mean difference between groups.
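The stratified propensity score analysis behind circ.psa combines within-stratum treated-minus-control outcome differences, weighting each stratum by its size. A toy sketch of that estimator (Python, illustrative; `stratified_effect` is a hypothetical name and equal-frequency strata are assumed, not the circ.psa code):

```python
def stratified_effect(ps, treat, y, n_strata=5):
    """Stratify units into equal-frequency strata on the propensity score,
    take the treated-minus-control mean outcome difference within each
    stratum, and combine the differences weighted by stratum size."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    size = len(ps) // n_strata
    total, used = 0.0, 0
    for s in range(n_strata):
        # last stratum absorbs any leftover units
        idx = order[s * size:(s + 1) * size] if s < n_strata - 1 else order[(n_strata - 1) * size:]
        yt = [y[i] for i in idx if treat[i] == 1]
        yc = [y[i] for i in idx if treat[i] == 0]
        if yt and yc:  # skip strata lacking one of the groups
            total += (sum(yt) / len(yt) - sum(yc) / len(yc)) * len(idx)
            used += len(idx)
    return total / used

# Toy data: treated outcomes sit 2 units above controls in every stratum
effect = stratified_effect([.1, .2, .3, .4, .6, .7, .8, .9],
                           [1, 0, 1, 0, 1, 0, 1, 0],
                           [3, 1, 3, 1, 5, 3, 5, 3], n_strata=2)
print(effect)   # 2.0
```

Stratification is a coarser adjustment than pair matching, five strata are often cited as removing roughly 90% of the bias due to the stratifying covariate, but it uses all units rather than discarding unmatched controls.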

References

1. Sekhon, J. (2007). Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching Package for R. Journal of Statistical Software, 10(2), 1-51.
2. Ho, D., Imai, K., King, G., & Stuart, E. (2007). MatchIt: Nonparametric Preprocessing for Parametric Causal Inference. Journal of Statistical Software. http://gking.harvard.edu/matchit/
3. Diamond, A., & Sekhon, J. (2008). Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies. http://sekhon.berkeley.edu/papers/genmatch.pdf
4. Iacus, S., King, G., & Porro, G. (2009). Causal Inference Without Balance Checking: Coarsened Exact Matching. http://gking.harvard.edu/files/cem-plus.pdf
5. Pruzek, R. M., & Helmreich, J. E. (2008). granova: Graphical Analysis of Variance. R package version 1.2.
6. LaLonde, R. J. (1986). Evaluating the Econometric Evaluations of Training Programs with Experimental Data. http://www.jstor.org/pss/1806062

Additional References and links:

For information about the LaLonde experimental data study:
http://www.jstor.org/pss/1806062

For MatchIt (Gary King's website):
http://gking.harvard.edu/
http://gking.harvard.edu/matchit/
http://gking.harvard.edu/matchit/docs/matchit.pdf
http://gking.harvard.edu/matchit/docs/examples.html
http://gking.harvard.edu/matchit/docs/a_user_s_guide.html

For those interested in matching via SAS:
http://www2.sas.com/proceedings/sugi26/p214-26.pdf
http://www2.sas.com/proceedings/sugi29/165-29.pdf

For genetic matching:
http://sekhon.berkeley.edu/papers/genmatch.pdf
http://sekhon.berkeley.edu/papers/matchingjss.pdf

For CEM:
http://gking.harvard.edu/cem/
http://gking.harvard.edu/files/cem-plus.pdf
http://www.jstatsoft.org/v30/i09/paper

For Mahalanobis matching:
http://www.stat.lsa.umich.edu/~bbh/optmatch/doc/mahalanobismatching.pdf
http://www.lexjansen.com/pharmasug/2006/publichealthresearch/pr05.pdf