Background of Matching
Monica Tate
Matching is a method that has gained increasing popularity for the assessment of causal effects. It has often been used in the fields of medicine, economics, political science, sociology, law, and, of course, statistics. [1] Matching works for experimental data, but is usually used for observational studies, where the treatment variable is not randomly assigned by the investigator or where the random assignment goes awry. [2] There is a wide variety of matching procedures, and there is no consensus on how matching ought to be done or on how to measure the success of a matching method. [1]

When using matching methods to estimate causal effects, a common issue is deciding how best to perform the matching. Two common approaches are propensity score matching and multivariate matching based on the Mahalanobis distance. Matching methods based on the propensity score, the Mahalanobis distance, or a combination of the two have appealing theoretical properties if the covariates have ellipsoidal distributions, e.g. distributions such as the normal or t. If the covariates are so distributed, these methods have the property of equal percent bias reduction (EPBR): matching will reduce bias in all linear combinations of the covariates. [3] If the EPBR property does not hold, then, in general, matching will increase the bias of some linear functions of the covariates even if all univariate means are closer in the matched data than in the unmatched. [3] Unfortunately, the EPBR property rarely holds. [3]

Rubin Causal Model

A causal effect is the difference between an observed outcome and its counterfactual. The Rubin causal model conceptualizes causal inference in terms of potential outcomes under treatment and control, only one of which is observed for each unit. Let Y_i1 denote the potential outcome for unit i if the unit receives treatment, and let Y_i0 denote the potential outcome for unit i under the control regime.
The treatment effect for observation i is defined by t_i = Y_i1 - Y_i0. Causal inference is a missing data problem because Y_i1 and Y_i0 are never both observed. Let T_i be a treatment indicator equal to 1 when i is in the treatment regime and 0 otherwise. The observed outcome for observation i is then

Y_i = T_i * Y_i1 + (1 - T_i) * Y_i0.

In principle, if assignment to treatment is randomized, causal inference is straightforward, because the two groups are drawn from the same population by construction and treatment assignment is independent of all baseline variables. As the sample size grows, observed and unobserved baseline variables are balanced across the treatment and control groups with arbitrarily high probability, because treatment assignment is independent of Y_0 and Y_1, i.e. {Y_i0, Y_i1} ⊥ T_i. For j = 0, 1,

E(Y_ij | T_i = 1) = E(Y_ij | T_i = 0) = E(Y_i | T_i = j).

Therefore, the average treatment effect (ATE) can be estimated by

t = E(Y_i1 | T_i = 1) - E(Y_i0 | T_i = 0) = E(Y_i | T_i = 1) - E(Y_i | T_i = 0).

In an observational setting, covariates are almost never balanced across the treatment and control groups, because the two groups are not ordinarily drawn from the same population. Thus, a common quantity of interest is the average treatment effect for the treated (ATT):

t | (T = 1) = E(Y_i1 | T_i = 1) - E(Y_i0 | T_i = 1).

However, this cannot be estimated directly, since Y_i0 is not observed for the treated. Progress can be made by assuming that selection into treatment depends on observable covariates X. Following Rosenbaum and Rubin (1983), one can assume that, conditional on X, treatment assignment is unconfounded ({Y_0, Y_1} ⊥ T | X) and
that there is overlap: 0 < Pr(T = 1 | X) < 1. Together, unconfoundedness and overlap constitute a property known as strong ignorability of treatment assignment, which is necessary for identifying the average treatment effect. Heckman, Ichimura, Smith, and Todd (1998) show that for the ATT, the unconfoundedness assumption can be weakened to mean independence: E(Y_ij | T_i, X_i) = E(Y_ij | X_i). The overlap assumption for the ATT requires only that the support of X for the treated be a subset of the support of X for the control observations. Then, following Rubin (1974, 1977), we obtain

E(Y_ij | X_i, T_i = 1) = E(Y_ij | X_i, T_i = 0) = E(Y_i | X_i, T_i = j).

By conditioning on the observed covariates X_i, the treatment and control groups are exchangeable. The average treatment effect for the treated is estimated as

t | (T = 1) = E{ E(Y_i | X_i, T_i = 1) - E(Y_i | X_i, T_i = 0) | T_i = 1 },

where the outer expectation is taken over the distribution of X_i | (T_i = 1), the distribution of the baseline variables in the treated group. The most straightforward and nonparametric way to condition on X is to match exactly on the covariates. This approach fails in finite samples if the dimensionality of X is large or if X contains continuous covariates, so, in general, alternative methods must be used.

Mahalanobis Distance Matching

The most common and conventional method for matching without the propensity score uses the Mahalanobis distance: the distance between two N-dimensional points, scaled by the statistical variation in each component of the points. For example, if X_i and X_j are two points from the same distribution with covariance matrix C, the Mahalanobis distance can be expressed as

D(X_i, X_j) = {(X_i - X_j)' C^(-1) (X_i - X_j)}^(1/2).

When the covariance matrix is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance.
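As a language-neutral illustration (Python rather than R, with a hypothetical inverse covariance matrix supplied directly), the distance above is just the square root of a quadratic form:

```python
def mahalanobis(x, y, cov_inv):
    # D(x, y) = sqrt((x - y)' C^{-1} (x - y)), where cov_inv is the inverse of C
    d = [a - b for a, b in zip(x, y)]
    q = sum(d[i] * cov_inv[i][j] * d[j]
            for i in range(len(d)) for j in range(len(d)))
    return q ** 0.5

# With C = I the Mahalanobis distance reduces to the Euclidean distance:
identity = [[1.0, 0.0], [0.0, 1.0]]
print(mahalanobis([0, 0], [3, 4], identity))  # 5.0
```

Scaling a component's variance up (so its entry in C^{-1} shrinks) correspondingly shrinks that component's contribution to the distance, which is the sense in which the metric adjusts for statistical variation.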
Mahalanobis distance matching has been used as one method of matching observations for bias reduction in observational studies.

Propensity Score Matching

An alternative way to condition on X is to match on the probability of assignment to treatment given a vector of covariates, known as the propensity score. [1] As the sample grows large, matching on the propensity score produces balance on the vector of covariates X: [1] as the sample size increases, the distributions of the covariates in the matched treated and control groups become balanced. This method matches each treated unit to the nearest control unit on the one-dimensional metric of the propensity score. [1] If the propensity score is estimated by logistic regression, matching should be done on the linear predictor μ = Xβ. [1] Matching on the linear predictor avoids the compression of propensity scores near zero and one. [1] The linear predictor is also often more nearly normally distributed, which is important, given the EPBR results, if the propensity score is matched on along with other covariates. [1] It has been found useful to combine propensity score matching with the Mahalanobis distance. This is effective because propensity score matching is particularly good at minimizing the discrepancy
along the propensity score, and Mahalanobis distance is particularly good at minimizing the distance between individual coordinates of X. [1] There are several methods of propensity score matching, including exact matching, nearest neighbor matching, optimal matching, full matching, genetic matching, and coarsened exact matching.

Exact Matching Method

The simplest way to obtain good matches is one-to-one exact matching. [2] This pairs each treated unit with a control unit for which the values of X_i are identical; that is, each treated unit is matched to all possible control units with exactly the same values on all the covariates. [2] The method forms subclasses, and within each subclass all units have the same covariate values. [2] However, this method can be difficult to implement: with many covariates and a finite number of potential matches, sufficient exact matches often cannot be found. [2]

Nearest Neighbor Matching Method

The nearest neighbor matching method selects the r best control matches for each individual in the treatment group. [2] Matching is done using a distance measure, typically the propensity score specified by logistic regression. [2] Matches are chosen for each treated unit one at a time, from largest to smallest. [2] At each matching step, we choose the control unit that is not yet matched but is closest to the treated unit on the distance measure (the propensity score). [2] Nearest neighbor matching is greedy matching: the closest control match for each treated unit is chosen one at a time, without trying to minimize a global distance measure. [2]

Optimal Matching Method

In contrast to greedy matching, the optimal matching method finds the matched sample with the smallest average distance across all matched pairs. [2] Optimal matching is thought to do a better job of minimizing the distance within each pair, and to be helpful when there are not many appropriate control matches for the treated units.
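To make the greedy/optimal contrast concrete, here is a minimal sketch of greedy nearest-neighbor matching on toy propensity scores (Python; the scores are invented for illustration):

```python
def greedy_nearest_neighbor(treated_ps, control_ps):
    # Match each treated unit, one at a time, to the closest not-yet-used
    # control on the one-dimensional propensity score -- no global criterion.
    available = dict(enumerate(control_ps))      # control index -> score
    pairs = {}
    for t_idx, t_score in enumerate(treated_ps):
        c_idx = min(available, key=lambda c: abs(available[c] - t_score))
        pairs[t_idx] = c_idx
        del available[c_idx]                     # matching without replacement
    return pairs

print(greedy_nearest_neighbor([0.8, 0.5], [0.45, 0.79, 0.2]))  # {0: 1, 1: 0}
```

Because matches are fixed one at a time, the result can depend on the order in which treated units are processed; optimal matching removes that dependence by minimizing the total distance over all pairings at once.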
Full Matching Method

Full matching is a type of subclassification that forms the subclasses in an optimal way. [2] A fully matched sample is composed of matched sets, where each matched set contains one treated unit and one or more controls. [2] Full matching is optimal in terms of minimizing a weighted average of the estimated distance measure between each treated subject and each control subject within each subclass. [2]

Genetic Matching Method

The idea behind this matching method is to use a genetic search algorithm to find a set of weights for each covariate such that a version of optimal balance is achieved after matching has been completed. [2] The algorithm maximizes the balance of the observed baseline covariates across the matched treated and control units. [4] If a matching method is not EPBR, then, in general, it will increase the bias for some linear function of the covariates even if all univariate means are closer in the matched data than in the unmatched. [4] Genetic matching has been shown to have better properties than the usual alternative matching methods both when the EPBR property holds and when it does not. Even when the EPBR property holds and the mapping from X to Y is linear, genetic matching has better efficiency, i.e. lower mean squared error (MSE), in finite samples. When the EPBR property does not hold, as it generally does not, genetic matching retains its appealing properties, and the differences in performance between genetic matching and the other matching methods can become substantial in terms of both bias and MSE reduction. In short, at the expense of computer time, genetic matching dominates the other matching methods in terms of MSE when the assumptions required for EPBR hold and, even more so, when they do not. [4]
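To make the weight-search idea concrete, here is a deliberately tiny stand-in (Python, with invented data): the distance gets a per-covariate weight vector, each candidate weight vector is scored by the worst post-matching mean difference, and the best-scoring weights are kept. A real genetic algorithm evolves the weights rather than scanning a small grid, but the kind of balance loss being optimized is the same.

```python
from itertools import product

def weighted_dist(x, y, w):
    # generalized (diagonal-weight) distance: sqrt(sum_k w_k * (x_k - y_k)^2)
    return sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, y)) ** 0.5

def worst_imbalance(treated, controls, w):
    # greedy-match each treated unit under weights w, then report the largest
    # absolute post-matching mean difference across covariates (loss to minimize)
    avail = list(range(len(controls)))
    matched = []
    for x in treated:
        j = min(avail, key=lambda c: weighted_dist(x, controls[c], w))
        matched.append(controls[j])
        avail.remove(j)
    n, k = len(treated), len(treated[0])
    return max(abs(sum(x[d] for x in treated) / n -
                   sum(m[d] for m in matched) / n) for d in range(k))

# hypothetical two-covariate data
treated  = [(1.0, 10.0), (2.0, 20.0)]
controls = [(1.1, 40.0), (5.0, 11.0), (2.2, 19.0), (0.9, 9.0)]
best_w = min(product([1, 10], repeat=2),
             key=lambda w: worst_imbalance(treated, controls, w))
```

The grid and data here are toys; the point is only the structure of the search: propose weights, match under them, score the resulting balance, keep the best.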
Coarsened Exact Matching (CEM)

CEM is a Monotonic Imbalance Bounding (MIB) matching method, meaning that the balance between the treated and control groups is chosen by the user beforehand rather than discovered through the usual process of checking after the fact and repeatedly re-estimating. [2] The arduous process of balance checking, tweaking, and repeatedly rerunning the matching procedure is therefore eliminated, as is the uncertainty about whether the matching procedure will improve balance at all: you get what you want rather than getting what you get. [4] The basic idea of CEM is to coarsen each variable by recoding it so that substantively indistinguishable values are grouped and assigned the same numerical value (groups may be the same size or different sizes, depending on the substance of the problem). The exact matching algorithm is then applied to the coarsened data to determine the matches and to prune unmatched units. Finally, the coarsened data are discarded and the original (uncoarsened) values of the matched data are retained. [4]

MatchIt Function for R

MatchIt is designed for causal inference with a dichotomous treatment variable and a set of pretreatment control variables. [2] The MatchIt package may be used in R to conduct the previously mentioned matching methods. The main command matchit() implements the matching procedure. The general syntax is:

> m.out <- matchit(treat ~ x1 + x2, data = mydata)

where treat is the dichotomous treatment variable and x1 + x2 are pre-treatment covariates, all of which are contained in the data frame mydata. Further examples will be shown using a subset of the job training program data analyzed in Lalonde (1986).
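Returning to CEM for a moment: its coarsen, exact-match, and prune steps can be sketched in a few lines (Python; the units and cutpoints below are hypothetical):

```python
from collections import defaultdict

def cem(units, cutpoints):
    # units: id -> (treatment, covariate tuple); cutpoints: per-covariate bin edges
    def coarsen(x):
        # recode each value to the index of the bin it falls into
        return tuple(sum(v > c for c in cuts) for v, cuts in zip(x, cutpoints))
    strata = defaultdict(list)
    for uid, (t, x) in units.items():
        strata[coarsen(x)].append((uid, t))
    kept = []
    for members in strata.values():
        if {t for _, t in members} == {0, 1}:   # need both treated and control
            kept.extend(uid for uid, _ in members)
    return sorted(kept)   # matched units keep their original, uncoarsened values

# Hypothetical units: id -> (treatment, (age, years of education))
units = {1: (1, (23, 12)), 2: (0, (25, 12)),   # land in the same coarse stratum
         3: (1, (47, 16)),                     # no control in its stratum: pruned
         4: (0, (31, 12))}                     # no treated in its stratum: pruned
print(cem(units, cutpoints=[(30, 40), (13,)]))  # [1, 2]
```

Coarser cutpoints keep more units but bound imbalance more loosely; finer cutpoints do the reverse, which is the trade-off the user controls up front.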
Mahalanobis matching is implemented in MatchIt using distance = "mahalanobis", with the syntax:

> m.out = matchit(treat ~ educ + black + hispan + age, data = lalonde, mahvars = c("age", "educ"), distance = "mahalanobis", replace = FALSE)

The mahvars option specifies the variables on which to perform Mahalanobis-metric matching within each caliper (default = NULL). Variables should be entered as a vector of variable names, e.g. mahvars = c("x1", "x2"). If mahvars is specified without caliper, a default caliper is applied.

Exact matching is implemented in MatchIt using method = "exact", with the syntax:

> m.out = matchit(treat ~ educ + black + hispan, data = lalonde, method = "exact")

The default nearest neighbor matching method in MatchIt is greedy matching, implemented using method = "nearest" and the syntax:

> m.out = matchit(treat ~ re74 + re75 + educ + black + hispan + age, data = lalonde, method = "nearest")

Optimal matching is performed with MatchIt by setting method = "optimal", which automatically loads an add-on package called optmatch. A 2:1 optimal ratio match based on the propensity score from the logistic regression is conducted with:

> m.out = matchit(treat ~ re74 + re75 + age + educ + black + hispan, data = lalonde, method = "optimal", ratio = 2)
Full matching can be performed with MatchIt by setting method = "full". As with optimal matching, the optmatch package is used. The appropriate syntax is:

> m.out = matchit(treat ~ age + educ + black + hispan + married + nodegree + re74 + re75, data = lalonde, method = "full")

Genetic matching can be performed with MatchIt by setting method = "genetic", which automatically loads the Matching package. The appropriate syntax is:

> m.out = matchit(treat ~ age + educ + black + hispan + married + nodegree + re74 + re75, data = lalonde, method = "genetic")

CEM can be performed with MatchIt by setting method = "cem", which automatically loads the cem package. The appropriate syntax is:

> m.out = matchit(treat ~ age + educ + black + hispan + married + nodegree + re74 + re75, data = lalonde, method = "cem")

Checking Balance Using MatchIt

The goal of matching is to create a data set that looks closer to one that would result from a perfectly blocked (and possibly randomized) experiment. When we get close, we break the link between the treatment variable and the pretreatment controls, which makes the parametric form of the analysis model less relevant, or irrelevant entirely. To break this link, we need the distribution of covariates to be the same within the matched treated and control groups. [2] A crucial part of any matching procedure is therefore to assess how close the (empirical) covariate distributions are in the two groups, which is known as balance. Because the outcome variable is not used in the matching procedure, any number of matching methods can be tried and evaluated, and the matching procedure that leads to the best balance can be chosen. MatchIt provides a number of ways to assess the balance of covariates after matching, including numerical summaries such as the mean Diff.
(the difference in means) or the difference in means divided by the treated group standard deviation, and summaries based on quantile-quantile (Q-Q) plots that compare the empirical distributions of each covariate. The widely used procedure of performing t-tests of the difference in means is highly misleading and should never be used to assess balance. These balance diagnostics should be performed on all variables in X, even if some are excluded from one of the matching procedures. [2]

The summary() command

The summary() command gives measures of the balance between the treated and control groups in the full (original) data set, and then in the matched data set. If the matching worked well, the measures of balance should be smaller in the matched data set (smaller values indicate better balance). The summary() output for subclassification is the same as that for other types of matching, except that the balance statistics are shown separately for each subclass, and the overall balance in the matched samples is calculated by aggregating across the subclasses, where each subclass is weighted by the number of units in the
subclass. For exact matching, the covariate values within each subclass are guaranteed to be the same, so measures of balance are not output for exact matching; only the sample sizes in each subclass are shown. The summary() command provides means, the original control group standard deviation (where applicable), mean differences, standardized mean differences, and the median, mean, and maximum quantile-quantile (Q-Q) plot differences. In addition, the summary() command reports the matched call; how many units were matched, unmatched, or discarded due to the discard option (described below); and the percent improvement in balance for each of the balance measures, defined as 100 * ((|a| - |b|) / |a|), where a is the balance before matching and b is the balance after. For each set of units (original and matched data sets, with weights used as appropriate in the matched data sets), the following statistics are provided: Means Treated and Means Control show the weighted means in the treated and control groups; SD Control is the standard deviation calculated in the control group (where applicable); and Mean Diff is the difference in means between the groups. The final three columns of the summary output give summary statistics of a Q-Q plot: the median, mean, and maximum distance between the two empirical quantile functions (treated and control groups). Values greater than 0 indicate deviations between the groups in some part of the empirical distributions. The plots of the two empirical quantile functions themselves, described below, can provide further insight into which part of the covariate distribution differs between the two groups.

Additional options: three options to the summary() command can also help with assessing balance and respecifying the propensity score model, as necessary. First, the interactions = TRUE option with summary() shows the balance of all squares and interactions of the covariates used in the matching procedure.
Large differences in higher-order interactions are usually a good indication that the propensity score model (the distance measure) needs to be respecified. Similarly, the addlvariables option with summary() will provide balance measures on additional variables not included in the original matching procedure. If a variable (or interaction of variables) not included in the original propensity score model has large imbalances in the matched groups, including that variable in the next model specification may improve the resulting balance on that variable. Because the outcome variable is not used in the matching procedure, a variety of matching methods can be tried, and the one that leads to the best resulting balance chosen. Finally, the standardize = TRUE option will print standardized versions of the balance measures, where the mean difference is standardized by (divided by) the standard deviation in the original treated group.

The plot() command

We can also examine balance graphically using the plot() command, which provides three types of plots: jitter plots of the distance measure, Q-Q plots of each covariate, and histograms of the distance measure. For subclassification, separate Q-Q plots can be printed for each subclass. The jitter plot for subclassification is the same as that for other types of matching, with the addition of vertical lines indicating the subclass cutpoints. With the histogram option, four histograms are provided: the original treated and control groups and the matched treated and control groups. For the Q-Q plots and the histograms, the weights that result from matching are used to create the plots.

*NOTE: There are many other options that apply to all of the MatchIt methods; they can be found in the MatchIt documentation.
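The percent-improvement measure that summary() reports is easy to reproduce by hand (a Python sketch of the formula above; the balance values are made up for illustration):

```python
def percent_balance_improvement(before, after):
    # 100 * (|a| - |b|) / |a|, with a the imbalance before matching and b after;
    # 100 means the imbalance was eliminated, negative values mean it grew
    return 100 * (abs(before) - abs(after)) / abs(before)

print(percent_balance_improvement(2.0, 0.5))  # 75.0
print(percent_balance_improvement(2.0, 2.5))  # -25.0: balance got worse
```

The same formula applies to any of the balance measures (mean differences or the eQQ statistics), one value per covariate.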
Some Examples: Lalonde Data

The National Supported Work Demonstration (NSW) was a temporary employment program designed to help disadvantaged workers lacking basic job skills move into the labor market by giving them work experience and counseling in a sheltered environment. Unlike other federally sponsored employment and training programs, the NSW program assigned qualified applicants to training positions randomly. Those assigned to the treatment group received all the benefits of the NSW program, while those assigned to the control group were left to fend for themselves. [6]

The data frame has 614 observations (185 treated, 429 control), with 10 variables measured for each individual: "treat" is the treatment assignment (1 = treated, 0 = control), from a field experiment in which individuals were randomly assigned to participate in a job training program; "age" is age in years; "educ" is education in years of schooling; "black" is an indicator for African-American (1 = African-American, 0 = not); "hispan" is an indicator for Hispanic origin (1 = Hispanic, 0 = not); "married" is an indicator for being married (1 = married, 0 = not married); "nodegree" indicates whether the individual lacks a high school degree (1 = no degree, 0 = degree); "re74" is income in 1974, in U.S. dollars; "re75" is income in 1975; and "re78" is income in 1978. The outcome of interest is real earnings in 1978.

(1) Propensity score matching: When insufficient exact matches can be found, which becomes increasingly common as the number of covariates increases, we need a way to identify matches that are "close." In this situation, matching on the estimated propensity score is a useful alternative. The propensity score is the probability that a unit receives treatment, given the covariates.
To conduct propensity score matching with pre-treatment covariates composed of age, years of education, high school degree, and real earnings in 1974 and 1975 (a first look, before any transformations are done to the income variables):

> data(lalonde)
> attach(lalonde)
> m.out = matchit(treat ~ age + educ + nodegree + re74 + re75, data = lalonde)
> summary(m.out)

Call:
matchit(formula = treat ~ age + educ + nodegree + re74 + re75, data = lalonde)

[summary() balance tables for all data and for the matched data (Means Treated, Means Control, SD Control, Mean Diff, and eQQ Med/Mean/Max for the distance, age, educ, nodegree, re74, and re75); numeric values lost in transcription.]
[Percent balance improvement table and sample sizes; numeric values lost in transcription.]

This reveals simple statistics of the propensity score and the covariates used in the propensity score specification for the full and matched samples, including t-statistics and balance statistics used to assess whether there was a reduction in bias in the covariates. We see that 185 control units were matched to the 185 treated units (a "1-1" match). The average propensity scores in the matched treated and control groups are much more similar than in the original groups, with both groups having propensity score means of roughly 0.36 in the matched samples. All six variables (propensity score, age, education, degree, 1974 income, and 1975 income) had reductions in bias due to the matching. For example, job training participants on average earned roughly $3,523 less in 1974 and $934 less in 1975 than non-participants; in the matched sample, the earnings differences are only $122 in 1974 and $103 in 1975. This one-to-one matching algorithm has thus chosen 185 control individuals who do look very similar to the treated group on the covariates used in the matching process. The summary command will additionally report (a) the original call of the MatchIt object, (b) whether there are any "problematic covariates" that may still be imbalanced in the assignment model, and (c) how many units were discarded due to the discard option (described below). In this case there were no units discarded and no problematic covariates. In order to perform t-tests and plot difference scores from the resulting matching, we wish to create a new data frame (i.e. table) that combines paired subjects from the two groups into a single row or data point. First, we'll create two new data frames for the two groups.
> t.group = match.data(m.out, group = "treat")
> c.group = match.data(m.out, group = "control")

The MatchIt routine returns a matrix (m.out$match.matrix) that maps each treated unit to its matched control; this will be merged into the t.group data frame.

> t.group = merge(t.group, m.out$match.matrix, by = "row.names")

Finally, we can merge the two groups into a single data frame.

> a.matched = merge(t.group, c.group, by.x = "1", by.y = "row.names")
> attach(a.matched)

Now we wish to compare 1978 earnings for the two groups:

> aa78 = cbind(re78.x, re78.y)
> granova.ds(aa78, ptcex = c(.7, 1), colors = c(1, 2, 1, 1, 2, "green3"), main = "Real Earnings 1978 Treatment vs. Control Groups")
[granova.ds summary statistics (n, mean(x), mean(y), mean(D = x - y), SD(D), ES(D), r(x,y), r(x+y,D), 95% CI limits, t(D-bar), df, p-value); numeric values lost in transcription.]

Looking at the distributions of the income variables, it is clear that they are strongly positively skewed. In this case, taking the log transformation of the income variables is appropriate.

**After log transformations of the income variables:
> re74.log = log(re74 + 1)
> re75.log = log(re75 + 1)
> re7475.log = log(re74 + re75 + 1)
> re78.log = log(re78 + 1)
> data = cbind(lalonde, re7475.log, re78.log, re74.log, re75.log)
> m.out = matchit(treat ~ age + educ + nodegree + re7475.log, data = data)
> summary(m.out)

Call:
matchit(formula = treat ~ age + educ + nodegree + re7475.log, data = data)

[summary() balance tables, percent balance improvement, and sample sizes; numeric values lost in transcription.]

> t.group = match.data(m.out, group = "treat")
> c.group = match.data(m.out, group = "control")
> t.group = merge(t.group, m.out$match.matrix, by = "row.names")
> a.matched = merge(t.group, c.group, by.x = "1", by.y = "row.names")
> attach(a.matched)
> aa78 = cbind(re78.log.x, re78.log.y)
> granova.ds(aa78, ptcex = c(.7, 1), colors = c(1, 2, 1, 1, 2, "green3"), main = "Real Earnings 1978 Treatment vs. Control Groups")

[granova.ds summary statistics; only r(x,y) = 0.011 survived transcription.]
After taking the log transformation, the t-statistic increased from .43 to .59. However, due to 0 values of the re78 income variable, which are clearly evident in the graphic, it is still hard to see an effect. Balance remains high between the treatment and control groups, though, with the mean propensity scores differing far less after matching than the .194 difference before matching. Next we will remove the observations with 0 values for the re78 income variable.

[Sample-size tables before and after removing the 0s; numeric values lost in transcription.]
So we lost about 1/3 of the controls and 1/4 of the treated. To remove those with a 0 value, I simply selected the observations without a 0 value:

> x = lalonde[lalonde[, "re78"] != 0, ]

Re-running the analysis on the data without 0 values for the re78 income variable, we get:

> attach(x)
> re7475.log = log(re74 + re75 + 1)
> re78.log = log(re78 + 1)
> lalonde.data = cbind(x, re7475.log, re78.log)
> m.out = matchit(treat ~ age + educ + nodegree + re7475.log, data = lalonde.data)
> summary(m.out)

Call:
matchit(formula = treat ~ age + educ + nodegree + re7475.log, data = lalonde.data)

[summary() balance tables, percent balance improvement, and sample sizes; most numeric values lost in transcription.]

Once again, matching has improved the balance between the treatment and control groups.

> t.group = match.data(m.out, group = "treat")
> c.group = match.data(m.out, group = "control")
> t.group = merge(t.group, m.out$match.matrix, by = "row.names")
> a.matched = merge(t.group, c.group, by.x = "1", by.y = "row.names")
> attach(a.matched)
> aa78 = cbind(re78.log.x, re78.log.y)
> granova.ds(aa78, ptcex = c(.7, 1), colors = c(1, 2, 1, 1, 2, "green3"), main = "Real Earnings 1978 Treatment vs. Control Groups")

[granova.ds summary statistics; only mean(y) = 8.550 survived transcription.]
While the graphic still does not show significant differences between the treatment and control groups, it gives a clearer picture now that the 0s are removed. This graphic shows the necessity of the log transformation for getting a better understanding of what is going on. It becomes clear that, for a small subset of the 140 pairs, the treatment does show evidence of having worked better than the control.

Next: Mahalanobis Matching
Suppose you did not want to match on the propensity score. In fact, propensity score matching was not always around; there were other methods, such as Mahalanobis distance matching, which measures how close two observations are while accounting for the variances and correlations among the variables. Using the same data set as before, with the log transformations of the income variables and the observations with a 0 value for the re78 income variable removed, we get:

> m.out2 = matchit(treat ~ age + educ + nodegree + re7475.log, data = lalonde.data, mahvars = c("age", "educ", "nodegree", "re7475.log"), caliper = .25, replace = FALSE, distance = "mahalanobis")
> summary(m.out2)

Call:
matchit(formula = treat ~ age + educ + nodegree + re7475.log, data = lalonde.data, distance = "mahalanobis", mahvars = c("age", "educ", "nodegree", "re7475.log"), caliper = 0.25, replace = FALSE)

[summary() balance tables, percent balance improvement, and sample sizes; numeric values lost in transcription.]

Matching on the Mahalanobis distance produced results similar to matching on the propensity score. You can achieve better or worse matching by adjusting the caliper and the variables on which Mahalanobis-metric matching is performed within each caliper.

> t.group = match.data(m.out2, group = "treat")
> c.group = match.data(m.out2, group = "control")
> t.group = merge(t.group, m.out2$match.matrix, by = "row.names")
> a.matched2 = merge(t.group, c.group, by.x = "1", by.y = "row.names")
> attach(a.matched2)
> aa78.2 = cbind(re78.log.x, re78.log.y)
> granova.ds(aa78.2, ptcex = c(.7, 1), colors = c(1, 2, 1, 1, 2, "green3"), main = "Real Earnings 1978 Treatment vs. Control Groups")

[granova.ds summary statistics; numeric values lost in transcription.]

Again, the confidence interval spans zero, indicating no significant difference between the treatment and control groups. However, it is again evident that the treatment shows evidence of having worked better than the control for a subset of people, particularly those at the far right of the graphic.

Lastly, Genetic Matching
Genetic matching automates the process of finding a good matching solution. The idea is to use a genetic search algorithm to find a set of weights for each covariate such that a version of optimal balance is achieved after matching. Due to the complex nature of the algorithm, this type of matching can take particularly long to run, especially with large data sets consisting of numerous covariates.

> m.out3 = matchit(treat ~ age + educ + nodegree + re7475.log, data = lalonde.data, method = "genetic", replace = FALSE)

*NOTE: using replace = FALSE ensures 1-1 matching without replacement.

> summary(m.out3)

Call:
matchit(formula = treat ~ age + educ + nodegree + re7475.log, data = lalonde.data, method = "genetic", replace = FALSE)

[summary() balance tables, percent balance improvement, and sample sizes; numeric values lost in transcription.]

Balance is almost perfect between the two groups after running the genetic algorithm.

> t.group = match.data(m.out3, group = "treat")
> c.group = match.data(m.out3, group = "control")
> t.group = merge(t.group, m.out3$match.matrix, by = "row.names")
> a.matched3 = merge(t.group, c.group, by.x = "1", by.y = "row.names")
> attach(a.matched3)
> aa78.3 = cbind(re78.log.x, re78.log.y)
> granova.ds(aa78.3, ptcex = c(.7, 1), colors = c(1, 2, 1, 1, 2, "green3"), main = "Real Earnings 1978 Treatment vs. Control Groups")

[granova.ds summary statistics; numeric values largely lost in transcription.]
17 ES(D) r(x,y) r(x+y,d) LL 95%CI UL 95%CI t(d-bar) df.t pval.t Again, there is no significant overall treatment effect. Although, you can still see that there are a select few pairs that do show that the treatment shows evidence of having worked better than the control. Conclusion: There are numerous matching methods out there. Whether you chose the simplest method or a complex algorithmic method, you could likely end up with the same results, as we have seen here. If your interest is mainly balance then your specific choices of matching methods could display dissimilar results. But if in the end your goal is to examine treatment effects it is probable that your results will be the same if you chose one
method over another. It is suggested that you explore the research question at hand and determine which matching method suits your analytic needs.

******************************************************end LaLonde analyses

Mini Analysis of the Berkeley Birthweight Data

Question: How do babies born to smokers compare, with respect to birth weight, with babies born to non-smokers?

After considering interactions such as ed*race*weight and age*weight, as suggested by the rpart diagram, they proved to have z-values close to 0 and were removed from the model. The model chosen is:

> model1 = glm(smoke ~ parity + age + weight + ed + ded + factor(racer) + factor(dracer),
    data = birthwt, family = binomial(link = "logit"))
> summary(model1)

Call:
glm(formula = smoke ~ parity + age + weight + ed + ded + factor(racer) +
    factor(dracer), family = binomial(link = "logit"), data = birthwt)

Deviance Residuals:
    Min    1Q    Median    3Q    Max

Coefficients:
                Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)                                    **
parity
age
weight
ed                                             **
ded
factor(racer)                                  *
factor(racer)
factor(racer)
factor(racer)                                  **
factor(racer)
factor(dracer)
factor(dracer)
factor(dracer)
factor(dracer)
factor(dracer)
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)

Null deviance:  on 953 degrees of freedom
Residual deviance:  on 938 degrees of freedom
AIC:

Number of Fisher Scoring iterations: 5

Using the genetic algorithm:

> m.out.gen = matchit(smoke ~ parity + age + weight + ed + ded + factor(racer) + factor(dracer),
    data = birthwt, method = "genetic", replace = FALSE)

*NOTE: using replace = FALSE ensures 1-1 matching without replacement.

> summary(m.out.gen)

Call:
matchit(formula = smoke ~ parity + age + weight + ed + ded + factor(racer) +
    factor(dracer), data = birthwt, method = "genetic", replace = FALSE)

Summary of balance for all data:
                Means Treated  Means Control  SD Control  Mean Diff  eQQ Med  eQQ Mean  eQQ Max
distance
parity
age
weight
ed
ded
factor(racer)
factor(racer)
factor(racer)
factor(racer)
factor(racer)
factor(racer)
factor(dracer)
factor(dracer)
factor(dracer)
factor(dracer)
factor(dracer)

Summary of balance for matched data:
                Means Treated  Means Control  SD Control  Mean Diff  eQQ Med  eQQ Mean  eQQ Max
distance
parity
age
weight
ed
ded
factor(racer)
factor(racer)
factor(racer)
factor(racer)
factor(racer)
factor(racer)
factor(dracer)
factor(dracer)
factor(dracer)
factor(dracer)
factor(dracer)

Percent Balance Improvement:
                Mean Diff.  eQQ Med  eQQ Mean  eQQ Max
distance
parity
age
weight
ed
ded
factor(racer)
factor(racer)
factor(racer)
factor(racer)
factor(racer)
factor(racer)
factor(dracer)
factor(dracer)
factor(dracer)
factor(dracer)
factor(dracer)

Sample sizes:
           Control  Treated
All
Matched
Unmatched
Discarded  0        0

Balance after matching with the genetic algorithm is clearly better than before. Owing to the strict algorithm, almost perfect balance is achieved. Also note that all treated units (369) were matched to a control unit and no treated units were left unmatched.

> t.group = match.data(m.out.gen, group = "treat")
> c.group = match.data(m.out.gen, group = "control")
> t.group = merge(t.group, m.out.gen$match.matrix, by = "row.names")
> a.genmatched = merge(t.group, c.group, by.x = "1", by.y = "row.names")
> attach(a.genmatched)
> gen.bwt = cbind(bwtt.y, bwtt.x)
> granova.ds(gen.bwt/16, ptcex = c(.7, 1), colors = c(1, 2, 1, 1, 2, "green3"),
    main = "Infant Birthweight Smokers vs. Non-Smokers", xlab = "Non-Smokers",
    ylab = "Smokers")

Summary Stats:
n  mean(x)  mean(y)  mean(D=x-y)  SD(D)  ES(D)  r(x,y)  r(x+y,D)  LL 95%CI  UL 95%CI  t(D-bar)  df.t  pval.t 0.000
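The dependent-sample summary above can be cross-checked with an ordinary paired t-test on the matched pairs. A minimal sketch, assuming the gen.bwt matrix constructed above, with column 1 the non-smoker pair members and column 2 the smokers (an assumption based on the merge suffixes), and the /16 rescaling (presumably ounces to pounds) matching the granova.ds call:

```r
# Paired t-test on the genetically matched birthweight pairs.
# Assumes gen.bwt = cbind(bwtt.y, bwtt.x) from the listing above;
# the /16 rescaling matches the granova.ds call.
t.test(gen.bwt[, 1]/16, gen.bwt[, 2]/16, paired = TRUE)
```

The t-statistic, degrees of freedom, and confidence interval for the mean difference should agree with the granova.ds summary statistics.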
It is clear from the matching method that there is a significant effect of smoking on birthweight after adjustment for the covariates. The confidence interval does NOT span zero (.485, .743) and the t-statistic is relatively large (8.3). Mothers who did not smoke during pregnancy had babies with higher birthweights than mothers who did smoke, after accounting for the selected covariate differences. These results are consistent with the circ.psa graphic (below), which summarizes outcomes from a propensity score analysis based on strata from Dr. Pruzek's analysis (using the same model), including the confidence interval, t-statistic, and mean difference between groups.
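The stratified propensity score comparison referred to can be produced with circ.psa from the PSAgraphics package. A hedged sketch, assuming the logistic model model1 and the birthwt data from the listings above; the quintile stratification and the outcome variable name (bwtt, in ounces) are assumptions, not taken from the source:

```r
# Sketch of a stratum-based propensity score comparison (PSAgraphics).
# Assumes model1 and birthwt from the analysis above; quintile strata
# and the outcome variable name (bwtt) are assumptions.
library(PSAgraphics)
ps <- fitted(model1)                 # estimated propensity scores from the glm
strata <- cut(ps, quantile(ps, seq(0, 1, by = 0.2)), include.lowest = TRUE)
circ.psa(birthwt$bwtt/16, birthwt$smoke, strata)
```

Each stratum contributes one point to the graphic; the overall mean difference and its confidence interval are the quantities compared with the matched-pairs results above.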
References
1. Sekhon, J. (2007). "Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching Package for R." Journal of Statistical Software 10(2).
2. Ho, D., Imai, K., King, G., Stuart, E. (2007). "MatchIt: Nonparametric Preprocessing for Parametric Causal Inference." Journal of Statistical Software.
3. Diamond, A., Sekhon, J. (2008). "Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies."
4. Iacus, S., King, G., Porro, G. (2009). "Causal Inference Without Balance Checking: Coarsened Exact Matching."
5. Pruzek, R. M. and Helmreich, J. E. (2008). granova: Graphical Analysis of Variance. R package.
6. LaLonde, R. J. (1986). "Evaluating the Econometric Evaluations of Training Programs with Experimental Data."

Additional References and links:
For information about the LaLonde experimental data study:
For MatchIt (Gary King's website):
For those interested in matching via SAS:
For genetic matching:
For CEM:
For Mahalanobis matching:
More informationZelig: Everyone s Statistical Software
Zelig: Everyone s Statistical Software Toward A Common Framework for Statistical Analysis & Development Kosuke Imai 1 Gary King 2 Olivia Lau 3 1 Department of Politics Princeton University 2 Department
More informationLectures 5 & 6: Hypothesis Testing
Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across
More informationModeling Overdispersion
James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 Introduction 2 Introduction In this lecture we discuss the problem of overdispersion in
More information