PROPENSITY SCORE MATCHING
Walter Leite

EXAMPLE
Question: Does having a job that provides or subsidizes child care increase the length of time that working mothers breastfeed their children?
Treatment: Working for a company that provides or subsidizes child care.
Outcome: Age of the child in weeks when breastfeeding ended.
Data source: National Longitudinal Survey of Youth 1979 (NLSY79) and the NLSY79 Children and Youth.
Sample size: Child care was provided or subsidized in 107 (8.85%) of 1209 cases.

ESTIMATION OF PROPENSITY SCORES FOR EXAMPLE
31 covariates were selected. Examples: benefits provided by the mother's current job (i.e., life insurance, dental insurance, profit sharing, retirement, training opportunities), the mother's education level, hours worked per week, employment sector, family size, amount of public assistance received by the family, and whether a cesarean section was performed.
Estimation was performed using logistic regression with the glm function in base R.

PROPENSITY SCORES FOR MATCHING
It is advantageous to match on the linear propensity score (i.e., the logit of the propensity score) rather than the propensity score itself, because the logit avoids compression of the scores around zero and one.
\[ \operatorname{logit}(e(x)) = \log\left( \frac{e(x)}{1 - e(x)} \right) \]

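To make the compression argument concrete, here is a minimal Python sketch (standard library only). The function name and the four scores are hypothetical values chosen for illustration; they are not taken from the example data.

```python
import math

def linear_propensity_score(e):
    """Return the logit log(e / (1 - e)) of a propensity score 0 < e < 1."""
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    return math.log(e / (1.0 - e))

# Hypothetical scores: two pairs, each 0.04 apart on the probability scale.
scores = [0.01, 0.05, 0.50, 0.54]
logits = [linear_propensity_score(e) for e in scores]

gap_near_zero = logits[1] - logits[0]    # 0.01 versus 0.05
gap_near_middle = logits[3] - logits[2]  # 0.50 versus 0.54
```

The same 0.04 difference in probability is a much larger distance on the logit scale near zero than near 0.5, so matching on the logit treats units near the boundaries as less interchangeable.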
[Figure: Common support region. Distributions of the predicted probability for participants and nonparticipants, marking the range of matched cases and the cases excluded.]

COMMON SUPPORT REQUIREMENTS
Matching to estimate the average treatment effect on the treated (ATT) only requires that the distribution of propensity scores for the treated is contained within the distribution of propensity scores for the untreated.

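One simple way to check this requirement is to compare each treated score with the range of untreated scores. The sketch below assumes a min/max range rule, which is only one of several possible definitions of common support; the function name and scores are hypothetical.

```python
def att_common_support(ps_treated, ps_untreated):
    """Split treated scores into those inside and outside the untreated range."""
    low, high = min(ps_untreated), max(ps_untreated)
    kept = [p for p in ps_treated if low <= p <= high]
    excluded = [p for p in ps_treated if p < low or p > high]
    return kept, excluded

# Hypothetical propensity scores.
ps_treated = [0.20, 0.35, 0.60, 0.85]
ps_untreated = [0.05, 0.10, 0.30, 0.55, 0.70]
kept, excluded = att_common_support(ps_treated, ps_untreated)
```

Here the treated unit with a score of 0.85 has no untreated unit at or above its score, so it falls outside the common support region.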
MATCHING METHODS TAXONOMY
Replacement: matching with replacement; matching without replacement
Algorithms: greedy matching; optimal matching; genetic matching
Ratio: pair matching (1 to 1); fixed ratio (1 to k); variable ratio; full matching

GREEDY MATCHING
For each individual in the treated sample, select the best available match without accounting for the quality of the matches across the entire treated sample.
Greedy matching has the advantage of allowing any analysis to estimate the treatment effect after matching.
Types of greedy matching:
  Nearest neighbor propensity score matching
  Nearest neighbor propensity score matching within a caliper
  Mahalanobis metric matching
  Mahalanobis metric matching within a propensity score caliper

LIMITATIONS OF GREEDY MATCHING
While trying to maximize exact matches (i.e., within the common support region or within a caliper), cases may be excluded due to incomplete matching (no available matches).
While trying to maximize cases (i.e., widening the region), inexact matching may result.
To prevent inexact or incomplete matching when the number of untreated is not much larger than the number of treated, use matching with replacement.

NEAREST NEIGHBOR MATCHING
1. Randomly order the treated and untreated individuals.
2. Select the first treated individual i and find the untreated individual j with the closest propensity score.
3. If matching is without replacement, remove j from the pool.
4. Repeat the above process until matches are found for all participants.

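The four steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation of any R package: the function name, the seed argument, and the scores are hypothetical, and ties are broken by index instead of by randomly ordering the untreated.

```python
import random

def nearest_neighbor_match(ps_treated, ps_untreated, replace=False, seed=0):
    """Greedy 1-to-1 matching; returns {treated index: untreated index}."""
    rng = random.Random(seed)
    order = list(range(len(ps_treated)))
    rng.shuffle(order)                       # step 1: random order of treated
    pool = set(range(len(ps_untreated)))     # untreated units still available
    matches = {}
    for i in order:                          # steps 2 and 4: one treated at a time
        if not pool:
            break                            # incomplete matching: pool exhausted
        j = min(pool, key=lambda k: (abs(ps_treated[i] - ps_untreated[k]), k))
        matches[i] = j
        if not replace:
            pool.remove(j)                   # step 3: j cannot be reused
    return matches

# Hypothetical propensity scores.
ps_treated = [0.30, 0.32]
ps_untreated = [0.31, 0.50, 0.12]
with_replacement = nearest_neighbor_match(ps_treated, ps_untreated, replace=True)
without_replacement = nearest_neighbor_match(ps_treated, ps_untreated, replace=False)
```

With replacement, both treated units share the untreated unit scored 0.31. Without replacement, whichever treated unit is processed second must accept a more distant match, which is why the random ordering in step 1 matters.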
NEAREST NEIGHBOR WITHIN A CALIPER
Caliper: a required common-support region for each match, usually in standard deviation units.
Rosenbaum and Rubin (1985) suggest a caliper of 0.25 standard deviations.
Matching method: within the caliper of each treated observation, select the untreated observation with the closest propensity score.

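A caliper can be added to nearest neighbor matching as a simple filter on the candidate pool. The sketch below matches with replacement and assumes that the caliper is 0.25 standard deviations of the linear propensity score computed over all units; that choice of standard deviation, the function name, and the data are assumptions made for illustration.

```python
import statistics

def caliper_match(lps_treated, lps_untreated, caliper_sd=0.25):
    """Nearest neighbor within a caliper, with replacement.

    Returns {treated index: untreated index, or None if no match in caliper}.
    """
    width = caliper_sd * statistics.stdev(lps_treated + lps_untreated)
    matches = {}
    for i, t in enumerate(lps_treated):
        # Keep only untreated units whose score is within the caliper of t.
        inside = [(abs(t - u), j) for j, u in enumerate(lps_untreated)
                  if abs(t - u) <= width]
        matches[i] = min(inside)[1] if inside else None
    return matches

# Hypothetical linear propensity scores (logits).
lps_treated = [-1.0, 2.5]
lps_untreated = [-1.1, -0.2, 0.4, 1.0]
caliper_matches = caliper_match(lps_treated, lps_untreated)
```

The second treated unit has no untreated unit inside its caliper and is left unmatched, which is the incomplete matching trade-off that calipers introduce.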
MAHALANOBIS DISTANCE
\[ d(i, j) = (u - v)^T C^{-1} (u - v) \]
u and v are the values of the matching variables for participant i and nonparticipant j; C is the sample covariance matrix of the matching variables from the full set of nonparticipants.

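As a small numeric illustration of the distance, the sketch below handles the special case of two matching variables so that the inverse of C can be written out by hand. The function names and the data are hypothetical; the distance is left unsquare-rooted, as in the formula above.

```python
def covariance_2d(rows):
    """Sample covariance matrix (n - 1 denominator) of two-column data."""
    n = len(rows)
    m0 = sum(r[0] for r in rows) / n
    m1 = sum(r[1] for r in rows) / n
    s00 = sum((r[0] - m0) ** 2 for r in rows) / (n - 1)
    s11 = sum((r[1] - m1) ** 2 for r in rows) / (n - 1)
    s01 = sum((r[0] - m0) * (r[1] - m1) for r in rows) / (n - 1)
    return [[s00, s01], [s01, s11]]

def mahalanobis_2d(u, v, C):
    """d(i, j) = (u - v)^T C^{-1} (u - v) for two matching variables."""
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    inv = [[C[1][1] / det, -C[0][1] / det],
           [-C[1][0] / det, C[0][0] / det]]
    d0, d1 = u[0] - v[0], u[1] - v[1]
    return (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
            + d1 * (inv[1][0] * d0 + inv[1][1] * d1))

# Hypothetical nonparticipant data with positively correlated covariates.
nonparticipants = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
C = covariance_2d(nonparticipants)

d_along = mahalanobis_2d((2.0, 2.0), (3.0, 3.0), C)    # differs along the correlation
d_against = mahalanobis_2d((2.0, 2.0), (3.0, 1.0), C)  # differs against it
```

Both candidate units are the same Euclidean distance from (2, 2), but the one that departs from the correlation pattern of the covariates is treated as farther away.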
MAHALANOBIS METRIC MATCHING
1. Randomly order the sample.
2. Calculate the Mahalanobis distance between the first treated individual and all untreated individuals based on all covariates.
3. Choose the untreated individual j with the minimum distance d(i,j) as the match for treated individual i.
4. If matching is without replacement, remove j from the pool.
5. Repeat the above process until matches are found for all participants.

MAHALANOBIS METRIC MATCHING WITHIN PROPENSITY SCORE CALIPER
1. Randomly order the sample.
2. Calculate the Mahalanobis distance between the first treated individual and all untreated individuals within a propensity score caliper, based on all covariates except the propensity score.
3. Choose the untreated individual j with the minimum distance d(i,j) as the match for treated individual i.
4. If matching is without replacement, remove j from the pool.
5. Repeat the above process until matches are found for all participants.

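The steps above combine a propensity score filter with a covariate distance. The following sketch assumes two covariates, a precomputed inverse covariance matrix (C_inv), units that are already in random order, and matching without replacement; all names and numbers are hypothetical.

```python
def quad_form(d, A):
    """Return d^T A d for a 2-vector d and a 2x2 matrix A."""
    return sum(d[r] * A[r][c] * d[c] for r in range(2) for c in range(2))

def mahalanobis_caliper_match(treated, untreated, C_inv, caliper):
    """Each unit is (propensity score, (x1, x2)).

    Returns {treated index: untreated index, or None if the caliper is empty}.
    """
    pool = set(range(len(untreated)))
    matches = {}
    for i, (ps_i, u) in enumerate(treated):
        # First restrict the candidates to the propensity score caliper.
        inside = [j for j in pool if abs(ps_i - untreated[j][0]) <= caliper]
        if not inside:
            matches[i] = None
            continue

        # Then rank the candidates by Mahalanobis distance on the covariates.
        def dist(j):
            v = untreated[j][1]
            return quad_form((u[0] - v[0], u[1] - v[1]), C_inv)

        j_best = min(inside, key=lambda j: (dist(j), j))
        matches[i] = j_best
        pool.remove(j_best)                  # without replacement
    return matches

# Hypothetical inputs.
C_inv = [[0.9375, -0.5625], [-0.5625, 0.9375]]
treated = [(0.40, (2.0, 2.0))]
untreated = [(0.41, (3.0, 1.0)), (0.45, (3.0, 3.0)), (0.90, (2.0, 2.0))]
mc_matches = mahalanobis_caliper_match(treated, untreated, C_inv, caliper=0.10)
```

The third untreated unit has identical covariates but is excluded by the caliper, and among the two remaining candidates the one with the smaller Mahalanobis distance is chosen even though its propensity score is not the closest.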
RESULTS OF SIMULATION STUDIES COMPARING DISTANCE MEASURES
Gu and Rosenbaum (1993): Propensity score matching performed better than Mahalanobis metric matching and Mahalanobis metric matching within propensity score calipers when there were many covariates.
Zhao (2004): Propensity score matching performed better than Mahalanobis metric matching in conditions with high correlations between the covariates and the treatment participation indicator. Propensity score matching did not work well when the sample size used in the simulation was small.

OPTIMAL MATCHING
Optimal matching is a network flow optimization problem that can be solved by linear programming methods.
Matching is performed to minimize the total weighted sample distance (which is the minimum cost in the network).
Optimal matching will perform as well as or better than greedy matching with respect to minimum distance and balance.

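The difference between the two objectives shows up even in a tiny example. The sketch below finds the optimal pair matching by exhaustive search over all assignments, which is only feasible for very small samples and is a stand-in for the network flow algorithms used in practice; the function names and scores are hypothetical.

```python
from itertools import permutations

def greedy_total(treated, untreated):
    """Total distance when each treated unit takes its best available match."""
    pool = list(range(len(untreated)))
    total = 0.0
    for t in treated:
        j = min(pool, key=lambda k: abs(t - untreated[k]))
        total += abs(t - untreated[j])
        pool.remove(j)
    return total

def optimal_total(treated, untreated):
    """Smallest total distance over all one-to-one assignments (brute force)."""
    return min(
        sum(abs(t - untreated[j]) for t, j in zip(treated, perm))
        for perm in permutations(range(len(untreated)), len(treated))
    )

# Hypothetical propensity scores with as many untreated as treated units.
treated_ps = [0.50, 0.60]
untreated_ps = [0.58, 0.30]
greedy_distance = greedy_total(treated_ps, untreated_ps)
optimal_distance = optimal_total(treated_ps, untreated_ps)
```

Greedy matching gives the first treated unit the control scored 0.58 and leaves the second with a poor match, while the optimal solution accepts a slightly worse first match to reduce the total distance.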
SIMULATION STUDIES COMPARING MATCHING ALGORITHMS
Gu and Rosenbaum (1993): Optimal matching outperformed greedy matching except when there was a large number of control units available for matching to treated units. Optimal matching performed slightly better than greedy matching when both methods used the propensity score distance.
Cepeda, Boston, Farrar and Strom (2003): Optimal matching produced a larger reduction in bias when using a variable number of control units than when using a fixed number of control units.

NUMBER OF MATCHES: PAIR MATCHING
For the ATT: each treated is matched to a single control.
For the average treatment effect (ATE): each treated is matched to a single control and each control to a single treated.

NUMBER OF MATCHES: 1 TO K
For the ATT: each treated is matched to K controls.
For the ATE: each treated is matched to K controls and each control to K treated.

NUMBER OF MATCHES: VARIABLE RATIO OR FULL
Variable ratio matching: each treated is matched to many controls, with the number of controls varying across treated.
Full matching: each treated is matched to many controls and each control is matched to many treated.
Full matching can be considered a form of stratification into the maximum number of strata, each containing at least one treated and one untreated.

SIMULATION STUDIES COMPARING NUMBER OF MATCHES
Gu and Rosenbaum (1993): Full matching performed better than 1 to k matching in terms of distance within matched sets and also produced greater balance, especially when the number of covariates was large.
Cepeda, Boston, Farrar and Strom (2003): Overall, optimal matching with a variable number of controls removes more bias than optimal matching with a fixed number of controls.
Rosenbaum (1989) points out that optimal pair matching is not actually optimal, but optimal full matching is.

SIMULATION RESULTS ABOUT THE EFFECT OF THE NUMBER OF AVAILABLE CONTROLS
Gu and Rosenbaum (1993): When there were many controls available for every one used in pair matching, there was little difference between optimal and greedy matching. When there was only one control available for matching to each treated unit, optimal matching was noticeably better than greedy matching.
Cepeda, Boston, Farrar and Strom (2003): Optimal matching with a fixed number of control units and with a variable number of control units produced identical reductions in bias when the treated to control ratio was 1/5, but optimal matching with a variable number of controls performed better with ratios of 1/2, 1/3 and 1/4. Regardless of method, the reduction of bias with optimal matching decreases as the number of available controls decreases.

MATCHING WITH A GENETIC ALGORITHM
Minimizes a multivariate weighted distance d, where the weights are chosen to optimize a measure of covariate balance (e.g., to minimize the standardized mean difference).
Weight matrix: if all weights are 1, d is the Mahalanobis distance.
The distance uses the Cholesky decomposition of the sample covariance matrix S of the matching variables, which is a lower triangular matrix L where S = LL^T.

MATCHING METHODS USED FOR THE EXAMPLE
Matching method:
  One-to-one greedy with replacement and caliper
  Variable ratio greedy with replacement and caliper
  Variable ratio genetic with replacement (propensity score [PS] + covariates)
  Variable ratio genetic with replacement (PS only)
  One-to-one optimal without replacement
  Full matching

COVARIATE BALANCE FOR THE EXAMPLE
Matching method                                               Maximum standardized difference   Unbalanced covariates, n (%)
One-to-one greedy with replacement and caliper                0.21                              11 (26.1)
Variable ratio greedy with replacement and caliper            0.30                              4 (9.5)
Variable ratio genetic with replacement (PS + covariates)     0.23                              12 (28.6)
Variable ratio genetic with replacement (PS only)             0.13                              8 (19.0)
One-to-one optimal without replacement                        0.28                              11 (26.1)
Full matching                                                 0.26                              4 (9.5)

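The slides do not state which formula was used for the standardized difference. As a hedged illustration, the sketch below uses one common convention for the ATT: the difference between the treated mean and the weighted untreated mean, divided by the standard deviation of the treated. The function name, the optional weights argument, and the data are assumptions.

```python
import statistics

def standardized_difference(x_treated, x_untreated, w_untreated=None):
    """(treated mean - weighted untreated mean) / SD of the treated."""
    if w_untreated is None:
        w_untreated = [1.0] * len(x_untreated)
    mean_t = sum(x_treated) / len(x_treated)
    mean_u = (sum(w * x for w, x in zip(w_untreated, x_untreated))
              / sum(w_untreated))
    return (mean_t - mean_u) / statistics.stdev(x_treated)

# Hypothetical covariate values before and after matching.
x_treated = [1.0, 2.0, 3.0]
x_untreated = [0.0, 2.0, 4.0, 6.0]
before = standardized_difference(x_treated, x_untreated)
after = standardized_difference(x_treated, x_untreated, [1.0, 1.0, 1.0, 0.0])
```

Giving the last untreated unit a weight of zero, as if it were left unmatched, removes the imbalance on this covariate. Repeating the calculation for every covariate and taking the largest absolute value gives a summary like the maximum standardized difference column.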
SHOULD MATCHED DATA BE TREATED AS RELATED SAMPLES?
Schafer and Kang (2008) and Stuart (2010): matched samples should be treated as independent data because matching does not produce correlations between the outcomes of matched individuals.
Austin (2011): because the covariates, which have similar distributions in the treated and matched untreated groups, are related to the outcomes, the distributions of outcomes will be more similar between treated and matched samples than between randomly selected samples.

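ABADIE AND IMBENS SIMPLE MATCHING ESTIMATOR
\[ ATE = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{Y}_i^1 - \hat{Y}_i^0 \right) \]
\[ ATT = \frac{1}{n_1} \sum_{i \in T} \left( \hat{Y}_i^1 - \hat{Y}_i^0 \right) \]
\[ \hat{Y}_i^1 = \begin{cases} Y_i & \text{if } Z_i = 1 \\ \frac{1}{M} \sum_{j \in J_M(i)} Y_j & \text{if } Z_i = 0 \end{cases} \]
\[ \hat{Y}_i^0 = \begin{cases} Y_i & \text{if } Z_i = 0 \\ \frac{1}{M} \sum_{j \in J_M(i)} Y_j & \text{if } Z_i = 1 \end{cases} \]
Here T is the set of treated units, n_1 is the number of treated units, and J_M(i) is the set of M matches for unit i.
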
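The ATT version of the estimator can be sketched directly from its definition. The code below assumes matching with replacement on the absolute difference in propensity scores, which is a simplification chosen for illustration; the function name and data are hypothetical, and the standard error is not computed.

```python
def simple_matching_att(y, z, ps, M=1):
    """Mean over treated units of Y_i minus the mean outcome of M matches."""
    treated = [i for i, zi in enumerate(z) if zi == 1]
    untreated = [i for i, zi in enumerate(z) if zi == 0]
    effects = []
    for i in treated:
        # J_M(i): the M untreated units closest to unit i, reusable across i.
        matches = sorted(untreated, key=lambda j: (abs(ps[i] - ps[j]), j))[:M]
        y0_hat = sum(y[j] for j in matches) / M   # imputed untreated outcome
        effects.append(y[i] - y0_hat)             # observed minus imputed
    return sum(effects) / len(effects)

# Hypothetical outcomes, treatment indicators, and propensity scores.
y = [10.0, 14.0, 6.0, 8.0, 9.0, 20.0]
z = [1, 1, 0, 0, 0, 0]
ps = [0.30, 0.60, 0.28, 0.33, 0.62, 0.90]

att_m1 = simple_matching_att(y, z, ps, M=1)
att_m2 = simple_matching_att(y, z, ps, M=2)
```

Increasing M averages over more untreated outcomes per treated unit, which here changes the estimate because the second-closest matches are less similar.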
TREATMENT EFFECT ESTIMATES WITH VARIABLE RATIO GENETIC MATCHING USING THE MATCHING PACKAGE
Matching only:
  Estimate...  3.7664
  AI SE......  2.6266
  T-stat.....  1.4339
  p.val......  0.1516
Matching with additional bias adjustment by regressing the outcomes on the covariates using only the matched data:
  Estimate...  4.352
  AI SE......  2.7694
  T-stat.....  1.5714
  p.val......  0.11608

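WEIGHTS FOR TREATMENT EFFECT ESTIMATION WITH 1 TO K, VARIABLE RATIO OR FULL MATCHING
One-to-k or variable ratio: weights are the inverse of the total number of matches a unit received.
Matching with replacement: weights for each untreated unit are summed across the multiple matched groups it was included in.
Then, the weights of the matched cases are multiplied by the ratio of the total number of matched units to the total number of treated units.
\[ w_i = \begin{cases} 1 & \text{if } Z_i = 1 \\ \frac{n_0}{n_1} \sum_{m :\, i \in m} \frac{1}{M_m} & \text{if } Z_i = 0 \end{cases} \]
Here the sum runs over the matched groups m that include unit i, M_m is the number of matches in group m, n_0 is the total number of matched units, and n_1 is the total number of treated units.
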
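The weighting rules above can be sketched as follows. This is one reading of the description, under the assumptions that every treated unit is matched, that treated units receive a weight of 1, and that the scaling ratio is the number of matched untreated units over the number of treated units; the function name and unit labels are hypothetical.

```python
from collections import defaultdict

def matching_weights(matched_sets):
    """matched_sets maps each treated id to the untreated ids matched to it.

    Returns {unit id: weight} for treated and matched untreated units.
    """
    raw = defaultdict(float)
    for controls in matched_sets.values():
        for j in controls:
            # Inverse of the number of matches in this group, summed over
            # groups when an untreated unit is reused (with replacement).
            raw[j] += 1.0 / len(controls)
    scale = len(raw) / len(matched_sets)     # matched untreated / treated
    weights = {t: 1.0 for t in matched_sets}
    weights.update({j: w * scale for j, w in raw.items()})
    return weights

# Hypothetical variable ratio matching with replacement: c3 is used twice.
matched_sets = {"t1": ["c1", "c2", "c3"], "t2": ["c3"]}
weights = matching_weights(matched_sets)
untreated_total = sum(w for unit, w in weights.items() if unit.startswith("c"))
```

With this scaling, the untreated weights add up to the number of matched untreated units, so a reused control such as c3 counts for more than a control used once.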
ESTIMATION OF TREATMENT EFFECT WITH MATCHING
Horvitz and Thompson estimator of the treatment effect (Rosenbaum, 1987):
\[ \frac{\sum_{i_1=1}^{n_1} w_{i_1} y_{i_1}}{\sum_{i_1=1}^{n_1} w_{i_1}} - \frac{\sum_{i_0=1}^{n_0} w_{i_0} y_{i_0}}{\sum_{i_0=1}^{n_0} w_{i_0}} \]
Estimation with weighted regression:
\[ Y_i = B_0 + B_1 Z_i + e_i \]

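The two approaches on this slide give the same point estimate when the regression contains only the treatment indicator, because the weighted least squares slope for a binary predictor equals the difference between the weighted group means. The sketch below checks this numerically; the function names and data are hypothetical, and standard errors are not computed.

```python
def weighted_mean_difference(y, z, w):
    """Weighted mean outcome of the treated minus that of the untreated."""
    num1 = sum(wi * yi for yi, zi, wi in zip(y, z, w) if zi == 1)
    den1 = sum(wi for zi, wi in zip(z, w) if zi == 1)
    num0 = sum(wi * yi for yi, zi, wi in zip(y, z, w) if zi == 0)
    den0 = sum(wi for zi, wi in zip(z, w) if zi == 0)
    return num1 / den1 - num0 / den0

def weighted_regression_slope(y, z, w):
    """Slope B_1 from weighted least squares of Y on Z with an intercept."""
    total = sum(w)
    z_bar = sum(wi * zi for zi, wi in zip(z, w)) / total
    y_bar = sum(wi * yi for yi, wi in zip(y, w)) / total
    sxy = sum(wi * (zi - z_bar) * (yi - y_bar) for yi, zi, wi in zip(y, z, w))
    sxx = sum(wi * (zi - z_bar) ** 2 for zi, wi in zip(z, w))
    return sxy / sxx

# Hypothetical matched data: outcomes, treatment indicators, matching weights.
y = [12.0, 16.0, 8.0, 10.0, 11.0]
z = [1, 1, 0, 0, 0]
w = [1.0, 1.0, 0.5, 0.5, 2.0]

effect_means = weighted_mean_difference(y, z, w)
effect_slope = weighted_regression_slope(y, z, w)
```

The regression form becomes useful when covariates are added to the model for further bias adjustment, in which case the slope no longer reduces to a simple difference in weighted means.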
TREATMENT EFFECT ESTIMATES USING FULL MATCHING WITH WEIGHTED REGRESSION
Coefficients:
                Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)      10.0428      0.8887   11.300    <2e-16 ***
childcaretrue     3.4806      2.3346    1.491     0.136

ROSENBAUM'S SENSITIVITY ANALYSIS
Rosenbaum (2002) proposed a method based on the Wilcoxon signed rank test.
The sensitivity analysis increases the assumed level of hidden bias to obtain an upper and a lower bound for how the p value of the association might be affected.
By varying the assumed magnitude of hidden bias, we can find out how much hidden bias is necessary to render our p value not significant.

BASIC CONCEPTS OF ROSENBAUM'S METHOD
If two matched individuals j and k have the same observed covariates but different probabilities of receiving the treatment, π_j and π_k, the odds ratio of these units receiving the treatment is:
\[ \frac{\pi_j / (1 - \pi_j)}{\pi_k / (1 - \pi_k)} = \frac{\pi_j (1 - \pi_k)}{\pi_k (1 - \pi_j)} \]
If there is hidden bias, the odds ratio will differ from one, but it is assumed to lie between 1/Γ and a constant Γ ≥ 1:
\[ \frac{1}{\Gamma} \le \frac{\pi_j (1 - \pi_k)}{\pi_k (1 - \pi_j)} \le \Gamma \]

MEASURING THE DEGREE OF HIDDEN BIAS
Γ (gamma) measures the degree of departure from a study that is free of hidden bias.
A study is sensitive to hidden bias if small values of Γ lead to a change in inferences.
A study is insensitive to hidden bias if large values of Γ do not lead to a change in inferences.

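A short worked example may help to read Γ. Suppose, as an assumed illustration, that Γ = 2 and that unit k has a treatment probability of π_k = 0.5, so its odds of treatment are 1. The bound on the odds ratio then limits the treatment probability of its matched unit j:

```latex
\frac{1}{2} \le \frac{\pi_j (1 - \pi_k)}{\pi_k (1 - \pi_j)} \le 2
\quad\text{with } \pi_k = 0.5
\;\Longrightarrow\;
\frac{1}{2} \le \frac{\pi_j}{1 - \pi_j} \le 2
\;\Longrightarrow\;
\frac{1}{3} \le \pi_j \le \frac{2}{3}
```

With Γ = 1 the bounds collapse to π_j = π_k, which corresponds to a study free of hidden bias.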
STEPS OF WILCOXON'S SIGNED RANK TEST FOR SENSITIVITY ANALYSIS IN A MATCHED PAIRS STUDY
1. Compute the differences between matched pairs and rank them.
2. Compute the Wilcoxon signed rank statistic for the outcome difference between treated and control.
3. Compute the expectation and variance of the Wilcoxon signed rank statistic under the null hypothesis of no treatment effect.
4. Compute the Z score for the observed signed rank statistic given the expected value and variance. For this test with no hidden bias (Γ = 1), the lower and upper bounds of the p value are the same.

STEPS OF WILCOXON'S SIGNED RANK TEST FOR SENSITIVITY ANALYSIS IN A MATCHED PAIRS STUDY (CONTINUED)
5. Compute the expectation of the lower bound of the Wilcoxon signed rank statistic under the null hypothesis with Γ = 2 (or any other value higher than 1) and its variance.
6. Compute the Z score associated with the lower bound and the associated p value.
7. Compute the expectation of the upper bound of the Wilcoxon signed rank statistic under the null hypothesis with Γ = 2 (or any other value higher than 1) and its variance.
8. Compute the Z score associated with the upper bound and the associated p value.
9. Check whether the range between the p values of the lower and upper bounds crosses 0.05, which would change conclusions about the statistical significance of the results.

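The nine steps can be sketched with the large-sample normal approximation to the signed rank statistic. The code below assumes no zero differences and no tied absolute differences, reports one-sided p values, and uses hypothetical pair differences; it is an illustration of the procedure, not the psens function of the rbounds package.

```python
import math

def upper_tail_p(z_score):
    """One-sided p value P(Z >= z_score) for a standard normal Z."""
    return 0.5 * math.erfc(z_score / math.sqrt(2.0))

def signed_rank_bounds(differences, gamma):
    """Return (lower bound, upper bound) of the p value for a given gamma."""
    n = len(differences)
    order = sorted(range(n), key=lambda i: abs(differences[i]))
    ranks = {i: r + 1 for r, i in enumerate(order)}           # step 1
    t_obs = sum(ranks[i] for i in range(n) if differences[i] > 0)  # step 2
    p_plus = gamma / (1.0 + gamma)    # largest chance of a positive difference
    p_minus = 1.0 / (1.0 + gamma)     # smallest chance of a positive difference
    rank_sum = n * (n + 1) / 2.0
    # The variance is the same for both bounds: p_plus*(1-p_plus) = p_minus*(1-p_minus).
    variance = p_plus * (1.0 - p_plus) * n * (n + 1) * (2 * n + 1) / 6.0
    z_small = (t_obs - p_plus * rank_sum) / math.sqrt(variance)
    z_large = (t_obs - p_minus * rank_sum) / math.sqrt(variance)
    return upper_tail_p(z_large), upper_tail_p(z_small)

# Hypothetical treated-minus-control outcome differences for 8 matched pairs.
pair_differences = [3.0, -1.0, 4.0, 5.0, -2.0, 6.0, 7.0, 8.0]
lower_1, upper_1 = signed_rank_bounds(pair_differences, gamma=1.0)
lower_2, upper_2 = signed_rank_bounds(pair_differences, gamma=2.0)
```

With gamma equal to 1 the two bounds coincide with the usual signed rank p value. In this made-up example the result is significant at 0.05 without hidden bias, but the upper bound at gamma equal to 2 exceeds 0.05, so hidden bias of that size could change the conclusion.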
RESULTS OF ROSENBAUM'S SENSITIVITY ANALYSIS WITH ESTIMATES FROM GENETIC MATCHING
Rosenbaum Sensitivity Test for Wilcoxon Signed Rank P-Value
Unconfounded estimate ....  0.1305
Gamma  Lower bound  Upper bound
1.0    0.1305       0.1305
1.2    0.0343       0.3311
1.4    0.0077       0.5560
1.6    0.0016       0.7396
1.8    0.0003       0.8613
2.0    0.0001       0.9315
2.2    0.0000       0.9681
2.4    0.0000       0.9858
2.6    0.0000       0.9939
2.8    0.0000       0.9974
3.0    0.0000       0.9990

R PACKAGES USEFUL FOR PROPENSITY SCORE MATCHING
Package    Function                        Objective
MatchIt    matchit                         Implement greedy matching and serve as an interface for genetic, optimal, and full matching
Matching   GenMatch, Match, MatchBalance   Obtain covariate weights for genetic matching, implement genetic matching with weights, as well as greedy matching
rbounds    psens                           Implement Rosenbaum's sensitivity analysis method