Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Similar documents
Causal Inference in Observational Studies with Non-Binary Treatments. David A. van Dyk

Covariate Balancing Propensity Score for General Treatment Regimes

Achieving Optimal Covariate Balance Under General Treatment Regimes

Propensity-Score Based Methods for Causal Inference in Observational Studies with Fixed Non-Binary Treatments

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions

An Introduction to Causal Analysis on Observational Data using Propensity Scores

Propensity-Score Based Methods for Causal Inference in Observational Studies with Fixed Non-Binary Treatments

Discussion of Papers on the Extensions of Propensity Score

Selection on Observables: Propensity Score Matching.

Instrumental Variables

Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes

Instrumental Variables in Action: Sometimes You get What You Need

High Dimensional Propensity Score Estimation via Covariate Balancing

What s New in Econometrics. Lecture 1

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

The propensity score with continuous treatments

Econometrics of causal inference. Throughout, we consider the simplest case of a linear outcome equation, and homogeneous

Propensity Score Methods for Causal Inference

Marginal, crude and conditional odds ratios

Causal Inference Lecture Notes: Selection Bias in Observational Studies

The Problem of Causality in the Analysis of Educational Choices and Labor Market Outcomes Slides for Lectures

Flexible Estimation of Treatment Effect Parameters

Robustness to Parametric Assumptions in Missing Data Models

Sensitivity checks for the local average treatment effect

Gov 2002: 4. Observational Studies and Confounding

AGEC 661 Note Fourteen

Propensity Score for Causal Inference of Multiple and Multivalued Treatments

Fractional Imputation in Survey Sampling: A Comparative Review

Measuring Social Influence Without Bias

Estimating the Marginal Odds Ratio in Observational Studies

THE DESIGN (VERSUS THE ANALYSIS) OF EVALUATIONS FROM OBSERVATIONAL STUDIES: PARALLELS WITH THE DESIGN OF RANDOMIZED EXPERIMENTS DONALD B.

Assess Assumptions and Sensitivity Analysis. Fan Li March 26, 2014

Causal modelling in Medical Research

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

150C Causal Inference

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai

Quantitative Economics for the Evaluation of the European Policy

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?

On the Use of Linear Fixed Effects Regression Models for Causal Inference

Propensity Score Matching

Combining multiple observational data sources to estimate causal eects

Dynamics in Social Networks and Causality

New Developments in Nonresponse Adjustment Methods

Section 10: Inverse propensity score weighting (IPSW)

Instrumental Variables

Can a Pseudo Panel be a Substitute for a Genuine Panel?

Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded

Instrumental Variables

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design

Estimating and Using Propensity Score in Presence of Missing Background Data. An Application to Assess the Impact of Childbearing on Wellbeing

Introduction to Econometrics

A Guide to Modern Econometric:

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing

The problem of causality in microeconometrics.

CompSci Understanding Data: Theory and Applications

Propensity Score Analysis with Hierarchical Data

Double Robustness. Bang and Robins (2005) Kang and Schafer (2007)

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies

Imbens, Lecture Notes 2, Local Average Treatment Effects, IEN, Miami, Oct 10 1

Table B1. Full Sample Results OLS/Probit

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies

IV Estimation WS 2014/15 SS Alexander Spermann. IV Estimation

Difference-in-Differences Methods

Use of Matching Methods for Causal Inference in Experimental and Observational Studies. This Talk Draws on the Following Papers:

Job Training Partnership Act (JTPA)

Regression Discontinuity Designs

TriMatch: An R Package for Propensity Score Matching of Non-binary Treatments

Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects

A Course in Applied Econometrics. Lecture 5. Instrumental Variables with Treatment Effect. Heterogeneity: Local Average Treatment Effects.

Chapter 2: Describing Contingency Tables - I

Use of Matching Methods for Causal Inference in Experimental and Observational Studies. This Talk Draws on the Following Papers:

Methods to Estimate Causal Effects Theory and Applications. Prof. Dr. Sascha O. Becker U Stirling, Ifo, CESifo and IZA

Combining Difference-in-difference and Matching for Panel Data Analysis

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection

Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity

Propensity Score Methods, Models and Adjustment

finite-sample optimal estimation and inference on average treatment effects under unconfoundedness

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

2 Describing Contingency Tables

Empirical Methods in Applied Microeconomics

arxiv: v1 [stat.me] 15 May 2011

Matching to estimate the causal effects from multiple treatments

Bayesian Causal Forests

Causal Inference Basics

Balancing Covariates via Propensity Score Weighting

Composite Causal Effects for. Time-Varying Treatments and Time-Varying Outcomes

Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources

University of Michigan School of Public Health

ECON Introductory Econometrics. Lecture 17: Experiments

Assessing Studies Based on Multiple Regression

Applied Microeconometrics I

Observational Studies and Propensity Scores

Summary and discussion of The central role of the propensity score in observational studies for causal effects

A Measure of Robustness to Misspecification

Propensity Score Weighting with Multilevel Data

Noncompliance in Randomized Experiments

Matching Techniques. Technical Session VI. Manila, December Jed Friedman. Spanish Impact Evaluation. Fund. Region

Experiments and Quasi-Experiments

Write your identification number on each paper and cover sheet (the number stated in the upper right hand corner on your exam cover).

Chapter 11. Regression with a Binary Dependent Variable

Transcription:

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke Imai Department of Politics, Princeton University 1

Outline 1. Review the propensity score of Rosenbaum and Rubin (1983) for causal inferences in observational studies. 2. Introduce the Propensity Function, a generalization of the propensity score: applicable beyond binary treatment regimes. 3. Establish the key theorem: Strong Ignorability of Treatment Assignment Given Propensity Function. 4. Monte Carlo experiments: comparison with standard regression approaches. 5. Examples: Effect of smoking on medical expenditure (bivariate continuous treatment). Effect of education on wages (ordinal instrumental variable). 2

Salk s Polio Vaccine Trials Background What is polio? When was the epidemic? How did it end? How would people respond to the vaccine before it was proven effective? Goal: Compare polio rates between 1. Children given the vaccine (treatment group) 2. Children not given the vaccine (control group) National Foundation of Infantile Paralysis 1954 Polio Vaccine Trials Treatment Group: Children whose parents give consent Control Group: Children whose parents do not give consent What problems are there with this design? 3

The Results The randomized controlled double-blinded experiment The NFIP study Size Rate a Size Rate Treatment 200,000 28 Vaccine (consent) 225,000 25 Control 200,000 71 No Consent 350,000 46 No Vaccine/consent 125,000 44 a per 100,000 The Treatment/consent and No-Treatment/No-Consent groups are comparable between the studies. Using the No-Consent group as a Control biases the causal effect. Consent (treatment indicator) is correlated with polio rates in the absence of treatment (potential outcome). 4

Causal Inference in Observational Studies 1. The general setup with a binary treatment: Treatment Potential Outcomes Covariates Group Control Treatment 1 X 1 1 Y 1 (0) Y 1 (1) 2 X 2 1 Y 2 (0) Y 2 (1) 3 X 3 0 Y 3 (0) Y 3 (1).......... 2. E.g.: National Foundation of Infantile Paralysis 1954 Polio Vaccine Trials Higher Income More likely to consent to vaccine and receive treatment Higher risk of polio The treatment is correlated with the potential outcomes. Biases results. Key: Control for the (correct) covariates. 5

Formalizing Causal Inference Causal effect: e.g., Y (t P 1 ) Y (t P 2 ). Y (t P ) is potential outcome with potential treatment, t P T. Problem: Treatment assignment, T A, is not random and is correlated with potential outcomes. Y i (T A i = t P 1 ) Y j (T A j = t P 2 ) does not correspond to causal effect. Key assumption (Strong Ignorability of Treatment Assign.; Rubin, 1978): p{t A X} = p{t A Y (t P ), X} t P T. Under this assumption, we must adjust for the covariates in our analysis 6

Adjusting for Covariates in Causal Inference Standard regression approach: Y (t P ) N(α + Xβ + t P γ, σ 2 ). common assumptions, e.g. linearity, are often violated. leads to biased causal inferences. Matching and Subclassification reduce bias more effectively: non/semi-parametric methods. diagnostics directly related to causal inferences. but, difficult when the dimensionality of X is large. 7

Propensity Score of Rosenbaum and Rubin (1983) If the treatment is binary, the propensity score, e(x) = Pr(T A = 1 X), fully characterizes p(t A X). Key Theorems: 1. The propensity score is a balancing score: Pr{T A = 1 X, e(x)} = Pr{T A = 1 e(x)}. 2. Strong Ignorability of Treatment Assignment Given e(x): E{Y (t P ) e(x)} = E{Y (t P ) T A = t P, e(x)} for t P = 0, 1. Unbiased estimate of causal effect is possible conditional on e(x). Match or subclassify on e(x), a scalar variable. 8

The Power of Conditioning on Propensity Scores The problem with (non-randomized) observational studies: Treatment assignment is correlated with potential outcomes. E.g., subjects who are more likely to respond well without treatment are more likely to be in the control group. The power of propensity scores: In a subclass with the same value of the the propensity score, Treatment assignment is UNcorrelated with potential outcomes. We can classify subjects based on their propensity score, and analyze the data separately in each class. 9

Use of the Propensity Score in Observational Studies The propensity score, e(x), is unknown in observational studies. Estimate e(x) using a statistical model, e.g., logistic regression. Advantages: 1. Diagnostics: check balance of X between treatment and control groups after matching or subclassifying on e(x). p{x T A = 0, e(x)} = p{x T A = 1, e(x)} 2. Robust to misspecification of functional forms for estimating propensity score (Drake, 1993; Dehejia and Wahba, 1999). Disadvantage: vulnerable to unobserved confounders. 10

General Treatment Regimes Propensity score is confined to binary treatment scenarios. Researchers have no control over treatment in observational studies. Continuous treatment: dose response function. Ordinal treatment: effects of years in school on income. Event-count, duration, semi-continuous, etc.... Multivariate treatments Goal: Generalization of propensity score to non-binary treatment. Existing literature: 1. ordinal treatment (Joffe & Rosenbaum, 1999), 2. categorical treatment (Imbens, 2000). Our method encompasses both and other types of treatments. 11

Framework of Causal Inference via Potential Outcomes Set of potential outcomes: Y = {Y (t P ), t P T }. Assumptions: 1. Stable Unit Treatment Value (Rubin, 1990): p{y i (t P i ) t P j, Tj A, X i } = p{y i (t P i ) X i } i j and t P i, t P j T. 2. Strongly Ignorable Treatment Assignment (Rubin 1978): p{t A Y (t P ), X} = p(t A X) t P T. Quantity of primary interest: p{y (t P )} = p{y (t P ) X} p(x) dx. 12

Propensity Function: Generalization of Propensity Score Definition: Conditional density function of the actual treatment given observed covariates, e( X) p ψ (T A X). Uniquely Parameterized Propensity Function Assumption: e( X) depends on X only through θ ψ (X). θ = θ ψ (X) uniquely represents e{ θ ψ (X)}. Examples: θ fully characterizes p(t A X) and is typically of low-dimension. 1. Continuous treatment: T A X N(X β, σ 2 ). ψ = (β, σ 2 ) and θ ψ (X) = X β. 2. Categorical treatment: Multinomial probit model for Pr(T A X). ψ = (β, Σ) and θ ψ (X) = X β. 3. Ordinal treatment: Ordinal logistic model for Pr(T A X). ψ = β and θ ψ (X) = X β. 13

Theoretical Properties of Propensity Function Theorem 1: Propensity Function is a Balancing Score. p(t A X) = p{t A e( X), X} = p{t A e( X)}. Theorem 2: Strongly Ignorable Treatment Assignment Given Propensity Function. p{y (t P ) T A, e( X)} = p{y (t P ) e( X)} t P T. Estimation of causal effects via subclassification: p{y (t P )} = p{y (t P ) T A = t P, θ} p(θ) d θ J p{y (t P ) T A = t P, θ j } W j. j=1 14

Monte Carlo Experiment 1: Continuous Treatment Number of simulations: 5,000. Sample size per simulation: n =1,000. Observed covariates: Treatment variable: Outcome variable: X 1 indep. N(1, 1), X 2 indep. N(2, 1). T A X 1, X 2 indep. N ( 1 + X 1 X 2 + X 2 1 + X2 2, 1). Y T A, X 1, X 2 indep. N ( 1 + T A + X 1 X 2 + X 2 1 + X 2 2, 1 ). 15

Robustness to Misspecification Two (misspecified) models for estimating the causal effects: 1. Direct regression: Y T A, X indep. N(αT A +X β, σ 2 ) where X = (1, X 1, X 2 ). 2. Propensity Function: Results: Model: T A X indep. N(X β, σ 2 ). Subclassify data into 10 blocks based on ˆθ = X ˆβ. Within each block, regress Y on T A and ˆθ. Calculate weighted average of 10 within-block estimates. Average Causal Effect of T A Bias MSE Direct Regression 0.832 0.692 Propensity Function 0.390 0.153 Reduce MSE by 75%. 16

Monte Carlo Experiment 2: Interaction of Two Treatments Number of simulations: 5,000. Sample size per simulation: n =2,000. Observed covariates: X 1 indep. N(1, 1), X 2 indep. N(2, 1). Treatment variables: T A 1 T A X indep. 1, X 2 N 2 1 + X2 1 + X2 2 2 1 + X 1 X 2, 1 0.5 0.5 1. Outcome variable: Y indep. N ( 1 + T A 1 + T A 2 + T A 1 T A 2 + X 1 X 2 + X 2 1 + X2 2, 1). 17

Robustness to Misspecification Two (misspecified) models for estimating the causal effects: 1. Direct Linear Regression: Regress Y on X 1, X 2, T A 1, T A 2, and, T A 1 T A 2. 2. Propensity Functions: Model Specification of Propensity Function: Independently regress both treatment variables on X 1 and X 2 Stratify on ˆθ = (X ˆβ1, X ˆβ2 ). Within Strata Model: Regress Y on T A 1, T A 2, T A 1 T A 2, and ˆθ. Direct Regression Propensity Function Bias MSE Bias MSE Effect of T A 1 0.517 0.268 0.159 0.026 Effect of T A 2 0.340 0.118 0.102 0.014 Effect of T A 1 T A 2 0.0452 0.0021 0.0074 0.0001 Reduce MSE by 90%. 18

Subclassification with Two Propensity Functions Propensity Function Propensity Function For T A 1 For T A 1 lower 1/3 middle 1/3 upper 1/3 lower 1/3 Block I Block II Block III middle 1/3 Block IV Block V Block VI upper 1/3 Block VII Block VIII Block IX 19

Example 1: Effects of Smoking on Medical Expenditure Data: 9,708 smokers from 1987 National Medical Expenditure Survey (Johnson et al., 2002). Treatment variable, T A = log(packyear): continuous measure of cumulative exposure to smoking: packyear = number of cigarettes per day 20 number of years smoked. previous studies apply propensity score to compare smokers with non-smokers (Larsen, 1999; Rubin, 2001). alternative strategy: frequency and duration of smoking as bivariate treatment variable. Outcome variable: self-reported annual medical expenditure. Covariates: age at the times of the survey, age when the individual started smoking, gender, race, marriage status, education level, census region, poverty status, and seat belt usage. 20

Model Specification of the Propensity Function Model: T A X indep. N(X β, σ 2 ) and θ = X β where X includes some square terms in addition to linear terms. Balance without controlling for theta controlling for theta quantiles of t-statistics -20 0 20 40 60 quantiles of t-statistics -20 0 20 40 60-2 -1 0 1 2 standard normal quantiles -2-1 0 1 2 standard normal quantiles First panel: T-statistics for predicting the covariates from the log(packyears). Second panel: Same, but controlling for the linear predictor. The Balance is Improved The balance serves as a model diagnostic for the propensity function. 21

Estimated Effects of Smoking Within each block, use a Two-Part model for the semi-continuous response: 1. Logistic regression for Pr(Y > 0 T A, ˆθ j ). 2. Gaussian linear regression for p{log(y ) Y > 0, T A, ˆθ j ). Propensity Function Direct Models 3 blocks 10 blocks Logistic Linear Regression Model coefficient for T A 0.097 0.060 0.065 standard error 3.074 3.031 3.074 Gaussian Linear Regression Model average causal effect 0.026 0.051 0.053 standard error 0.016 0.017 0.018 22

A Smooth Coefficient Model In our analysis we fit a separate regression in each subclass, allowing for different treatment effects in each subclass. An alternate strategy allows the treatment effect (and perhaps other regression coefficients to vary smoothly with θ. For example, in stage two log(y ) Normal(α(θ) + β(θ)t A + γx, σ 2 ), where α and β are smooth functions of θ. 23

The Smooth Coefficient Model Fit Causal effect 0.1 0.0 0.1 0.2 0.3 Subclass I II III 1 2 3 4 Propensity function The propensity score is the linear predictor for log(packyears). Important covariates include age and age when began smoking. For older people the effect of smoking on medical expenses is greater. 24

Analysis with Bivariate Treatment Now we consider the bivariate treatment Analysis T1 A = log(average number of cigarettes per day) T2 A = log(number of years smoked) Model Specification of Propensity Function: two independent regressions with same covariates as before (including squared terms). Stratify on ˆθ = (X ˆβ1, X ˆβ2 ). Within Strata Model: two-part model. Results: T A 1 T A 2 Logistic Linear Regression Model -0.358 (se=7.110) 0.075 (se=4.527) Gaussian Linear Regression Model 0.084 (se=0.026) 0.011 (se=0.036) 25

Fitted effect of Smoking Frequency with 95% error bars 0 20 40 60 80 100 cigerettes per day 26 medical cost inflation factor 1.0 1.2 1.4 1.6 1.8

Fitted effect by subclass Propensity Function Propensity Function for Duration for Frequency lower third middle third upper third Subclass I Subclass II Subclass III upper third dur: 0.317 (0.221) dur: 0.075 (0.092) dur: 0.016 (0.078) freq: -0.223 (0.143) freq: 0.125 (0.075) freq: 0.093 (0.067) n = 324 n = 1160 n = 1542 Subclass IV Subclass V Subclass VI middle third dur: 0.020 (0.105) dur: -0.011 (0.092) dur: -0.182 (0.100) freq: 0.009 (0.075) freq: 0.123 (0.076) freq: 0.208 (0.080) n = 1162 n = 910 n = 952 Subclass VII Subclass VIII Subclass XI lower third dur: -0.079 (0.099) dur: -0.178 (0.096) dur: 0.018 (0.138) freq: 0.105 (0.058) freq: 0.016 (0.072) freq: 0.026 (0.106) n = 1538 n = 954 n = 532 27

Propensity function for frequency The Smooth Coefficient Model Fit 1.5 2.0 2.5 3.0 3.5 4.0 Propensity function for duration 1.5 2.0 2.5 3.0 Causal effect of frequency 1.5 2.0 2.5 3.0 3.5 4.0 Propensity function for duration Propensity function for frequency 1.5 2.0 2.5 3.0 Causal effect of duration 28

Example 2: Effects of Education on Wages Data: 16,193 men born between 1949 and 1953 from U.S. Current Population Surveys used in Angrist and Krueger (1995). Treatment variable: highest grade completed (0 18). Outcome variable: annual wage in 1978 dollars. Covariates: race, year of birth, marital status, veteran status, Vietnam draft lottery code, region of residence, and indicator variables for residence in a central city and employment in a Standard Metropolitan Statistical Area. Following Angrist and Krueger (1995), we exclude men who did not work and/or recorded zero earning as well as those who have missing values for at least one variable. This yields the sample size of 13,900 for our analysis. 29

Propensity Function Method for Education Data 1. Model specification for propensity function: Ordinal logistic model. ˆθ = X ˆβ completely identifies propensity function. 2. Stratify data based on the scalar ˆθ. 3. Within strata model: Gaussian linear regression, p(y T A, ˆθ). But what about the ignorability assumption? Are T A and Y (t P ) independent given X? Intellectually gifted and motivated individuals Likely to be better educated Likely to be paid better for any level of education Covariates do not measure native intelligence or work ethic. 30

An Instrumental Variables Analysis An instrumental variable: is independent of the potential treatments and the potential outcomes given the covariates (i.e., ignorability), does not affect the potential outcome given the treatment and the covariates, is monotonically predictive of the treatment. (These assumptions are required for causal interpretation, see Angrist, Imbens and Rubin, JASA, 1996.) The Vietnam Draft Lottery: Instrument Treatment Outcome Lottery Code Veteran Status Education Level Income 31

Balancing the Covariates Across the Instrument The Lottery Code is recorded as an 14-category ordinal variable. We use a ordinal logistic model for the propensity function The model is characterized by the scalar linear predictor, ˆθ = X ˆβ. Balance without controlling for theta controlling for theta quantiles of t-statistics -1 0 1 quantiles of t-statistics -1 0 1-2 -1 0 1 2 standard normal quantiles -2-1 0 1 2 standard normal quantiles First panel: T-statistics for predicting the covariates from the instrument Second panel: Same, but controlling for the linear predictor. The Balance is Improved 32

Estimated Effects of Education The average effect of a one year increase in the highest grade on log weekly wage: Direct Models Propensity Function TSLS SSIV 5 blocks 10 blocks average causal effect 0.109 0.040 0.062 0.063 standard error 0.034 0.037 0.015 0.010 (SSIV=Split-Sample Instrumental Variables, Angrist and Krueger (1995)) 33

Non-Constant Treatment Effects The treatment effect appears to vary with the covariates, as measured by ˆθ: Block 1 Block 2 Block 3 Block 4 Block 5 Overall estimate 0.0839 0.0628 0.0200 0.0541 0.0896 0.0621 std. error 0.0278 0.0348 0.0284 0.0357 0.0358 0.0146 Unfortunately, the heterogeneity in treatment effect is difficult to make sense of because ˆθ is not easily interpretable. Current Research aims at using propensity functions to identify the linear combination of the covariates that best predicts compliance with the randomized treatment. We expect this linear combination to be predictive of the magnitude of the intention to treat causal effect. This strategy uses propensity functions to (partially) identify heterogeneity of treatment effects in IV analyses rather than to balance the instrument, and, provides a more interpretable causal effect than TSLS when the treatment is not binary. (See Angrist and Imbens, JASA, 1995). 34

Angrist and Imbens Result Under the above IV assumptions, suppose Z is a binary instrumental variable, T 1 and T 2 are the two potential treatments taking values in {0, 1,...,J}, Y 1,...,Y J are the potential outcomes, and there are no covariates. Angrist and Imbens showed that a population version of the TSLS estimate, where E(Y Z = 1) E(Y Z = 0) J E(T Z = 1) E(T Z = 0) = ω j = j=1 ω j E(Y j Y j 1 T 1 j > T 0 ), (1) Pr(T 1 j > T 0 ) J i=1 Pr(T 1 i > T 0 ). Eqn. (1) is a weighted average of per unit change in treatment effects. Unfortunately, the average is over overlapping subpopulations. (E.g., individuals with T 1 = J and T 0 = 0 are members of all the subpopulations.) 35

Concluding Remarks Our generalization widens the range of potential applications of propensity score methods. Subclassification on propensity function allows flexible modeling strategies. Propensity function methods retain attractive features of propensity score methods: 1. Low dimensional summary of high dimensional covariates. 2. Useful diagnostics: balance of covariates. 3. Relatively robust to misspecification. 4. Reduces bias and mean squared error relative to linear models. 36