Causal Inference in Observational Studies with Non-Binary Treatments. David A. van Dyk

Causal Inference in Observational Studies with Non-Binary Treatments
Statistics Section, Imperial College London. Joint work with Shandong Zhao and Kosuke Imai. Cass Business School, October 2013.

Outline
1. Causal Inference in Observational Studies: Non-Ignorability of Treatment Assignment; Propensity Scores
2. The Propensity Function; The Generalized Propensity Score
3. Estimating Dose Response Functions

Salk's Polio Vaccine Trials

NFIP 1954 Polio Vaccine Trials
  Treatment group: children whose parents give consent.
  Control group: children whose parents do not consent.

Results (rates are polio cases per 100,000):

The randomized controlled double-blinded experiment:
  Treatment:  200,000 children, rate 28
  Control:    200,000 children, rate 71

The NFIP study:
  Vaccine (consent):    225,000 children, rate 25
  No Consent:           350,000 children, rate 46
  No Vaccine/consent:   125,000 children, rate 44

Corresponding rows are comparable between the two studies. Using the No-Consent group as a control biases results: consent (the treatment indicator) is correlated with polio rates in the absence of treatment (the potential outcome).

Causal Inference in Observational Studies

1. The Rubin Causal Model:

   Unit | Covariates | Treatment Group | Potential Outcome (Control) | Potential Outcome (Treatment)
   1    | X_1        | 1               | Y_1(0)                      | Y_1(1)
   2    | X_2        | 1               | Y_2(0)                      | Y_2(1)
   3    | X_3        | 0               | Y_3(0)                      | Y_3(1)
   ...

2. E.g., the NFIP 1954 Polio Vaccine Trials: children who receive the treatment are more likely to contract polio in the absence of treatment. Higher income → more likely to consent to the vaccine and receive the treatment → higher risk of polio.

The treatment is correlated with the potential outcomes, which biases results. Key: control for the (correct) covariates.

Formalizing Causal Inference

Causal effect: e.g., Y(t^P_1) − Y(t^P_2), where Y(t^P) is the potential outcome under potential treatment t^P.

Problem: the treatment, T, is correlated with the potential outcomes, so Y_i(T_i = t^P_1) − Y_j(T_j = t^P_2) is not the causal effect.

Key assumption (Strong Ignorability of Treatment Assignment):
  p{T | X} = p{T | Y(t^P), X}  for all t^P.

Under this assumption, we must adjust for the covariates in our analysis.

Adjusting for Covariates in Causal Inference

[Scatterplot: outcome vs. covariate, with control units marked C and fitted regression lines.]

Standard regression: Y(t^P) ~ N(α + Xβ + t^P γ, σ²).
  Common assumptions, e.g. linearity, are often violated, which leads to biased causal inferences.

Matching and subclassification reduce bias:
  Non/semi-parametric methods, but difficult when the dimensionality of X is large.

The Propensity Score of Rosenbaum and Rubin (1983)

If the treatment is binary, the propensity score,
  e(X) = Pr(T = 1 | X),
fully characterizes p(T | X).

Key theorems:
1. The propensity score is a balancing score: Pr{T = 1 | X, e(X)} = Pr{T = 1 | e(X)}.
2. Strong Ignorability of Treatment Assignment given e(X): E{Y(t^P) | e(X)} = E{Y(t^P) | T = t^P, e(X)} for t^P = 0, 1.

An unbiased estimate of the causal effect is possible given e(X): match or subclassify on e(X), a scalar variable.

The Power of Conditioning on Propensity Scores

The problem with (non-randomized) observational studies: treatment assignment is correlated with the potential outcomes. E.g., subjects who are more likely to respond well without treatment are more likely to be in the control group.

The power of propensity scores: within a subclass with the same value of the propensity score, treatment is UNcorrelated with the potential outcomes. We can classify subjects based on their propensity score and analyze the data separately in each class.

Use of the Propensity Score in Observational Studies

The propensity score is unknown in observational studies: estimate e(X), e.g., via logistic regression.

Advantages:
1. Diagnostics: check the balance of X between treatment and control groups after matching or subclassifying on e(X),
   p{X | T = 0, e(X)} = p{X | T = 1, e(X)}.
2. Robust to misspecification of the functional forms used to estimate the propensity score (Drake, 1993; Dehejia and Wahba, 1999).

Disadvantage: vulnerable to unobserved confounders.
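
A minimal sketch of this workflow in Python, assuming a binary treatment array t, a pandas DataFrame of covariates X, and standardized mean differences as the balance metric (function and variable names are illustrative, not from the talk):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def estimate_pscore(X, t):
    """Estimate e(X) = Pr(T = 1 | X) via logistic regression."""
    model = LogisticRegression(max_iter=1000).fit(X, t)
    return model.predict_proba(X)[:, 1]

def balance_by_subclass(X, t, e, n_subclasses=5):
    """Standardized mean differences of each covariate within
    propensity-score subclasses (quintiles of e by default)."""
    subclass = pd.qcut(e, q=n_subclasses, labels=False)
    rows = []
    for j in range(n_subclasses):
        in_j = subclass == j
        X1, X0 = X[in_j & (t == 1)], X[in_j & (t == 0)]
        pooled_sd = np.sqrt((X1.var() + X0.var()) / 2)
        rows.append((X1.mean() - X0.mean()) / pooled_sd)
    return pd.DataFrame(rows, index=[f"subclass {j + 1}" for j in range(n_subclasses)])
```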

Covariance Adjustment

In addition to matching and subclassification, Rosenbaum and Rubin (1983) suggested covariance adjustment: regress Y on e(X) separately for treatment and control,
  Control:   Y ≈ α_C + β_C e(X)
  Treatment: Y ≈ α_T + β_T e(X).

Average difference of the fitted potential outcomes over all units:
  (1/n) Σ_{i=1}^n [α_T + β_T e(X_i) − {α_C + β_C e(X_i)}] = α_T − α_C + (β_T − β_C) (1/n) Σ_{i=1}^n e(X_i).

Covariance adjustment seems less robust than matching or subclassification.
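
A minimal sketch of this covariance adjustment, assuming y, t, and the estimated propensity score e are numpy arrays (illustrative names):

```python
import numpy as np

def covariance_adjustment_ate(y, t, e):
    """Rosenbaum-Rubin covariance adjustment on the propensity score:
    regress Y on e(X) within each arm, then average the difference of
    fitted potential outcomes over all units."""
    # np.polyfit returns [slope, intercept] for a degree-1 fit.
    beta_t, alpha_t = np.polyfit(e[t == 1], y[t == 1], deg=1)
    beta_c, alpha_c = np.polyfit(e[t == 0], y[t == 0], deg=1)
    # (1/n) sum_i [alpha_T + beta_T e_i - (alpha_C + beta_C e_i)]
    #   = alpha_T - alpha_C + (beta_T - beta_C) * mean(e)
    return (alpha_t - alpha_c) + (beta_t - beta_c) * e.mean()
```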

Outline
1. Causal Inference in Observational Studies: Non-Ignorability of Treatment Assignment; Propensity Scores
2. The Propensity Function; The Generalized Propensity Score
3. Estimating Dose Response Functions

Goal: Generalize the Propensity Score to Non-Binary Treatments

The propensity score is confined to binary treatment scenarios, yet researchers have no control over the treatment in observational studies:
  Continuous treatment: dose response function.
  Ordinal treatment: e.g., the effect of years in school on income.
  Event-count, duration, semi-continuous, etc.
  Multivariate treatments.

Two highly-cited methods:
1. The Generalized Propensity Score (Hirano and Imbens, 2004)
2. The Propensity Function (Imai and van Dyk, 2004)

The Propensity Function

Definition: the conditional density of the actual treatment given the observed covariates, e(· | X) ≡ p_ψ(T | X).

Uniquely Parameterized Propensity Function Assumption: e(· | X) depends on X only through θ_ψ(X); that is, θ = θ_ψ(X) uniquely represents e{· | θ_ψ(X)}.

Examples:
1. Continuous treatment: T | X ~ N(X'β, σ²), with ψ = (β, σ²) and θ_ψ(X) = X'β.
2. Categorical treatment: multinomial probit model for Pr(T | X), with ψ = (β, Σ) and θ_ψ(X) = X'β.
3. Ordinal treatment: ordinal logistic model for Pr(T | X), with ψ = β and θ_ψ(X) = X'β.

The Propensity Function in Practice

Properties:
1. Balance: p{T | e(· | X), X} = p{T | e(· | X)}.
2. Ignorability: p{Y(t) | T, e(· | X)} = p{Y(t) | e(· | X)} for all t.

Estimation of causal effects via subclassification:
  p{Y(t)} = ∫ p{Y(t) | T = t, θ} p(θ) dθ ≈ Σ_{j=1}^J p{Y(t) | T = t, θ̂_j} W_j.

Subclassify observations into J subclasses based on θ̂, regress Y on T (and θ̂) within each subclass, and average the within-subclass average treatment effects.
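
A minimal sketch of this subclassification estimator for a continuous treatment under a normal linear treatment model, so that θ̂(X) is the fitted linear predictor; a weighted average of within-subclass slopes of Y on T stands in for the average treatment effect (names are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def pfunction_subclass_ate(X, t, y, n_subclasses=10):
    """Subclassification on the estimated propensity function for a
    continuous treatment with a normal linear treatment model."""
    # Treatment model T | X ~ N(X'beta, sigma^2); theta_hat is the fitted linear predictor.
    theta_hat = LinearRegression().fit(X, t).predict(X)
    subclass = pd.qcut(theta_hat, q=n_subclasses, labels=False)

    effects, weights = [], []
    for j in range(n_subclasses):
        in_j = subclass == j
        # Within-subclass regression of Y on T; the talk notes that also
        # adjusting for theta_hat here would improve performance.
        slope = LinearRegression().fit(t[in_j].reshape(-1, 1), y[in_j]).coef_[0]
        effects.append(slope)
        weights.append(in_j.mean())   # subclass weight W_j = share of units
    return np.average(effects, weights=weights)
```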

The Propensity Function in Practice

Estimation of causal effects via subclassification:
  p{Y(t)} = ∫ p{Y(t) | T = t, θ} p(θ) dθ ≈ Σ_{j=1}^J p{Y(t) | T = t, θ̂_j} W_j.

Smooth Coefficient Model (SCM):
  E(Y(t) | T = t, θ̂) = f(θ̂) + g(θ̂) t,
where f(·) and g(·) are unknown smooth continuous functions.

Average treatment effect: (1/n) Σ_{i=1}^n ĝ(θ̂_i).

The Generalized Propensity Score

Definition: the conditional density of the treatment given the observed covariates, evaluated at the observed treatment, R = r(T, X).

Properties:
1. Balance: I{T = t} is conditionally independent of X given r(t, X).
2. Ignorability: p{t | r(t, X), Y(t)} = p{t | r(t, X)} for every t.

Estimation of the dose response function:
  E(Y(t) | T = t, R̂) = α_0 + α_1 T + α_2 T² + α_3 R̂ + α_4 R̂² + α_5 T R̂,
  Ê{Y(t)} = (1/n) Σ_{i=1}^n [α̂_0 + α̂_1 t + α̂_2 t² + α̂_3 r̂(t, X_i) + α̂_4 r̂(t, X_i)² + α̂_5 t r̂(t, X_i)].

Note the similarity with covariance adjustment.
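
A minimal sketch of this two-stage estimator, assuming a normal linear treatment model for the GPS; it illustrates the quadratic response surface above and is not the authors' code (names are illustrative):

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LinearRegression

def gps_quadratic_drf(X, t, y):
    """Return a function t0 -> estimated E[Y(t0)] using the quadratic GPS response model."""
    # Stage 1: treatment model T | X ~ N(X'beta, sigma^2).
    tm = LinearRegression().fit(X, t)
    mu, sigma = tm.predict(X), np.std(t - tm.predict(X))
    r_hat = norm.pdf(t, loc=mu, scale=sigma)          # GPS at the observed treatment

    # Stage 2: E(Y | T, R) = a0 + a1 T + a2 T^2 + a3 R + a4 R^2 + a5 T R.
    design = np.column_stack([t, t**2, r_hat, r_hat**2, t * r_hat])
    rm = LinearRegression().fit(design, y)

    def drf(t0):
        r_t0 = norm.pdf(t0, loc=mu, scale=sigma)      # r_hat(t0, X_i) for every unit
        d = np.column_stack([np.full_like(r_t0, t0), np.full_like(r_t0, t0**2),
                             r_t0, r_t0**2, t0 * r_t0])
        return rm.predict(d).mean()                   # average over units
    return drf
```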

Comparing the GPS with Covariance Adjustment

Covariance adjustment with a binary treatment: regress Y on e(X) separately for treatment and control,
  Control:   Y ≈ α_C + β_C e(X)
  Treatment: Y ≈ α_T + β_T e(X),
and average the difference of fitted potential outcomes,
  (1/n) Σ_{i=1}^n [α_T + β_T e(X_i) − {α_C + β_C e(X_i)}].

Estimation of the dose response function with the GPS:
  E(Y(t) | T = t, R̂) = α_0 + α_1 T + α_2 T² + α_3 R̂ + α_4 R̂² + α_5 T R̂,
  Ê{Y(t)} = (1/n) Σ_{i=1}^n [α̂_0 + α̂_1 t + α̂_2 t² + α̂_3 r̂(t, X_i) + α̂_4 r̂(t, X_i)² + α̂_5 t r̂(t, X_i)].

Dose Response Functions

Imai & van Dyk (2004) compute an average treatment effect:
  Subclassification: weighted average of the within-subclass coefficients of T.
  SCM: average of the individual coefficients of T, ĝ(θ̂_i).

Hirano & Imbens (2004) compute a full dose response function.

The subclassification formula of Imai and van Dyk, however, gives a parametric dose response function:
  p{Y(t)} = ∫ p{Y(t) | T = t, θ} p(θ) dθ ≈ Σ_{j=1}^J p{Y(t) | T = t, θ_j} W_j.

Simulation 1

A very simple simulation (n = 2000):
  X ~ N(0.5, 0.25),
  T | X ~ N(X, 0.25),
  Y(t) | T, X ~ N(10X, 1) for all t,
so the true treatment effect is zero, yet naive linear regression of Y on T gives a treatment effect of 5.

The treatment model is correctly specified.

IvD: linear regression of Y on T within S subclasses. We do not adjust for θ̂ in the within-subclass models; doing so would dramatically improve performance.
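
A minimal sketch of this data-generating process, reading the second parameter of each normal as a variance; the naive slope of Y on T comes out near 5 even though Y(t) does not depend on t:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(0.5, np.sqrt(0.25), size=n)   # X ~ N(0.5, 0.25)
T = rng.normal(X, np.sqrt(0.25))             # T | X ~ N(X, 0.25)
Y = rng.normal(10 * X, 1)                    # Y(t) | T, X ~ N(10X, 1): no true effect of T

naive_slope = np.polyfit(T, Y, deg=1)[0]
print(round(naive_slope, 2))                 # close to 5, purely from confounding by X
```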

Simulation 1 Results

[Figure: estimated derivative of the DRF and estimated relative DRF as functions of the treatment, for HI and for IvD with S = 5, 10, and 50 subclasses.]

We expect any method to work well in this simple setting.

Simulation 2

Frequency evaluation (n = 2000):
  X ~ N(0, 1),
  T | X ~ N(X + X², 1).

Generative models for Y(t):
  Linear DRF:    Y(t) | T, X ~ N(X + t, 9)
  Quadratic DRF: Y(t) | T, X ~ N((X + t)², 9)

Correctly specified treatment models are used for all fits; linear and quadratic (in T) response models are used with both methods; 10 subclasses are used with IvD. The entire procedure was replicated for 1000 simulated data sets.

Simulation 2 Results

[Figure: estimated relative DRFs from IvD and HI under the linear and quadratic generative dose response functions, with linear and quadratic fitted response models.]

Outline
1. Causal Inference in Observational Studies: Non-Ignorability of Treatment Assignment; Propensity Scores
2. The Propensity Function; The Generalized Propensity Score
3. Estimating Dose Response Functions

Covariance Adjustment for a Categorical Treatment

Estimating dose response using covariance adjustment: regress Y on R̂ = r̂(T = t, X) separately for units in each treatment group,
  Group t: Y ≈ α_t + β_t R̂.   (*)

Average the fitted potential outcomes of all units:
  Ê(Y(t)) = (1/n) Σ_{i=1}^n [α̂_t + β̂_t r̂(t, X_i)].

Pairwise differences are exactly Rosenbaum and Rubin's estimate of the average treatment effect. Hirano and Imbens replace (*) with the quadratic regression
  Y(t) ≈ α_0 + α_1 T + α_2 T² + α_3 R̂ + α_4 R̂² + α_5 T R̂.

Covariance Adjustment for a Continuous Treatment

Hirano and Imbens' quadratic regression extends easily to continuous treatments, but it is less robust than treatment-level specific regressions.

Solution (Zhang, van Dyk, & Imai, 2013): discretize the continuous treatment variable.
  Use quantiles of the treatment for discretization; extreme categories tend to have a wider range of T, so we expect more bias there.
  Within subclass t, fit, e.g., Y ≈ α_t + β_t R̂.
  Estimate the dose response: Ê(Y(t)) = (1/n) Σ_{i=1}^n [α̂_t + β̂_t r̂(t, X_i)].

Note: in contrast to standard methods, we subclassify on the treatment rather than on the propensity score.
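
A minimal sketch of this discretize-then-adjust strategy, reusing a normal linear treatment model for the GPS as in the earlier sketch (illustrative names; quantile-based bins assumed):

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LinearRegression

def cov_adjust_gps_drf(X, t, y, t_grid, n_subclasses=5):
    """Dose response via treatment-level covariance adjustment on the GPS,
    with the continuous treatment discretized at its quantiles."""
    tm = LinearRegression().fit(X, t)
    mu, sigma = tm.predict(X), np.std(t - tm.predict(X))
    r_obs = norm.pdf(t, loc=mu, scale=sigma)              # GPS at the observed treatment

    bins = np.quantile(t, np.linspace(0, 1, n_subclasses + 1))

    def which(v):
        # Index of the treatment-based subclass containing v.
        return np.clip(np.searchsorted(bins, v, side="right") - 1, 0, n_subclasses - 1)

    subclass = which(t)
    # Within each treatment-based subclass, fit Y ~ alpha_t + beta_t * R_hat.
    fits = [np.polyfit(r_obs[subclass == j], y[subclass == j], deg=1)
            for j in range(n_subclasses)]

    drf = []
    for t0 in t_grid:
        beta, alpha = fits[int(which(t0))]
        r_t0 = norm.pdf(t0, loc=mu, scale=sigma)          # r_hat(t0, X_i) for all units
        drf.append(np.mean(alpha + beta * r_t0))
    return np.array(drf)
```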

Using a SCM with the GPS

Fitting Y ≈ α_t + β_t R̂ within each of several subclasses can be viewed as a flexible regression of Y on R and T.

Flores et al. (2012) proposed another solution: non-parametrically model Y ≈ f(R, T) and estimate the dose response as
  Ê(Y(t)) = (1/n) Σ_{i=1}^n f̂{r̂(t, X_i), t}.

Flores et al. use a non-parametric kernel estimator; we use a SCM to facilitate comparisons. The three GPS-based methods vary only in the choice of response model (quadratic regression, subclassification, or SCM).

Importance of the Choice of Response Model

Fitted E(Y(t) | T = t, R) for the dataset in Simulation 1.

[Figure: mean potential outcome vs. GPS for T = 0, 0.2, 0.4, 0.6, 0.8 under the HI quadratic regression, the covariance adjustment GPS (quadratic fit within subclasses), and SCM(GPS).]

Covariance adjustment GPS is much more flexible than quadratic regression.

Using the Propensity Function to Estimate the DRF

Extend Imai and van Dyk (2004) to estimate dose response. Begin by writing the dose response function as
  E[Y(t)] = ∫ E[Y(t) | θ] p(θ) dθ = ∫ E[Y(T) | θ, T = t] p(θ) dθ.

Use a smooth coefficient model for the rightmost integrand,
  E[Y(T) | θ, T = t] = f(θ, t),
and average over units to obtain the fitted dose response,
  Ê[Y(t)] = (1/n) Σ_{i=1}^n f̂(θ̂_i, t).
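
A minimal sketch of this estimator, using a Gaussian product-kernel (Nadaraya-Watson) smoother over (θ̂, T) as a simple stand-in for the smooth coefficient model, which the paper fits differently (names and bandwidth choices are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def pfunction_drf(X, t, y, t_grid, h_theta=None, h_t=None):
    """DRF estimate: average of f_hat(theta_hat_i, t0) over units, with f_hat a
    bivariate kernel smoother standing in for the SCM."""
    theta = LinearRegression().fit(X, t).predict(X)       # theta_hat = fitted X'beta
    h_theta = h_theta or 0.5 * theta.std()
    h_t = h_t or 0.5 * t.std()

    def f_hat(theta0, t0):
        # Gaussian product-kernel weights centered at (theta0, t0).
        w = np.exp(-0.5 * ((theta - theta0) / h_theta) ** 2
                   - 0.5 * ((t - t0) / h_t) ** 2)
        return np.sum(w * y) / np.sum(w)

    return np.array([np.mean([f_hat(th_i, t0) for th_i in theta])
                     for t0 in t_grid])
```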

The Problem with Extrapolation

Estimated dose response: Ê[Y(t)] = (1/n) Σ_{i=1}^n f̂(θ̂_i, t).

Evaluating at t_0 involves evaluating f̂(θ̂_i, t_0) at every observed value of θ̂_i. Invariably, the range of θ̂ among units with T near t_0 is smaller than the total range of θ̂, at least for some t_0. This estimate therefore involves some degree of extrapolation! We can diagnose this with a scatterplot of T vs. θ̂.

The Problem with Extrapolation

Similarly with the GPS: Ê[Y(t)] = (1/n) Σ_{i=1}^n f̂{r̂(t, X_i), t}.

Here the problem is more acute: the range of R among units with T near t_0 may not even overlap with the range of r(t_0, X). This is true for all three response models, so the estimate of the DRF at t_0 may depend entirely on extrapolation! We can diagnose this by comparing scatterplots of T vs. R and T vs. r(t, X).

Simulation 1: Revisited

[Figure: estimated DRFs vs. treatment for four approaches: the original HI/IvD methods (HI and IvD with S = 5, 10, 50), covariance adjustment GPS, SCM(GPS), and Y ~ SCM(θ, T).]

SCM(GPS) exhibits an odd cyclic pattern; SCM(P-function) results in a much smoother fit. As expected, covariance adjustment GPS is biased in the extreme subclasses.

Simulation 2: Revisited (Linear Generative DRF)

[Figure: estimated relative DRFs under the linear generative dose response function, with linear and quadratic fitted response models, for IvD, HI, covariance adjustment GPS, SCM(GPS), and SCM(P-function).]

Reduced bias without a parametric specification.

Simulation 2: Revisited (Quadratic Generative DRF)

[Figure: estimated relative DRFs under the quadratic generative dose response function, with linear and quadratic fitted response models, for IvD, HI, covariance adjustment GPS, SCM(GPS), and SCM(P-function).]

Reduced bias without a parametric specification.

Example: Effects of Smoking on Medical Expenditure

Data: 9,708 smokers from the 1987 National Medical Expenditure Survey (Johnson et al., 2002).

Treatment variable, A = log(packyear), a continuous measure of cumulative exposure to smoking:
  packyear = (# of cigarettes per day / 20) × # of years smoked.
An alternative strategy treats the frequency and duration of smoking as a bivariate treatment variable.

Outcome: self-reported annual medical expenditure.

Covariates: age at time of survey, age when the individual started smoking, gender, race, marriage status, education level, census region, poverty status, and seat belt usage.

Model Specification of the Propensity Function

Model: A | X ~ N(X'β, σ²) and θ = X'β, where X includes some squared terms in addition to linear terms.

[Figure: QQ-plots of t-statistics against standard normal quantiles, without and with controlling for the propensity function.]

First panel: t-statistics for predicting the covariates from log(packyear). Second panel: the same, but controlling for the propensity function. The balance is improved; balance serves as a model diagnostic for the propensity function.

Using the Method of Imai and van Dyk (2004)

Within each block, use a two-part model for the semi-continuous response:
1. Logistic regression for Pr(Y > 0 | A, θ̂_j).
2. Gaussian linear regression for p{log(Y) | Y > 0, A, θ̂_j}.

                                     Propensity Function          Direct Models
                                     3 blocks     10 blocks
Logistic Regression Model
  coefficient for A                  -0.097       -0.060          -0.065
  standard error                      3.074        3.031           3.074
Gaussian Linear Regression Model
  average causal effect               0.026        0.051           0.053
  standard error                      0.016        0.017           0.018
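
A minimal sketch of the two-part model fit within a single block, using statsmodels; the table's numbers come from the authors' analysis, not from this code (illustrative names):

```python
import numpy as np
import statsmodels.api as sm

def two_part_fit(A, theta_hat, y):
    """Two-part model within one propensity-function block:
    (1) logistic regression for Pr(Y > 0 | A, theta_hat),
    (2) Gaussian linear regression for log(Y) given Y > 0, A, theta_hat."""
    design = sm.add_constant(np.column_stack([A, theta_hat]))

    part1 = sm.Logit((y > 0).astype(int), design).fit(disp=0)   # part 1

    pos = y > 0
    part2 = sm.OLS(np.log(y[pos]), design[pos]).fit()           # part 2
    return part1, part2
```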

A Smooth Coefficient Model

In the subclassification-based analysis, we fit a separate regression in each subclass, allowing for different treatment effects in each subclass. An alternative strategy allows the treatment effect to vary smoothly with θ. For example, in stage two,
  log(Y) ~ Normal(α(θ) + β(θ) A + γX, σ²),
where α and β are smooth functions of θ.

The Smooth Coefficient Model Fit

[Figure: estimated causal effect as a function of the propensity function, with subclasses I, II, and III marked.]

The propensity function is the linear predictor for log(packyears); important covariates include age and age when the individual began smoking. For older people, the effect of smoking on medical expenses is greater. The propensity function is hard to interpret, however, so a dose response function would be much more useful.

Estimating the Dose Response Function

We begin with a simulation study: use the observed X and T and simulate Y using each of
1. Quadratic DRF:        E[log(Y_i(t)) | X] = (4/25) t² + [log(age_i)]²
2. Piecewise Linear DRF: E[log(Y_i(t)) | X] = 4 − 0.5 t + [log(age_i)]² for t ≤ 2;  5 + 2.3 (t − 2) + [log(age_i)]² for t > 2
3. Hockey-Stick DRF:     E[log(Y_i(t)) | X] = 8.1 + [log(age_i)]² for t ≤ 3;  8.1 + 1.5 (t − 3)² + [log(age_i)]² for t > 3
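
A minimal sketch of the three generative mean functions as Python helpers, reading the operators dropped in extraction as minus signs (an assumption; helper names are illustrative):

```python
import numpy as np

def quadratic_drf(t, log_age):
    # E[log Y(t) | X] = (4/25) t^2 + log(age)^2
    return (4 / 25) * np.asarray(t, dtype=float) ** 2 + log_age ** 2

def piecewise_linear_drf(t, log_age):
    # 4 - 0.5 t for t <= 2, then 5 + 2.3 (t - 2); plus log(age)^2 throughout
    t = np.asarray(t, dtype=float)
    return np.where(t <= 2, 4 - 0.5 * t, 5 + 2.3 * (t - 2)) + log_age ** 2

def hockey_stick_drf(t, log_age):
    # 8.1 for t <= 3, then 8.1 + 1.5 (t - 3)^2; plus log(age)^2 throughout
    t = np.asarray(t, dtype=float)
    return np.where(t <= 3, 8.1, 8.1 + 1.5 * (t - 3) ** 2) + log_age ** 2
```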

Extrapolation in Fitting the GPS-Based DRF

Recall: Y ≈ f(R, T) and Ê(Y(t)) = (1/n) Σ_{i=1}^n f̂{r̂(t, X_i), t}.

[Figure: scatterplots of R vs. T and of r(t, X) vs. t.]

Extrapolation occurs for: small t (r > 0.3), midrange t (r < 0.2), and large t (r < 0.1).

Extrapolation in Fitting the P-Function-Based DRF

Recall: Y ≈ f(θ̂, T) and Ê[Y(t)] = (1/n) Σ_{i=1}^n f̂(θ̂_i, t).

[Figure: scatterplot of θ̂_i (estimated propensity function) vs. T_i = log(packyear).]

We expect bias in Ê[Y(t)] for t > 3.

Results: Quadratic DRF

[Figure: estimated DRFs vs. t = log(packyear) for HI, covariance adjustment GPS, SCM(GPS), and SCM(P-function).]

1. Hirano and Imbens' estimate tends to follow the unadjusted fit.
2. The two other GPS-based methods do better.
3. SCM(GPS) again exhibits a small cyclic pattern.
4. SCM(P-function) does the best, at least for t < 3.

Results: Piecewise Linear DRF

[Figure: estimated DRFs vs. t = log(packyear) for HI, covariance adjustment GPS, SCM(GPS), and SCM(P-function).]

1. Hirano and Imbens' estimate tends to follow the unadjusted fit.
2. The two other GPS-based methods do better.
3. SCM(GPS) again exhibits a small cyclic pattern.
4. SCM(P-function) does the best, at least for t < 3.

Results: Hockey-Stick DRF

[Figure: estimated DRFs vs. t = log(packyear) for HI, covariance adjustment GPS, SCM(GPS), and SCM(P-function).]

1. Hirano and Imbens' estimate tends to follow the unadjusted fit.
2. The two other GPS-based methods do better.
3. SCM(GPS) again exhibits a small cyclic pattern.
4. SCM(P-function) does the best, at least for t < 3.

The Estimated DRF: Using the Data

[Figure: estimated DRFs vs. t = log(packyear) from covariance adjustment GPS and from Y ~ SCM(θ, T).]

Summary and Conclusions

The fitted dose response function of Hirano and Imbens seems unreliable; Imai & van Dyk is more limited in scope but more reliable.

We aim to derive reliable estimates of dose response in observational studies, and P-function-based estimates appear to offer a significant improvement. Nonetheless, we advise caution: large samples are required, and in smaller studies the average treatment effect is preferable.