Weighting. Homework 2. Regression. Decisions. Matching. 3/5/2014.


Weighting
STA 320: Design and Analysis of Causal Studies
Dr. Kari Lock Morgan and Dr. Fan Li
Department of Statistical Science, Duke University

Homework 2
o Describe imbalance: direction matters
o Unconfounded?
o Dummy variables and interactions

Rubin 2007, common comments:
o Full probability model on the science?
o Throwing away units => bias?
o Regression?

Regression
$Y_i(1) = Y_i(0)$ for all $i$, so $\tau = \bar{Y}(1) - \bar{Y}(0) = 0$. Fit $Y_i^{obs} = \alpha + \tau W_i + \beta X_i + \epsilon_i$:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   -4150.0      574.2  -7.228 1.13e-10 ***
x              1102.4      143.1   7.706 1.12e-11 ***
w             -1899.6      515.2  -3.687 0.000374 ***

Regression
o Regression models are okay when covariate balance is good between the groups (a randomized experiment, or after subclassification or matching)
o If covariate balance is bad (e.g., an unadjusted observational study), regression models rely heavily on extrapolation and can be very sensitive to departures from linearity

Decisions
Matching:
o with or without replacement?
o 1:1, 2:1, ...?
o distance measure?
o what to match on?
o how to weight?
o exact matching for any variable(s)?
o calipers for when a match is acceptable?
o Let's try it on the Lalonde data

Weighting
o An alternative to either subclassification or matching is to use weighting
o Each unit gets a weight, and estimates are a weighted average of the observed outcomes within each group:
$\hat{\tau} = \sum_{i=1}^{N} \lambda_i W_i Y_i(1) - \sum_{i=1}^{N} \lambda_i (1 - W_i) Y_i(0)$
where the weights $\lambda_i$ sum to 1 within each group
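To make the warning about extrapolation concrete, here is a minimal R sketch in the spirit of the regression output above. The simulation is an assumption of mine, not the course's actual data: the true treatment effect is zero, but the covariate distributions barely overlap and the outcome is nonlinear in x, so the linear adjustment extrapolates and the coefficient on w typically comes out large and "significant".

# Hypothetical simulation (not the course's data): zero treatment effect, poor overlap
set.seed(320)
n <- 200
w <- rep(c(0, 1), each = n / 2)            # treatment indicator
x <- rnorm(n, mean = 2 + 6 * w)            # control covariate ~ N(2,1), treated ~ N(8,1)
y <- 1000 + 50 * x^2 + rnorm(n, sd = 300)  # outcome: nonlinear in x, no effect of w
summary(lm(y ~ w + x))                     # coefficient on w is typically far from 0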

Observed Difference in Means
For the simple estimator, the observed difference in means,
$\hat{\tau} = \sum_{i=1}^{N} \frac{W_i Y_i(1)}{N_T} - \sum_{i=1}^{N} \frac{(1 - W_i) Y_i(0)}{N_C},$
treated units are weighted by $1/N_T$ and control units by $1/N_C$.

Subclassification
Sample size:
            Subclass 1   Subclass 2
Control          8            4
Treatment        2            6

Weights:
            Subclass 1   Subclass 2
Control      (1/8)/2      (1/4)/2
Treatment    (1/2)/2      (1/6)/2

Propensity Score Weighting
There are various different choices for the weights, most based on the propensity score e(x). The following all create covariate balance:

                          Treatment      Control
Horvitz-Thompson (ATE)    1/e(x)         1/(1 - e(x))
ATT                       1              e(x)/(1 - e(x))
Overlap                   1 - e(x)       e(x)

Weighting Methods
o Horvitz-Thompson (ATE, average treatment effect): weights the sample to look like the covariate distribution of the entire sample (like subclassification)
o ATT (average treatment effect for the treated): weights the sample to look like the covariate distribution of the treatment group (like matching)
o Overlap: weights the sample to emphasize the overlap between the treatment and control groups (new: Profs. Li and Morgan)

Toy Example
Control simulated from N(0, 1); treatment simulated from N(2, 1).

Weighted means:
             Control   Treatment
Unweighted     0.03       1.98
HT (ATE)       1.19       0.74
ATT            2.22       1.98
Overlap        1.01       1.01

Weighting Methods
o Don Rubin does not like weighting estimators; he says you should use subclassification or matching
o Why? The weights everyone uses (HT, ATT) don't work!
o Prof. Li and I are working together to develop the overlap weight to address this problem
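A small R sketch of the toy example (my own simulation under the stated setup, so the numbers will not exactly match the table above): the propensity score is estimated by logistic regression, and the weighted mean of the covariate is computed within each group under each weighting scheme.

# Toy example sketch: covariate balance under HT, ATT, and overlap weights
set.seed(1)
n  <- 1000
w  <- rep(c(0, 1), each = n / 2)
x  <- rnorm(n, mean = 2 * w)                  # control ~ N(0,1), treatment ~ N(2,1)
e  <- fitted(glm(w ~ x, family = binomial))   # estimated propensity score

wts <- list(
  HT      = ifelse(w == 1, 1 / e, 1 / (1 - e)),
  ATT     = ifelse(w == 1, 1, e / (1 - e)),
  Overlap = ifelse(w == 1, 1 - e, e)
)

# Weighted mean of x within each group, for each weighting scheme
sapply(wts, function(l) c(
  Control   = weighted.mean(x[w == 0], l[w == 0]),
  Treatment = weighted.mean(x[w == 1], l[w == 1])
))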

Weighting Methods: Toy Example
o Outcome model: $Y_i^{obs} \sim N(X_i, 1) + \tau W_i$, with $\tau = 1$
o Problem with propensity scores in the denominator: the weights can blow up for propensity scores close to 0 or 1, and extreme units can dominate

Racial Disparity
o Goal: estimate the racial disparity in medical expenditures (causal inference?)
o The Institute of Medicine defines disparity as "a difference in treatment provided to members of different racial (or ethnic) groups that is not justified by the underlying health conditions or treatment preferences of patients"
o Balance health status and demographics; observe the difference in health expenditures

MEPS Data
Medical Expenditure Panel Survey (2009), adults aged 18 and older:
o 9830 non-Hispanic Whites
o 4020 Blacks
o 1446 Asians
o 5280 Hispanics
Three different comparisons: non-Hispanic Whites with each minority group

Propensity Scores: MEPS Weights
o 31 covariates (5 continuous, 26 binary), a mix of health indicators and demographics
o Logistic regression with all main effects
o Separate propensity score model for each minority group
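The blow-up can be seen with simple arithmetic. The sketch below is illustrative only (not from the slides); it evaluates the weight formulas from the earlier table at propensity scores approaching 1, including the e(x) = 0.9998 value discussed below.

# How weights with the propensity score in the denominator explode near 1,
# while the overlap weight stays bounded
e <- c(0.5, 0.9, 0.99, 0.999, 0.9998)
data.frame(e           = e,
           HT_control  = 1 / (1 - e),   # 5000 at e = 0.9998
           ATT_control = e / (1 - e),
           overlap     = e)             # overlap weight for a control unit is e(x) <= 1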

One High Asian Weight
o One Asian has a weight of 30% (out of 1446 Asians)!
o She is a 78-year-old Asian woman with a BMI of 55.4 (very high; the highest Asian BMI in the data)
o In the sample, White people are older and heavier, so this woman has a very high propensity score (0.9998)
o Unnormalized weight: 1/(1 - 0.9998) = 5000
o This is too extreme, and will mess up weights with the propensity score in the denominator

Truncate?
In practice people truncate these extreme weights to avoid this problem, but then estimates can become very dependent on the truncation choice.

MEPS Imbalance
[covariate balance plots not reproduced in the transcription]
o Using the overlap weights, the largest imbalance was a difference in means 0.0000003 standard errors apart
o For the Asian and Hispanic comparisons, HT and ATT gave some differences in means over 74 standard errors apart (BMI)!
o Initial imbalance can make propensity scores in the denominator of the weights mess up horribly
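For reference, here is one simple way to compute the balance diagnostic quoted above (difference in weighted covariate means, in standard-error units) in R. This is my own sketch of a generic diagnostic, not the authors' exact calculation, and the variable names (x, w, wt, e) are placeholders rather than MEPS variables.

# Weighted difference in covariate means between groups, in standard-error units
# x = covariate, w = 0/1 group indicator, wt = vector of weights (placeholder names)
balance_stat <- function(x, w, wt) {
  diff <- weighted.mean(x[w == 1], wt[w == 1]) - weighted.mean(x[w == 0], wt[w == 0])
  se   <- sqrt(var(x[w == 1]) / sum(w == 1) + var(x[w == 0]) / sum(w == 0))
  diff / se
}
# e.g., with an estimated propensity score e:
#   balance_stat(x, w, ifelse(w == 1, 1 / e, 1 / (1 - e)))  # Horvitz-Thompson
#   balance_stat(x, w, ifelse(w == 1, 1, e / (1 - e)))      # ATT
#   balance_stat(x, w, ifelse(w == 1, 1 - e, e))            # overlap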

Overlap Weights
o The overlap weights provide better balance and avoid extreme weights
o They automatically emphasize the units who are comparable (could be in either treatment group)
o Pros: better balance, better estimates, lower standard errors
o Con: the estimand is not clear

Disparity Estimates
Estimated disparities for total health care expenditure, after weighting:
[estimates not reproduced in the transcription]

Weighting
o Weighting provides a convenient and flexible way to do causal inference
o Easy to incorporate into other models
o Easy to incorporate survey weights
o However, conventional weighting methods that place the propensity score in the denominator can go horribly wrong
o The overlap weighting method is a new alternative developed to avoid this problem

Review
Not everything you need to know, but some of the main points.

Causality
o Causality is tied to an action (treatment)
o Potential outcomes represent the outcome for each unit under treatment and under control
o A causal effect compares the potential outcome under treatment to the potential outcome under control for each unit
o In reality, only one potential outcome is observed for each unit, so we need multiple units to estimate causal effects

SUTVA, Framework
o SUTVA assumes no interference between units and only one version of each treatment
o Notation: $Y$, $W$, $Y_i^{obs}$, $Y_i^{mis}$
o The assignment mechanism (how units are assigned to treatments) is important to consider
o Using only observed outcomes can be misleading (e.g., the Perfect Doctor)
o The potential outcome framework and Rubin's Causal Model can help to clarify questions of causality (e.g., Lord's Paradox)

Assignment Mechanism
Covariates (pre-treatment variables) are often important in causal inference.
Assignment probabilities:
o $\Pr(W \mid X, Y(1), Y(0))$
o $p_i(X, Y(1), Y(0)) = \Pr(W_i \mid X, Y(1), Y(0))$
Properties of the assignment mechanism:
o individualistic
o probabilistic
o unconfounded
o known and controlled

Randomized Experiments
We'll cover four types of classical randomized experiments:
o Bernoulli randomized experiment
o completely randomized experiment
o stratified randomized experiment
o paired randomized experiment
These are increasingly restrictive regarding the possible assignment vectors.

Fisher Inference
o The sharp null of no treatment effect allows you to fill in the missing potential outcomes under the null
o Randomization distribution: the distribution of the statistic due to random assignment, assuming the null
o p-value: the proportion of statistics in the randomization distribution as extreme as the observed statistic

Neyman's Inference (Finite Sample)
1. Define the estimand: $\tau \equiv \bar{Y}(1) - \bar{Y}(0)$
2. Find an unbiased estimator of the estimand: $\hat{\tau} \equiv \bar{Y}_T^{obs} - \bar{Y}_C^{obs}$
3. Derive the true sampling variance of the estimator: $\mathrm{var}(\hat{\tau}) = \frac{S_T^2}{N_T} + \frac{S_C^2}{N_C} - \frac{S_{TC}^2}{N}$
4. Find an unbiased estimator of the true sampling variance of the estimator (IMPOSSIBLE!). Overestimate: $\widehat{\mathrm{var}}(\hat{\tau}) = \frac{s_T^2}{N_T} + \frac{s_C^2}{N_C}$
5. Assume approximate normality to obtain a p-value and confidence interval
(Slide adapted from Cassandra, Harvard)

Fisher vs. Neyman
Fisher:
o Goal: testing
o Considers only random assignment
o $H_0$: no treatment effect
o Works for any test statistic
o Exact distribution
o Works for any known assignment mechanism
Neyman:
o Goal: estimation
o Considers random assignment and random sampling
o $H_0$: average treatment effect = 0
o Difference in means
o Approximate, relies on large n
o Only derived for common designs
(See the R sketch below.)

Summary: Using Covariates
By design:
o stratified randomized experiments
o paired randomized experiments
o rerandomization
By analysis:
o outcome: gain scores
o separate analyses within subgroups
o regression
o imputation
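The following is a minimal R sketch of both calculations for a completely randomized experiment, using simulated data of my own (hypothetical outcomes y and assignment w, not course data): the Neyman point estimate with its conservative variance and normal-approximation interval, and a Fisher randomization p-value for the sharp null.

# Hypothetical completely randomized experiment
set.seed(320)
n <- 100
w <- sample(rep(c(0, 1), each = n / 2))          # completely randomized assignment
y <- rnorm(n, mean = 5 + 2 * w)                  # simulated outcomes, true effect 2

# Neyman: point estimate, conservative variance estimate, normal-approximation 95% CI
tau_hat <- mean(y[w == 1]) - mean(y[w == 0])
v_hat   <- var(y[w == 1]) / sum(w == 1) + var(y[w == 0]) / sum(w == 0)
tau_hat + c(-1, 1) * qnorm(0.975) * sqrt(v_hat)

# Fisher: randomization distribution of the difference in means under the sharp null
perm_stats <- replicate(10000, {
  w_star <- sample(w)                            # re-randomize the assignment
  mean(y[w_star == 1]) - mean(y[w_star == 0])
})
mean(abs(perm_stats) >= abs(tau_hat))            # two-sided randomization p-value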

Rerandomization
1) Collect covariate data; specify criteria determining when a randomization is unacceptable, based on covariate balance
2) (Re)randomize subjects to treatment and control, and check covariate balance: if unacceptable, re-randomize; if acceptable, proceed (acceptance probability $p_a$ = 0.001)
3) Conduct the experiment
4) Analyze the results (with a randomization test)
[Figure: difference in mean age across 1000 randomizations, pure randomization vs. rerandomization]
(See the rerandomization sketch below.)

Select Facts about Classical Randomized Experiments
o Timing of treatment assignment clear
o Design and analysis separate by definition: design automatically prospective, without outcome data
o Unconfoundedness and probabilisticness hold by definition
o Assignment mechanism, and so propensity scores, known
o Randomization of treatment assignment leads to expected balance on covariates ("expected balance" means that the joint distribution of covariates is the same in the active treatment and control groups, on average)
o Analysis defined by protocol rather than exploration

Select Facts about Observational Studies
o Timing of treatment assignment may not be specified
o Separation between design and analysis may become obscured, if covariates and outcomes arrive in one data set
o Unconfoundedness and probabilisticness not guaranteed
o Assignment mechanism, and therefore propensity scores, unknown
o Lack of randomization of treatment assignment leads to imbalances on covariates
o Analysis often exploratory rather than defined by protocol
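A minimal R sketch of the rerandomization loop, under assumptions of my own: two hypothetical covariates, the Mahalanobis distance between group means as the balance criterion, and an acceptance probability of 0.001.

# Rerandomization sketch (assumed criterion and covariates, not the course's exact setup)
set.seed(320)
n <- 100
X <- cbind(age = rnorm(n, 40, 10), bmi = rnorm(n, 27, 5))   # hypothetical covariates

mahal_balance <- function(w, X) {
  d <- colMeans(X[w == 1, , drop = FALSE]) - colMeans(X[w == 0, , drop = FALSE])
  as.numeric(t(d) %*% solve(cov(X) * (1 / sum(w == 1) + 1 / sum(w == 0))) %*% d)
}

threshold <- qchisq(0.001, df = ncol(X))   # accept roughly the best 0.1% (p_a = 0.001)
repeat {
  w <- sample(rep(c(0, 1), each = n / 2))  # candidate completely randomized assignment
  if (mahal_balance(w, X) <= threshold) break
}
mahal_balance(w, X)   # the accepted assignment satisfies the balance criterion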


Design Observational Study to Approximate Hypothetical, Parallel Randomized Experiment
1. Determine timing of treatment assignment relative to measured covariates
3. Identify key covariates likely related to outcomes and/or treatment assignment. If key covariates are not observed or are very noisy, it is usually better to give up and find a better data source.
5. Estimate propensity scores, as a way to
6. Find subgroups (subclasses or pairs) in which the active treatment and control groups are balanced on covariates (not always possible; inferences limited to subgroups where balance is achieved)
7. Analyze according to pre-specified protocol

Propensity Score Estimation
o Fit a logistic regression, regressing W on the covariates and any interactions or transformations of covariates that may be important
o Force all primary covariates to be in the model; choose others via variable selection (likelihood ratio, stepwise, ...)
o Trim units with no comparable counterpart in the opposite group, and iterate between trimming and fitting the propensity score model

Subclassification
o Divide units into subclasses within which the propensity scores are relatively similar
o Estimate causal effects within each subclass
o Average these estimates across subclasses, weighted by subclass size (analyze as a stratified experiment; see the sketch below)

Matching
o Find control units to match the units in the treatment group
o Restrict the sample to matched units
o Analyze the difference for each match (analyze as a matched-pair experiment)
o Useful when one group (usually the control) is much larger than the other

Decisions
Estimating the propensity score:
o What to include?
o How many units to trim, if any?
Subclassification:
o How many subclasses, and where to break?
Matching:
o with or without replacement?
o 1:1, 2:1, ...?
o how to weight / distance measure?
o exact matching for any variable(s)?
o calipers for when a match is acceptable?

Weighting
o Weighting provides a convenient and flexible way to do causal inference
o However, conventional weighting methods that place the propensity score in the denominator can go horribly wrong
o The overlap weighting method is a new alternative developed to avoid this problem
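As a rough illustration of steps 5 and 6 plus the subclassification estimate, here is an R sketch under my own assumptions: a simulated data frame dat with covariates x1 and x2, treatment w, and outcome y (placeholders, not the course's Lalonde data), with propensity-score quintiles as subclasses.

# Hypothetical data (placeholder names; not the Lalonde data)
set.seed(320)
n   <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$w <- rbinom(n, 1, plogis(0.5 * dat$x1 + 0.5 * dat$x2))
dat$y <- 2 * dat$w + dat$x1 + dat$x2 + rnorm(n)

# Step 5: estimate propensity scores by logistic regression with main effects
dat$e <- fitted(glm(w ~ x1 + x2, data = dat, family = binomial))

# Trim units outside the propensity score range common to both groups
lo  <- max(tapply(dat$e, dat$w, min))
hi  <- min(tapply(dat$e, dat$w, max))
dat <- subset(dat, e >= lo & e <= hi)

# Step 6: subclassify on propensity score quintiles, estimate within each subclass,
# and average across subclasses weighted by subclass size
# (assumes every subclass contains both treated and control units)
dat$subclass <- cut(dat$e, quantile(dat$e, 0:5 / 5), include.lowest = TRUE)
by_class <- sapply(split(dat, dat$subclass), function(d)
  c(diff = mean(d$y[d$w == 1]) - mean(d$y[d$w == 0]), n = nrow(d)))
weighted.mean(by_class["diff", ], by_class["n", ])   # subclassification estimate of tau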

To Do
o Read Ch 19
o Midterm on Monday, 3/17!