The Supervised Learning Approach To Estimating Heterogeneous Causal Regime Effects


The Supervised Learning Approach To Estimating Heterogeneous Causal Regime Effects
Thai T. Pham, Stanford Graduate School of Business
thaipham@stanford.edu
May 2016

Introduction - Observations

Many sequential treatment settings: patients adjust their medications over multiple periods; students decide whether to follow an educational honors program over multiple years; in the labor market, the unemployed may participate in a set of programs (job search, subsidized job, training) sequentially.

Heterogeneity in reactions to treatment sequences: medication effects can be heterogeneous across patients and across time; the same holds for educational and labor market programs.

It is hard to set up sequential randomized experiments in reality.

Introduction - Contributions

Develop a nonparametric framework using supervised learning to estimate heterogeneous treatment regime effects from observational (or experimental) data. Treatment regime: a set of functions of characteristics and intermediate outcomes.

Propose a supervised learning approach (deep learning), which gives good estimation accuracy and is robust to model misspecification.

Propose a matching-based testing method for the estimation of heterogeneous treatment regime effects.

Propose a matching-based kernel estimator for the variance of heterogeneous treatment regime effects (time permitting).

Introduction - Contributions (cont'd)

In this paper, we:

Focus on the dynamic setting with multiple treatments applied sequentially (in contrast to a single treatment).

Focus on the heterogeneous (in contrast to average) effect of a sequence of treatments, i.e., a treatment regime.

Focus on observational data (in contrast to experimental data).

An Illustrative Model: The Setup - Motivational Dataset

The North Carolina Honors Program Dataset. There are 24,112 observations in total.

$X_0 = [Y_0, d_1, d_2, d_3]$, where $Y_0$ is the Math test score at the end of 8th grade and $d_1, d_2, d_3$ are census-data dummy variables.

$W_0, W_1 \in \{0, 1\}$ are treatment variables.

$Y_1$: end-of-9th-grade Math test score. $Y_2$: end-of-10th-grade Math test score (object of interest).

$Y_0, Y_1, Y_2$ are pre-scaled to have zero mean and unit variance.

An Illustrative Model: The Setup - Model

End of eighth grade: Students' initial information $X_0$, which includes the Math test score $Y_0$ and other personal information $(d_1, d_2, d_3)$, is observed. Students decide to follow the honors ($W_0 = 1$) or standard ($W_0 = 0$) program.

End of ninth grade: $X_0$, $W_0$, and the Math test score $Y_1$ are observed. Students decide to switch or stay in the current program ($W_1 = 1$ or $0$).

End of tenth grade: $X_0$, $W_0$, $Y_1$, $W_1$, and the Math test score $Y_2$ are observed.

Object of interest: $Y_2$ (it could be any function of $X_0, Y_1, Y_2$).

An Illustrative Model: The Setup - Potential Outcome (PO) Framework

A treatment regime $d = (d_0, d_1)$ has $d_0 : X_0 \mapsto W_0 \in \{0, 1\}$ and $d_1 : (X_0, W_0, Y_1) \mapsto W_1 \in \{0, 1\}$.

Potential outcome: $Y_1(W_0) = Y_1$, i.e., the observed outcome is $Y_1 = W_0\, Y_1(1) + (1 - W_0)\, Y_1(0)$.

Similarly, $Y_2^d = Y_2$ if the subject follows regime $d$. We also write $Y_2 = Y_2^d = Y_2(W_0, W_1)$ when $d_0$ maps to $W_0$ and $d_1$ maps to $W_1$. We have
$$Y_2 = W_0 W_1\, Y_2(1, 1) + W_0 (1 - W_1)\, Y_2(1, 0) + (1 - W_0) W_1\, Y_2(0, 1) + (1 - W_0)(1 - W_1)\, Y_2(0, 0).$$

An Illustrative Model: The Setup - Types of Treatment Regime

Static Treatment Regime: subjects specify (or are assigned) the whole treatment plan based only on the initial covariates $X_0$. So $d : X_0 \mapsto (W_0, W_1) \in \{0, 1\}^2$.

Dynamic Treatment Regime: subjects choose (or are assigned) the initial treatment based on the initial covariates $X_0$; then subsequently choose (or are assigned) the next treatment based on the initial covariates $X_0$, the first-period treatment $W_0$, and the intermediate outcome $Y_1$; and so on. This is our original setup.

An Illustrative Model: The Setup - Potential Outcome (PO) Framework (Cont'd)

Objective: Estimate $E[Y_2^d - Y_2^{d'}]$ for individuals (or on average), and derive the heterogeneous optimal regime $d^*(C) = \arg\max_d E\left[ Y_2^d \mid C \right]$ for individual covariates $C$.

Difficulties:

Fundamental Problem of Causal Inference: for each subject, we never observe both $Y_2^d$ and $Y_2^{d'}$.

Selection Bias: students following $d$ may fundamentally be different from those following $d'$ (e.g., students with good test scores choose the honors program in each period).

An Illustrative Model: Identification Results - Static Treatment Regime

Theorem (Identification Result - STR). Let $d_0 = d_0(X_0)$ and $d_1 = d_1(X_0)$. Then (with Assumptions)
$$E\left[ Y_2\, \frac{\mathbf{1}\{W_0 = d_0\}}{P(W_0 = d_0 \mid X_0)}\, \frac{\mathbf{1}\{W_1 = d_1\}}{P(W_1 = d_1 \mid X_0)} \,\Big|\, X_0 \right] = E\left[ Y_2^d \mid X_0 \right].$$

Corollary (the left side is the observed/estimable transformed outcome; the right side involves the unobserved potential outcomes):
$$E\left[ Y_2 \left( \frac{W_0 W_1}{e_0 e_1} - \frac{(1 - W_0)(1 - W_1)}{(1 - e_0)(1 - e_1)} \right) \Big|\, X_0 \right] = E\left[ Y_2(1, 1) - Y_2(0, 0) \mid X_0 \right].$$

Here, $e_0 = P(W_0 = 1 \mid X_0)$ and $e_1 = P(W_1 = 1 \mid X_0)$.

(Links: Matching Based Testing Method; STR Estimation.)
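
To make the corollary concrete, here is a minimal NumPy sketch of the transformed outcome for the $(1,1)$-versus-$(0,0)$ contrast; the function name and the toy data are illustrative, not from the paper.

```python
import numpy as np

def transformed_outcome_str(y2, w0, w1, e0, e1):
    """Transformed outcome whose conditional mean given X0 equals
    E[Y2(1,1) - Y2(0,0) | X0] under the identification assumptions."""
    weight = w0 * w1 / (e0 * e1) - (1 - w0) * (1 - w1) / ((1 - e0) * (1 - e1))
    return y2 * weight

# Toy usage with made-up data for five students.
y2 = np.array([0.3, -1.2, 0.8, 0.1, -0.5])   # end-of-10th-grade scores
w0 = np.array([1, 0, 1, 0, 1])               # honors vs. standard in period 0
w1 = np.array([1, 0, 0, 1, 1])               # switch/stay decision in period 1
e0 = np.full(5, 0.5)                         # estimated P(W0 = 1 | X0)
e1 = np.full(5, 0.5)                         # estimated P(W1 = 1 | X0)
print(transformed_outcome_str(y2, w0, w1, e0, e1))
```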

An Illustrative Model: Identification Results - Dynamic Treatment Regime

Theorem (Identification Result - DTR). Let $d_0 = d_0(X_0)$ and $d_1 = d_1(X_0, X_1, Y_1, W_0)$. Then (with Assumptions)

In period $T = 1$:
$$\underbrace{E\left[ Y_2\, \frac{\mathbf{1}\{W_1 = d_1\}}{P(W_1 = d_1 \mid X_0, X_1, Y_1, W_0)} \,\Big|\, X_0, X_1, Y_1, W_0 \right]}_{\text{observed/estimable}} = \underbrace{E\left[ Y_2^{d_1} \mid X_0, X_1, Y_1, W_0 \right]}_{\text{PO}}.$$

In period $T = 0$:
$$E\left[ Y_2\, \frac{\mathbf{1}\{W_1 = d_1\}}{P(W_1 = d_1 \mid X_0, X_1, Y_1, W_0)}\, \frac{\mathbf{1}\{W_0 = d_0\}}{P(W_0 = d_0 \mid X_0)} \,\Big|\, X_0 \right] = E\left[ Y_2^d \mid X_0 \right].$$

(Link: DTR Estimation.)

Model Estimation - Challenges in the Traditional Approach

Goal: Specify a relation between the transformed outcome $T$ and covariates $C$.

Econometric approaches assume $T = h(C; \beta) + \epsilon$ for a fixed (linear) function $h(\cdot)$ with $E[\epsilon \mid C] = 0$, and estimate $\beta$ by minimizing $\|T - h(C; \beta)\|^2$.

Problem: Linear models need not give good estimates in general.
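
As a point of reference, the traditional approach amounts to an ordinary least squares fit of the transformed outcome on the covariates. A minimal sketch with scikit-learn, using synthetic data purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-ins: T = transformed outcome, C = covariate matrix.
rng = np.random.default_rng(0)
C = rng.uniform(size=(1000, 4))
T = np.sin(6 * C[:, 0]) + rng.normal(size=1000)   # truth is nonlinear in C

# Traditional approach: assume T = C @ beta + eps and fit beta by least squares.
ols = LinearRegression().fit(C, T)
print(ols.coef_)   # the fitted linear h() is misspecified for the nonlinear truth above
```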

Model Estimation - Machine Learning Approach

Machine learning methods generally give much more accurate estimates than traditional econometric models.

Empirical comparisons of different machine learning methods with linear regressions: Caruana and Niculescu-Mizil (2006) [1]; Morton, Marzban, Giannoulis, Patel, Aparasu, and Kakadiaris (2014) [2].

[1] Caruana, R. and A. Niculescu-Mizil (2006), "An Empirical Comparison of Supervised Learning Algorithms," Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA.
[2] Morton, A., E. Marzban, G. Giannoulis, A. Patel, R. Aparasu, and I. A. Kakadiaris (2014), "A Comparison of Supervised Machine Learning Techniques for Predicting Short-Term In-Hospital Length of Stay Among Diabetic Patients," 13th International Conference on Machine Learning and Applications.

Model Estimation - Machine Learning Approach (Cont'd)

Goal: Specify a relation between the transformed outcome $T$ and covariates $C$.

Machine learning (ML) methods allow $h(\cdot)$ to vary in complexity and estimate $\beta$ by minimizing $\|T - h(C; \beta)\|^2 + \lambda g(\beta)$, where $g$ penalizes complex models.

Data set = (Training, Validation, Test). Use the training set (with cross-validation) to choose the optimal $h(\cdot)$ in terms of complexity, the validation set to choose the optimal hyperparameter $\lambda$, and the test set to evaluate performance. RMSE is the comparison criterion.

Hence, the ML approach is flexible and performance-oriented.
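
The workflow above can be sketched as follows; the model family (polynomial ridge regression), the grid of penalty values, and the synthetic data are illustrative assumptions, not choices made in the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

# Synthetic stand-ins for the transformed outcome T and covariates C.
rng = np.random.default_rng(1)
C = rng.uniform(size=(5000, 4))
T = np.sin(6 * C[:, 0]) * C[:, 1] + rng.normal(size=5000)

# Training / validation / test split.
C_train, C_rest, T_train, T_rest = train_test_split(C, T, test_size=0.4, random_state=0)
C_val, C_test, T_val, T_test = train_test_split(C_rest, T_rest, test_size=0.5, random_state=0)

# Choose the penalty lambda on the validation set; evaluate RMSE on the test set.
best_rmse, best_model = np.inf, None
for lam in [0.01, 0.1, 1.0, 10.0]:
    model = make_pipeline(PolynomialFeatures(degree=3), Ridge(alpha=lam)).fit(C_train, T_train)
    rmse_val = mean_squared_error(T_val, model.predict(C_val)) ** 0.5
    if rmse_val < best_rmse:
        best_rmse, best_model = rmse_val, model

print("test RMSE:", mean_squared_error(T_test, best_model.predict(C_test)) ** 0.5)
```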

Model Estimation - Estimating the Model

Propensity Score Estimation: Use logistic regression or other ML techniques such as Random Forest, Gradient Boosting, etc.

Full Model Estimation: (Though many ML techniques would work.) We use a deep learning method from the machine learning literature called the Multilayer Perceptron. It possesses the universal approximation property: it can approximate any continuous function on any compact subset of $\mathbb{R}^n$.
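
For the propensity score step, a random forest classifier can estimate $e_0(X_0) = P(W_0 = 1 \mid X_0)$ directly as class probabilities. A minimal scikit-learn sketch follows; the hyperparameters, the synthetic data, and the clipping of extreme scores are illustrative assumptions rather than choices from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins: X0 = initial covariates, W0 = first-period treatment.
rng = np.random.default_rng(2)
X0 = rng.normal(size=(2000, 4))
W0 = rng.binomial(1, 0.5, size=2000)

# Estimate e0(X0) = P(W0 = 1 | X0) as a probabilistic classification problem.
ps_model = RandomForestClassifier(n_estimators=500, min_samples_leaf=20, random_state=0)
ps_model.fit(X0, W0)
e0_hat = ps_model.predict_proba(X0)[:, 1]

# Clipping extreme scores before inverse-propensity weighting is a common
# precaution; it is not something the slides prescribe.
e0_hat = np.clip(e0_hat, 0.05, 0.95)
```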

Model Estimation - Multilayer Perceptron (MLP)

Assume we want to estimate $T = h(C; \beta) + \epsilon$. The MLP (with one hidden layer) considers
$$h(C; \beta) = \sum_{j=1}^{K} \alpha_j\, \sigma(\gamma_j^T C + \theta_j) \quad \text{and} \quad \beta = \left( K, \{(\alpha_j, \gamma_j, \theta_j)\}_{j=1}^{K} \right),$$
where $\sigma$ is a sigmoid function such as $\sigma(x) = 1/(1 + \exp(-x))$.

Empirically, the MLP (and deep learning in general) is shown to work very well (LeCun et al. [3], Mnih et al. [4]).

[3] LeCun, Y., Y. Bengio, and G. Hinton (2015), "Deep Learning," Nature 521, 436-444 (28 May).
[4] Mnih, V., K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis (2015), "Human-level control through deep reinforcement learning," Nature 518, 529-533 (26 February).
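
A one-hidden-layer MLP of exactly this form can be fit with scikit-learn; the number of hidden units, the penalty, and the optimizer settings below are illustrative assumptions, not the configuration used in the paper.

```python
from sklearn.neural_network import MLPRegressor

# One hidden layer with logistic (sigmoid) activations matches h(C; beta) above;
# hidden_layer_sizes=(50,) plays the role of K = 50 hidden units.
mlp = MLPRegressor(hidden_layer_sizes=(50,), activation="logistic",
                   alpha=1e-3, max_iter=2000, random_state=0)

# Typical usage with arrays from the earlier split:
# mlp.fit(C_train, T_train)
# rmse = ((mlp.predict(C_test) - T_test) ** 2).mean() ** 0.5
```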

Model Estimation - Matching Based Testing Method

The identification results relate an unobserved difference of potential outcomes $Z$ to an observed (or estimable) transformed outcome $T$: $E[T \mid C] = E[Z \mid C]$. For example, in the STR: $Z = Y_2^d - Y_2^{d'}$, $C = X_0$.

Randomly draw $M$ units with treatment regime $d$. Denote by $x_0^{d,m}$ and $y_2^{d,m}$ the covariates and corresponding outcomes.

For each $m$, determine $x_0^{d',m} = \arg\min_{x_0^i :\, \text{regime} = d'} \| x_0^i - x_0^{d,m} \|_2$. Let $\tilde{\tau}_m = y_2^{d,m} - y_2^{d',m}$. Here, $\tilde{\tau}$ is a proxy for the unobserved $Z$.

Let $\hat{\tau}$ be the estimator which fits $x_0$ to $T$. Define $\hat{\tau}_m = \frac{1}{2}\left( \hat{\tau}(x_0^{d,m}) + \hat{\tau}(x_0^{d',m}) \right)$.

Define the matching loss $\mathcal{M} = \frac{1}{M} \sum_{m=1}^{M} \left( \hat{\tau}_m - \tilde{\tau}_m \right)^2$.
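
A sketch of this loss in Python, assuming the fitted estimator is available as a callable and using a nearest-neighbor search for the matching step; all names and defaults here are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def matching_loss(x0_d, y2_d, x0_dp, y2_dp, tau_hat, n_draws=500, seed=0):
    """Matching-based test loss for a fitted effect estimator tau_hat.

    x0_d, y2_d:   covariates / outcomes of units following regime d
    x0_dp, y2_dp: covariates / outcomes of units following regime d'
    tau_hat:      callable returning effect estimates for a covariate array
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x0_d), size=min(n_draws, len(x0_d)), replace=False)

    # Nearest-neighbor match in covariate space across regimes.
    nn = NearestNeighbors(n_neighbors=1).fit(x0_dp)
    _, match = nn.kneighbors(x0_d[idx])
    match = match[:, 0]

    tau_tilde = y2_d[idx] - y2_dp[match]                         # proxy for Y2^d - Y2^d'
    tau_bar = 0.5 * (tau_hat(x0_d[idx]) + tau_hat(x0_dp[match]))
    return np.mean((tau_bar - tau_tilde) ** 2)
```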

Simulations - Setup

Test the ability of our method to adapt to heterogeneity in the treatment regime effect.

50,000 observations for training; 5,000 for validation; 5,000 for testing.

$X_0 \sim U([0, 1]^{10})$; $W_0 \in \{0, 1\}$; $Y_1 \in \mathbb{R}$ with standard normal noise; $W_1 \in \{0, 1\}$; $Y_2 \in \mathbb{R}$ with standard normal noise.

$e_0(X_0) = e_1(X_0) = e_1(X_0, W_0, Y_1) = 0.5$.

$$\tau_1(X_0) = E\left[ Y_1^{W_0 = 1} - Y_1^{W_0 = 0} \mid X_0 \right] = \xi(X_0[1])\, \xi(X_0[2]); \quad \text{and}$$
$$\tau_2(X_0, W_0, Y_1) = E\left[ Y_2^{W_1 = 1} - Y_2^{W_1 = 0} \mid X_0, W_0, Y_1 \right] = \rho(Y_1)\, \rho(W_0)\, \xi(X_0[1]),$$
where
$$\xi(x) = \frac{2}{1 + e^{-12(x - 1/2)}}; \qquad \rho(x) = 1 + \frac{1}{1 + e^{-20(x - 1/3)}}.$$
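
A data-generating sketch consistent with this setup is below. The slides specify the propensities, the treatment-effect functions, and the noise, but not the baseline outcome model, so the baseline is set to zero here purely as an assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60_000   # 50,000 training + 5,000 validation + 5,000 test

def xi(x):
    return 2.0 / (1.0 + np.exp(-12.0 * (x - 0.5)))

def rho(x):
    return 1.0 + 1.0 / (1.0 + np.exp(-20.0 * (x - 1.0 / 3.0)))

X0 = rng.uniform(size=(n, 10))
W0 = rng.binomial(1, 0.5, size=n)           # e0(X0) = 0.5
tau1 = xi(X0[:, 0]) * xi(X0[:, 1])          # heterogeneous effect on Y1
Y1 = W0 * tau1 + rng.normal(size=n)         # zero baseline is an assumption

W1 = rng.binomial(1, 0.5, size=n)           # e1 = 0.5
tau2 = rho(Y1) * rho(W0) * xi(X0[:, 0])     # heterogeneous effect on Y2
Y2 = W1 * tau2 + rng.normal(size=n)         # zero baseline is an assumption
```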

Simulations - Results

Table: Performance in terms of Root Mean Squared Error (RMSE)

Method        Linear Regression (LR)   Multilayer Perceptron (MLP)
STR           1.75                     1.66
DTR: T = 0    0.74                     0.13
DTR: T = 1    1.10                     0.20

sdv (TO: T = 0) = 2.41; sdv (true effect: T = 0) = 1.34.
sdv (TO: T = 1) = 3.21; sdv (true effect: T = 1) = 2.52.
(sdv = standard deviation; TO = transformed outcome.)

Comments: MLP returns very good results and outperforms LR. The static setting does not fit here, as the RMSEs for STR are poor.

Simulations - Results (cont'd)

[Figure: Heterogeneous Treatment Regime Effect Using Validation and Test Data. Panels (left to right): MLP, True Effect, LR; first row for period T = 0, second row for period T = 1. In each period, the middle picture visualizes the true treatment effect, the left one the effect estimated by the Multilayer Perceptron, and the right one the effect estimated by Linear Regression.] [5]

[5] We thank Wager and Athey (2015) for sharing their visualization code.

Empirical Application - Estimation of Propensity Scores

Use the North Carolina Honors Program data (from the illustrative model).

Estimate $P(W_0 = 1 \mid X_0)$, $P(W_1 = 1 \mid X_0)$, and $P(W_1 = 1 \mid X_0, Y_1, W_0)$.

Use Random Forest, treating each estimation as a probabilistic classification problem.

Empirical Application - Model Estimation: Static Treatment Regime

Use three methods: Linear Regression, Gradient Boosting, and Multilayer Perceptron.

Method                  Validation Matching Loss   Test Matching Loss
Linear Regression       11.15                      9.77
Gradient Boosting       5.03                       4.89
Multilayer Perceptron   3.20                       3.27

Comments: MLP outperforms the other methods. All results are poor, which signals the dynamic nature of the data.

(Link: STR.)

Empirical Application - Model Estimation: Dynamic Treatment Regime

Period T = 1:

Method                  Validation Matching Loss   Test Matching Loss
Linear Regression       1.29                       1.29
Gradient Boosting       0.94                       1.01
Multilayer Perceptron   0.94                       1.01

*sdv(TO: val) = 4.06; sdv(est. true effect: val) = 0.85.
*sdv(TO: test) = 4.03; sdv(est. true effect: test) = 0.92.

Empirical Application - Model Estimation: Dynamic Treatment Regime (Cont'd)

Period T = 0 (use only students who follow the optimal treatment in T = 1):

Method                  Validation Matching Loss   Test Matching Loss
Linear Regression       3.29                       3.45
Gradient Boosting       1.51                       1.63
Multilayer Perceptron   1.14                       1.60

*sdv(TO: val) = 6.94; sdv(est. true effect: val) = 0.84.
*sdv(TO: test) = 7.45; sdv(est. true effect: test) = 0.98.

(Link: DTR.)

Remark: The results are worse than those in the simulations due to unobserved heterogeneity.

Empirical Application - Heterogeneous Optimal Regime

[Figure: Heterogeneous optimal regime, Static Treatment Regime.]

Empirical Application - Heterogeneous Optimal Regime (cont'd)

STR: Estimated gain per student from the heterogeneous optimal regime over the homogeneous optimal regime $(0, 0)$:
$$\frac{1}{\#\{(0,0)\ \text{not opt}\}} \left( \sum_{(1,1)\ \text{opt}} \left[ \hat{Y}_2(1, 1) - \hat{Y}_2(0, 0) \right] + \sum_{(1,0)\ \text{opt}} \left[ \hat{Y}_2(1, 0) - \hat{Y}_2(0, 0) \right] + \sum_{(0,1)\ \text{opt}} \left[ \hat{Y}_2(0, 1) - \hat{Y}_2(0, 0) \right] \right) = 0.91.$$

DTR: Estimated gain per student in $T = 0$ from the heterogeneous optimal $W_0$ over the homogeneous optimal treatment $W_0 = 0$:
$$\frac{\sum_{W_0 = 1\ \text{optimal}} \left[ \hat{Y}_2^{W_0 = 1} - \hat{Y}_2^{W_0 = 0} \right]}{\#\,\text{obs used in } T = 0 \text{ s.t. } W_0 = 1 \text{ opt}} = 0.74.$$

For scale: $\text{mean}(Y_2) = 0$; $\text{min}(Y_2) = -4.06$; $\text{max}(Y_2) = 3.66$.

Conclusion

We developed a nonparametric framework using supervised learning to estimate heterogeneous causal regime effects. Our model addresses the dynamic treatment setting, population heterogeneity, and the difficulty of setting up sequential randomized experiments in reality.

We introduced a machine learning approach, in particular deep learning, which demonstrates strong estimation power and is robust to model misspecification.

We also introduced a matching-based testing method for the estimation of heterogeneous treatment regime effects. A matching-based kernel estimator for the variance of these effects is introduced in the Appendix.

Appendix - Variance Estimation: Matching Kernel Approach

Matching gives $M$ matching pairs $\left[ (x_0^{d,m}, y_2^{d,m});\ (x_0^{d',m}, y_2^{d',m}) \right]$, $m = 1, \ldots, M$.

Fix $x_0^{new}$. To estimate $\sigma^2(x_0^{new}) = \operatorname{Var}\!\left( Y_2^d - Y_2^{d'} \mid x_0^{new} \right)$, we define
$$x_0^{mean,m} = \frac{x_0^{d,m} + x_0^{d',m}}{2}, \qquad \hat{\epsilon}_m = \hat{\tau}_m - \tilde{\tau}_m.$$

An estimator for $\sigma^2(x_0^{new})$ is
$$\hat{\sigma}^2(x_0^{new}) = \frac{\sum_{m=1}^{M} K\!\left( H^{-1} \left[ x_0^{mean,m} - x_0^{new} \right] \right) \hat{\epsilon}_m^2}{\sum_{m=1}^{M} K\!\left( H^{-1} \left[ x_0^{mean,m} - x_0^{new} \right] \right)}.$$
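
A small sketch of this estimator, assuming a Gaussian product kernel and a scalar bandwidth matrix $H = h \cdot I$; both choices are illustrative, since the slide leaves $K$ and $H$ unspecified.

```python
import numpy as np

def kernel_variance(x0_new, x0_mean, eps, h=0.5):
    """Matching-kernel variance estimator at x0_new.

    x0_mean: (M, p) midpoints of the matched covariate pairs
    eps:     (M,)   residuals eps_m = tau_hat_m - tau_tilde_m
    h:       scalar bandwidth, i.e. H = h * I (an illustrative choice)
    """
    diff = (x0_mean - x0_new) / h                        # H^{-1}[x0^{mean,m} - x0^{new}]
    weights = np.exp(-0.5 * np.sum(diff ** 2, axis=1))   # Gaussian product kernel
    return np.sum(weights * eps ** 2) / np.sum(weights)
```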