Why experimenters should not randomize, and what they should do instead


Why experimenters should not randomize, and what they should do instead
Maximilian Kasy, Department of Economics, Harvard University

Introduction: Project STAR
Covariate means within school, for the actual (D) and for the optimal (D*) treatment assignment:

School 16           actual D             optimal D*
                D = 0    D = 1       D = 0    D = 1
girl             0.42     0.54        0.46     0.41
black            1.00     1.00        1.00     1.00
birth date    1980.18  1980.48     1980.24  1980.27
free lunch       0.98     1.00        0.98     1.00
n                 123       37         123       37

School 38           actual D             optimal D*
                D = 0    D = 1       D = 0    D = 1
girl             0.45     0.60        0.49     0.47
black            0.00     0.00        0.00     0.00
birth date    1980.15  1980.30     1980.19  1980.17
free lunch       0.86     0.33        0.73     0.73
n                  49       15          49       15

Introduction: Some intuitions
- Compare apples with apples: balance the covariate distribution, not just the balance of means!
- We don't add random noise to estimators, so why add random noise to experimental designs?
- Optimal design for STAR: 19% reduction in mean squared error relative to the actual assignment, equivalent to 9% of the sample size, or 773 students.

Introduction: Some context - a very brief history of experiments
How to ensure we compare apples with apples?
1. Physics (Galileo, ...): controlled experiments, not much heterogeneity, no self-selection; no randomization necessary.
2. Modern RCTs (Fisher, Neyman, ...): observationally homogeneous units with unobserved heterogeneity; randomized controlled trials (the setup for most of the experimental design literature).
3. Medicine, economics: lots of unobserved and observed heterogeneity; the topic of this talk.

Introduction: The setup
1. Sampling: random sample of $n$ units; a baseline survey yields a vector of covariates $X_i$.
2. Treatment assignment: binary treatment assigned by $D_i = d_i(X, U)$, where $X$ is the matrix of covariates and $U$ is a randomization device.
3. Realization of outcomes: $Y_i = D_i Y_i^1 + (1 - D_i) Y_i^0$.
4. Estimation: estimator $\hat\beta$ of the (conditional) average treatment effect, $\beta = \frac{1}{n} \sum_i E[Y_i^1 - Y_i^0 \mid X_i, \theta]$.

Introduction: Questions
- How should we assign treatment? In particular, if $X$ has continuous or many discrete components?
- How should we estimate $\beta$?
- What is the role of prior information?

Introduction: Framework proposed in this talk
1. Decision theoretic: $d$ and $\hat\beta$ are chosen to minimize risk $R(d, \hat\beta \mid X)$ (e.g., expected squared error).
2. Nonparametric: no functional form assumptions.
3. Bayesian: $R(d, \hat\beta \mid X)$ averages expected loss over a prior; the prior is a distribution over the functions $x \mapsto E[Y_i^d \mid X_i = x, \theta]$.
4. Non-informative: limit of risk functions under priors such that $\mathrm{Var}(\beta) \to \infty$.

Introduction: Main results
1. The unique optimal treatment assignment does not involve randomization.
2. Identification using conditional independence is still guaranteed without randomization.
3. Tractable nonparametric priors.
4. Explicit expressions for risk as a function of the treatment assignment; choose $d$ to minimize these.
5. MATLAB code to find the optimal treatment assignment.
6. Magnitude of gains: between 5 and 20% reduction in MSE relative to randomization, for realistic parameter values in simulations. For project STAR: 19% gain relative to the actual assignment.

Introduction: Roadmap
1. Motivating examples
2. Formal decision problem and the optimality of non-randomized designs
3. Nonparametric Bayesian estimators and risk
4. Choice of prior parameters
5. Discrete optimization, and how to use my MATLAB code
6. Simulation results and application to project STAR
7. Outlook: Optimal policy and statistical decisions

Introduction: Notation
- random variables: $X_i, D_i, Y_i$
- values of the corresponding variables: $x, d, y$
- matrices/vectors stacking the observations $i = 1, \dots, n$: $X, D, Y$
- vector of assignment values: $d$
- shorthand for the data generating process: $\theta$
- frequentist probabilities and expectations: conditional on $\theta$
- Bayesian probabilities and expectations: unconditional

Introduction: Example 1 - No covariates
$n_d := \sum_i 1(D_i = d)$, $\sigma_d^2 = \mathrm{Var}(Y_i^d \mid \theta)$,
$\hat\beta := \sum_i \left[ \frac{D_i}{n_1} Y_i - \frac{1 - D_i}{n - n_1} Y_i \right]$.
Two alternative designs:
1. Randomization conditional on $n_1$
2. Complete randomization: $D_i$ i.i.d., $P(D_i = 1) = p$
Corresponding estimator variances:
1. $n_1$ fixed: $\frac{\sigma_1^2}{n_1} + \frac{\sigma_0^2}{n - n_1}$
2. $n_1$ random: $E\left[ \frac{\sigma_1^2}{n_1} + \frac{\sigma_0^2}{n - n_1} \right]$
Choosing the (unique) minimizing $n_1$ is optimal. We are indifferent as to which of the observationally equivalent units get treatment.
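For reference, a step not spelled out on the slide: minimizing the fixed-$n_1$ variance over $n_1$ (treating $n_1$ as continuous) gives the familiar Neyman allocation,
\[
-\frac{\sigma_1^2}{n_1^2} + \frac{\sigma_0^2}{(n - n_1)^2} = 0
\quad\Longrightarrow\quad
n_1^* = n \, \frac{\sigma_1}{\sigma_1 + \sigma_0},
\]
so the arm with the larger outcome standard deviation receives more units; with $\sigma_1 = \sigma_0$ this is an even split.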

Introduction: Example 2 - Discrete covariate
$X_i \in \{0, \dots, k\}$, $n_x := \sum_i 1(X_i = x)$, $n_{d,x} := \sum_i 1(X_i = x, D_i = d)$, $\sigma_{d,x}^2 = \mathrm{Var}(Y_i^d \mid X_i = x, \theta)$,
$\hat\beta := \sum_x \frac{n_x}{n} \sum_i 1(X_i = x) \left[ \frac{D_i}{n_{1,x}} Y_i - \frac{1 - D_i}{n_x - n_{1,x}} Y_i \right]$.
Three alternative designs:
1. Stratified randomization, conditional on $n_{d,x}$
2. Randomization conditional on $n_d = \sum_i 1(D_i = d)$
3. Complete randomization

Introduction: Corresponding estimator variances
1. Stratified; $n_{d,x}$ fixed:
$V(\{n_{d,x}\}) := \sum_x \left(\frac{n_x}{n}\right)^2 \left[ \frac{\sigma_{1,x}^2}{n_{1,x}} + \frac{\sigma_{0,x}^2}{n_x - n_{1,x}} \right]$
2. $n_{d,x}$ random but $n_d = \sum_x n_{d,x}$ fixed:
$E\left[ V(\{n_{d,x}\}) \mid \textstyle\sum_x n_{1,x} = n_1 \right]$
3. $n_{d,x}$ and $n_d$ random:
$E\left[ V(\{n_{d,x}\}) \right]$
Choosing the unique minimizing $\{n_{d,x}\}$ is optimal.

Introduction: Example 3 - Continuous covariate
$X_i \in \mathbb{R}$ continuously distributed, so no two observations have the same $X_i$!
Alternative designs:
1. Complete randomization
2. Randomization conditional on $n_d$
3. Discretize and stratify: choose bins $[x_j, x_{j+1}]$, set $\tilde X_i = \sum_j j \cdot 1(X_i \in [x_j, x_{j+1}])$, and stratify based on $\tilde X_i$
4. Special case: pairwise randomization
5. Fully stratify. But what does that mean???

Introduction: Some references
- Optimal design of experiments: Smith (1918), Kiefer and Wolfowitz (1959), Cox and Reid (2000), Shah and Sinha (1989)
- Nonparametric estimation of treatment effects: Imbens (2004)
- Gaussian process priors: Wahba (1990) (splines); Matheron (1973), Yakowitz and Szidarovszky (1985) ("kriging" in geostatistics); Williams and Rasmussen (2006) (machine learning)
- Bayesian statistics and design: Robert (2007), O'Hagan and Kingman (1978), Berry (2006)
- Simulated annealing: Kirkpatrick et al. (1983)

Decision problem: A formal decision problem
Risk function of treatment assignment $d(X, U)$ and estimator $\hat\beta$, under loss $L$ and data generating process $\theta$:
$R(d, \hat\beta \mid X, U, \theta) := E[L(\hat\beta, \beta) \mid X, U, \theta]$  (1)
($d$ affects the distribution of $\hat\beta$.)
(Conditional) Bayesian risk:
$R^B(d, \hat\beta \mid X, U) := \int R(d, \hat\beta \mid X, U, \theta)\, dP(\theta)$  (2)
$R^B(d, \hat\beta \mid X) := \int R^B(d, \hat\beta \mid X, U)\, dP(U)$  (3)
$R^B(d, \hat\beta) := \int R^B(d, \hat\beta \mid X, U)\, dP(X)\, dP(U)$  (4)
Conditional minimax risk:
$R^{mm}(d, \hat\beta \mid X, U) := \max_\theta R(d, \hat\beta \mid X, U, \theta)$  (5)
Objective: $\min R^B$ or $\min R^{mm}$.

Decision problem: Optimality of deterministic designs
Theorem. Given $\hat\beta(Y, X, D)$:
1. $d^*(X) \in \operatorname{argmin}_{d(X) \in \{0,1\}^n} R^B(d, \hat\beta \mid X)$  (6)
   minimizes $R^B(d, \hat\beta)$ among all $d(X, U)$ (random or not).
2. Suppose $R^B(d^1, \hat\beta \mid X) - R^B(d^2, \hat\beta \mid X)$ is continuously distributed for $d^1 \neq d^2$. Then $d^*(X)$ is the unique minimizer in (6).
3. Similar claims hold for $R^{mm}(d, \hat\beta \mid X, U)$, if the latter is finite.
Intuition: similar to why estimators should not randomize. $R^B(d, \hat\beta \mid X, U)$ does not depend on $U$, so neither do its minimizers $d^*$, $\hat\beta$.

Decision problem: Conditional independence
Theorem. Assume i.i.d. sampling, stable unit treatment values, and $D = d(X, U)$ for $U \perp (Y^0, Y^1, X) \mid \theta$. Then conditional independence holds:
$P(Y_i \mid X_i, D_i = d_i, \theta) = P(Y_i^{d_i} \mid X_i, \theta)$.
This is true in particular for deterministic treatment assignment rules $D = d(X)$.
Intuition: under i.i.d. sampling, $P(Y_i^{d_i} \mid X, \theta) = P(Y_i^{d_i} \mid X_i, \theta)$.

Nonparametric Bayes
Let $f(X_i, D_i) = E[Y_i \mid X_i, D_i, \theta]$.
Assumption (Prior moments): $E[f(x, d)] = \mu(x, d)$ and $\mathrm{Cov}(f(x_1, d_1), f(x_2, d_2)) = C((x_1, d_1), (x_2, d_2))$.
Assumption (Mean squared error objective): loss $L(\hat\beta, \beta) = (\hat\beta - \beta)^2$, so the Bayes risk is $R^B(d, \hat\beta \mid X) = E[(\hat\beta - \beta)^2 \mid X]$.
Assumption (Linear estimators): $\hat\beta = w_0 + \sum_i w_i Y_i$, where the $w_i$ may depend on $X$ and on $D$, but not on $Y$.

Nonparametric Bayes: Best linear predictor, posterior variance
Notation for (prior) moments: $\mu_i = E[Y_i \mid X, D]$, $\mu_\beta = E[\beta \mid X, D]$, $\Sigma = \mathrm{Var}(Y \mid X, D, \theta)$, $C_{i,j} = C((X_i, D_i), (X_j, D_j))$ (the matrix $C$), and $\bar C_i = \mathrm{Cov}(Y_i, \beta \mid X, D)$ (the vector $\bar C$).
Theorem. Under these assumptions, the optimal estimator equals
$\hat\beta = \mu_\beta + \bar C' (C + \Sigma)^{-1} (Y - \mu)$,
and the corresponding expected loss (risk) equals
$R^B(d, \hat\beta \mid X) = \mathrm{Var}(\beta \mid X) - \bar C' (C + \Sigma)^{-1} \bar C$.
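As an illustration only, here is a minimal MATLAB sketch of the theorem's two formulas for one candidate assignment, under the assumptions introduced on the next slides (zero prior mean, homoskedastic noise, uncorrelated $f(\cdot, 0)$ and $f(\cdot, 1)$ with a common squared-exponential kernel). The data, kernel, length scale, and candidate assignment are placeholders, not the paper's calibration.

n = 100; l = 0.5; sigma2 = 1;              % sample size, length scale, noise variance (assumed)
X = randn(n, 1);                           % placeholder covariate
D = double(X > median(X));                 % placeholder candidate assignment
Y = randn(n, 1);                           % placeholder outcomes
K = exp(-(X - X').^2 / (2 * l^2));         % prior kernel on the covariate
C = K .* (D == D');                        % f(.,0), f(.,1) uncorrelated => zero covariance across arms
Sigma = sigma2 * eye(n);                   % Var(Y | X, D, theta), homoskedastic
s = 2 * D - 1;                             % +1 if treated, -1 if control
Cbar = (s / n) .* (K * ones(n, 1));        % Cbar_i = Cov(Y_i, beta | X, D) = (s_i / n) * sum_j K_ij
var_beta = 2 * sum(K(:)) / n^2;            % prior Var(beta | X) under these assumptions
beta_hat = Cbar' * ((C + Sigma) \ Y);      % optimal estimator (prior means set to zero here)
risk = var_beta - Cbar' * ((C + Sigma) \ Cbar);   % Bayes risk R^B(d, beta_hat | X)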

Nonparametric Bayes: More explicit formulas
Assumption (Homoskedasticity): $\mathrm{Var}(Y_i^d \mid X_i, \theta) = \sigma^2$.
Assumption (Restricting prior moments):
1. $E[f] = 0$.
2. The functions $f(\cdot, 0)$ and $f(\cdot, 1)$ are uncorrelated.
3. The prior moments of $f(\cdot, 0)$ and $f(\cdot, 1)$ are the same.

Nonparametric Bayes: Submatrix notation
$Y^d = (Y_i : D_i = d)$
$V^d = \mathrm{Var}(Y^d \mid X, D) = (C_{i,j} : D_i = d, D_j = d) + \mathrm{diag}(\sigma^2 : D_i = d)$
$\bar C^d = \mathrm{Cov}(Y^d, \beta \mid X, D) = (\bar C_i : D_i = d)$
Theorem (Explicit estimator and risk function). Under these additional assumptions,
$\hat\beta = \bar C^{1\prime} (V^1)^{-1} Y^1 + \bar C^{0\prime} (V^0)^{-1} Y^0$
and
$R^B(d, \hat\beta \mid X) = \mathrm{Var}(\beta \mid X) - \bar C^{1\prime} (V^1)^{-1} \bar C^1 - \bar C^{0\prime} (V^0)^{-1} \bar C^0$.

Nonparametric Bayes: Insisting on the comparison-of-means estimator
Assumption (Simple estimator):
$\hat\beta = \frac{1}{n_1} \sum_i D_i Y_i - \frac{1}{n_0} \sum_i (1 - D_i) Y_i$, where $n_d = \sum_i 1(D_i = d)$.
Theorem (Risk function for designs using the simple estimator). Under this additional assumption,
$R^B(d, \hat\beta \mid X) = \sigma^2 \left[ \frac{1}{n_1} + \frac{1}{n_0} \right] + \left[ 1 + \left( \frac{n_1}{n_0} \right)^2 \right] v' C v$,
where $C_{ij} = C(X_i, X_j)$ and $v_i = \frac{D_i}{n_1} - \frac{1}{n}$.
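A minimal MATLAB sketch of this risk expression as a function of a candidate assignment vector, using the reconstruction of $v_i$ above; treat it as illustrative rather than as the paper's exact code.

% Illustrative only (save as simpleEstimatorRisk.m): Bayes risk of the
% difference-in-means estimator for a 0/1 assignment vector d.
function R = simpleEstimatorRisk(d, K, sigma2)
    % d: 0/1 assignment vector; K: prior kernel matrix C(X_i, X_j); sigma2: residual variance
    n  = numel(d);
    n1 = sum(d); n0 = n - n1;
    v  = d(:) / n1 - 1 / n;                                   % v_i = D_i / n_1 - 1/n
    R  = sigma2 * (1/n1 + 1/n0) + (1 + (n1/n0)^2) * (v' * K * v);
end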

Prior choice: Possible priors 1 - linear model
For $X_i$ possibly including powers, interactions, etc.:
$Y_i^d = X_i \beta^d + \epsilon_i^d$, with $E[\beta^d \mid X] = 0$ and $\mathrm{Var}(\beta^d \mid X) = \Sigma_\beta$.
This implies $C = X \Sigma_\beta X'$ and
$\hat\beta^d = \left( X^{d\prime} X^d + \sigma^2 \Sigma_\beta^{-1} \right)^{-1} X^{d\prime} Y^d$,
$\hat\beta = \bar X \left( \hat\beta^1 - \hat\beta^0 \right)$, where $\bar X = \frac{1}{n} \sum_i X_i$,
$R^B(d, \hat\beta \mid X) = \sigma^2 \, \bar X \left[ \left( X^{1\prime} X^1 + \sigma^2 \Sigma_\beta^{-1} \right)^{-1} + \left( X^{0\prime} X^0 + \sigma^2 \Sigma_\beta^{-1} \right)^{-1} \right] \bar X'$.
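A small MATLAB sketch of this linear-model risk for one candidate assignment; the data, design matrix, and prior variance are placeholder assumptions, not values from the talk.

% Bayes risk of the ATE estimator under the linear-model prior, for assignment d.
n = 200; k = 3; sigma2 = 1;
X = [ones(n, 1), randn(n, k - 1)];            % design matrix with intercept (assumption)
Sigma_beta = eye(k);                          % prior variance of beta^d (assumption)
d = double(rand(n, 1) > 0.5);                 % placeholder candidate assignment
X1 = X(d == 1, :); X0 = X(d == 0, :);
Xbar = mean(X, 1);                            % 1 x k row vector of covariate means
A1 = X1' * X1 + sigma2 * inv(Sigma_beta);
A0 = X0' * X0 + sigma2 * inv(Sigma_beta);
risk = sigma2 * (Xbar * (A1 \ Xbar') + Xbar * (A0 \ Xbar'));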

Prior choice: Possible priors 2 - squared exponential kernel
$C(x_1, x_2) = \exp\left( -\frac{1}{2 l^2} \| x_1 - x_2 \|^2 \right)$  (7)
- popular in machine learning (cf. Williams and Rasmussen, 2006)
- nonparametric: does not restrict the functional form; can accommodate any shape of $f^d$
- smooth: $f^d$ is infinitely differentiable (in mean square)
- the length scale $l$ and the norm $\| x_1 - x_2 \|$ determine smoothness

Prior choice
[Figure: draws from Gaussian processes with covariance kernel $C(x_1, x_2) = \exp\left( -\frac{1}{2 l^2} \| x_1 - x_2 \|^2 \right)$, with the length scale $l$ ranging from 0.25 to 2. Horizontal axis: $X$; vertical axis: $f(x)$.]
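The following MATLAB sketch generates draws like those in the figure; the evaluation grid, jitter, and set of length scales are illustrative assumptions.

% Draws from a mean-zero Gaussian process with squared-exponential kernel, for several length scales.
x = linspace(-1, 1, 200)';
figure; hold on
for l = [0.25, 0.5, 1, 2]
    K = exp(-(x - x').^2 / (2 * l^2)) + 1e-8 * eye(numel(x));  % kernel matrix, jittered for stability
    f = chol(K, 'lower') * randn(numel(x), 1);                 % one draw f ~ N(0, K)
    plot(x, f)
end
xlabel('X'); ylabel('f(x)'); legend('l = 0.25', 'l = 0.5', 'l = 1', 'l = 2')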

Prior choice
[Figure: a draw from a Gaussian process with covariance kernel $C(x_1, x_2) = \exp\left( -\frac{1}{2 l^2} \| x_1 - x_2 \|^2 \right)$, where $l = 0.5$ and $X \in \mathbb{R}^2$. Axes: $X_1$, $X_2$, $f(x)$.]

Prior choice: Possible priors 3 - noninformativeness
- Non-subjectivity of experiments: we would like a prior that is non-informative about the object of interest (the ATE), while maintaining prior assumptions on smoothness.
- Possible formalization:
$Y_i^d = g^d(X_i) + X_{1,i} \beta^d + \epsilon_i^d$
$\mathrm{Cov}(g^d(x_1), g^d(x_2)) = K(x_1, x_2)$
$\mathrm{Var}(\beta^d \mid X) = \lambda \Sigma_\beta$, and thus $C^d = K^d + \lambda X_1^d \Sigma_\beta X_1^{d\prime}$.
- The paper provides an explicit form of $\lim_{\lambda \to \infty} \min_{\hat\beta} R^B(d, \hat\beta \mid X)$.

Frequentist inference
Variance of $\hat\beta$: for $\hat\beta = w_0 + \sum_i w_i Y_i$,
$V := \mathrm{Var}(\hat\beta \mid X, D, \theta) = \sum_i w_i^2 \sigma_i^2$,  (8)
where $\sigma_i^2 = \mathrm{Var}(Y_i \mid X_i, D_i)$.
Estimator of the variance:
$\hat V := \sum_i w_i^2 \hat\epsilon_i^2$,  (9)
where $\hat\epsilon_i = Y_i - \hat f_i$ and $\hat f = C (C + \Sigma)^{-1} Y$.
Proposition: $\hat V / V \to_p 1$ under regularity conditions stated in the paper.
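A minimal MATLAB sketch of this variance estimator, continuing the earlier snippet (C, Sigma, Cbar, and Y as defined there); illustrative only.

w       = (C + Sigma) \ Cbar;        % weights of the linear estimator: beta_hat = w' * Y
f_hat   = C * ((C + Sigma) \ Y);     % fitted values f_hat = C (C + Sigma)^{-1} Y
eps_hat = Y - f_hat;                 % residuals
V_hat   = sum(w.^2 .* eps_hat.^2);   % variance estimator (9)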

Optimization: Discrete optimization
The optimal design solves $\max_d \bar C' (C + \Sigma)^{-1} \bar C$ over the discrete support: there are $2^n$ possible values for $d$, so brute-force enumeration is infeasible.
Possible algorithms (an active literature!):
1. Search over random $d$
2. Simulated annealing (cf. Kirkpatrick et al., 1983)
3. Greedy algorithm: search for local improvements by changing one (or $k$) components of $d$ at a time
My code: a combination of these; a sketch of the idea follows below.
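An illustrative MATLAB sketch of the random-search-plus-greedy idea; it is not the author's argminVar code, and the risk function handle can be any of the risk expressions above (e.g. simpleEstimatorRisk).

% Illustrative design search (save as greedyDesign.m): random draws of the assignment
% followed by greedy pairwise swaps, holding the number of treated units n1 fixed.
function [dbest, rbest] = greedyDesign(riskfun, n, n1, ndraws)
    dbest = zeros(n, 1); dbest(1:n1) = 1;
    rbest = riskfun(dbest);
    for s = 1:ndraws                              % stage 1: random search
        d = zeros(n, 1); d(randperm(n, n1)) = 1;
        r = riskfun(d);
        if r < rbest, dbest = d; rbest = r; end
    end
    improved = true;
    while improved                                % stage 2: greedy single-pair swaps
        improved = false;
        for i = find(dbest == 1)'
            for j = find(dbest == 0)'
                d = dbest; d([i j]) = [0 1];      % swap treatment between units i and j
                r = riskfun(d);
                if r < rbest
                    dbest = d; rbest = r; improved = true; break
                end
            end
            if improved, break, end
        end
    end
end

Usage, continuing the earlier sketch (all names are placeholders): Dstar = greedyDesign(@(d) simpleEstimatorRisk(d, K, sigma2), n, sum(D), 1000);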

Optimization: How to use my MATLAB code

global X n dimx Vstar                 %%% input X
[n, dimx] = size(X);
vbeta   = @VarBetaNI;                 % objective: risk under the non-informative prior
weights = @weightsNI;                 % corresponding estimator weights
setparameters                         % set prior parameters
Dstar = argminVar(vbeta);             % search for the optimal assignment
w = weights(Dstar);
csvwrite('optimaldesign.csv', [Dstar(:), w(:), X])

1. Make sure to appropriately normalize X.
2. Alternative objective and weight function handles: @VarBetaCK, @VarBetaLinear, @weightsck, @weightslinear.
3. Modifying prior parameters: setparameters.m
4. Modifying parameters of the optimization algorithm: argminvar.m
5. Details: readme.txt

Simulations and application: Simulation results
Next slide: average risk (expected mean squared error) $R^B(d, \hat\beta \mid X)$ of randomized designs relative to optimal designs, for various sample sizes, residual variances, dimensions of the covariate vector, and priors; the covariates are multivariate standard normal.
We find that the gains of optimal designs
1. decrease in the sample size,
2. increase in the dimension of covariates,
3. decrease in $\sigma^2$.

Simulations and application
Table: The mean squared error of randomized designs relative to optimal designs

  data parameters                            prior
  n     σ²    dim(X)     linear model    squared exponential    non-informative
  50    4.0      1           1.05               1.03                 1.05
  50    4.0      5           1.19               1.02                 1.07
  50    1.0      1           1.05               1.07                 1.09
  50    1.0      5           1.18               1.13                 1.20
  200   4.0      1           1.01               1.01                 1.02
  200   4.0      5           1.03               1.04                 1.07
  200   1.0      1           1.01               1.02                 1.03
  200   1.0      5           1.03               1.15                 1.20
  800   4.0      1           1.00               1.01                 1.01
  800   4.0      5           1.01               1.05                 1.06
  800   1.0      1           1.00               1.01                 1.01
  800   1.0      5           1.01               1.13                 1.16

Simulations and application: Project STAR
- Krueger (1999a), Graham (2008)
- 80 schools in Tennessee, 1985-1986: kindergarten students randomly assigned to small (13-17 students) or regular (22-25 students) classes within schools
- Sample: students observed in grades 1-3
- Treatment: $D = 1$ for students assigned to a small class (upon first entering a project STAR school)
- Controls: sex, race, year and quarter of birth, poor (free lunch), school ID
- Prior: squared exponential, noninformative
- Respecting the budget constraint (same number of small classes)
- How much could the MSE be improved relative to the actual design? Answer: 19% (equivalent to 9% of the sample size, or 773 students)

Simulations and application
Table: Covariate means within school, for the actual (D) and for the optimal (D*) treatment assignment

School 16           actual D             optimal D*
                D = 0    D = 1       D = 0    D = 1
girl             0.42     0.54        0.46     0.41
black            1.00     1.00        1.00     1.00
birth date    1980.18  1980.48     1980.24  1980.27
free lunch       0.98     1.00        0.98     1.00
n                 123       37         123       37

School 38           actual D             optimal D*
                D = 0    D = 1       D = 0    D = 1
girl             0.45     0.60        0.49     0.47
black            0.00     0.00        0.00     0.00
birth date    1980.15  1980.30     1980.19  1980.17
free lunch       0.86     0.33        0.73     0.73
n                  49       15          49       15

Simulations and application: Summary
- What is the optimal treatment assignment given baseline covariates?
- Framework: decision theoretic, nonparametric, Bayesian, non-informative
- Generically, there is a unique optimal design, and it does not involve randomization.
- Tractable formulas for the Bayesian risk (e.g., $R^B(d, \hat\beta \mid X) = \mathrm{Var}(\beta \mid X) - \bar C' (C + \Sigma)^{-1} \bar C$)
- Suggestions for how to pick a prior
- MATLAB code to find the optimal treatment assignment
- Easy frequentist inference

Outlook: Using data to inform policy
Motivation 1 (theoretical)
1. Statistical decision theory evaluates estimators, tests, and experimental designs based on expected loss.
2. Optimal policy theory evaluates policy choices based on social welfare.
3. This paper: policy choice as a statistical decision, with statistical loss corresponding to social welfare.
Objectives:
1. Anchoring econometrics in economic policy problems.
2. Anchoring policy choices in a principled use of data.

Outlook: Motivation 2 (applied)
Empirical research to inform policy choices:
1. Development economics (cf. Dhaliwal et al., 2011): (cost) effectiveness of alternative policies / treatments
2. Public finance (cf. Saez, 2001; Chetty, 2009): elasticity of the tax base; optimal taxes, unemployment benefits, etc.
3. Economics of education (cf. Krueger, 1999b; Fryer, 2011): impact of inputs on educational outcomes
Objectives:
1. A general econometric framework for such research
2. A principled way to choose policy parameters based on data (and on normative choices)
3. Guidelines for experimental design

Outlook: The setup
1. Policy maker: expected utility maximizer
2. $u(t)$: utility of policy choice $t \in T \subset \mathbb{R}^{d_t}$; $u$ is unknown
3. $u = L m + u_0$, where $L$ and $u_0$ are known; $L$ is a linear operator $C^1(X) \to C^1(T)$
4. $m(x) = E[g(x, \epsilon)]$ (average structural function); $X$ and $Y = g(X, \epsilon)$ are observable, $\epsilon$ is unobserved; the expectation is over the distribution of $\epsilon$ in the target population
5. Experimental setting: $X \perp \epsilon$, so $m(x) = E[Y \mid X = x]$
6. Gaussian process prior: $m \sim GP(\mu, C)$

Outlook: Questions
1. How to choose $t$ optimally given observations $X_i, Y_i$, $i = 1, \dots, n$?
2. How to choose the design points $X_i$ given the sample size $n$? How to choose the sample size $n$?
Mathematical characterization:
1. How does the optimal choice of $t$ (in a Bayesian sense) behave asymptotically (in a frequentist sense)?
2. How can we characterize the optimal design?

Outlook: Main mathematical results
1. Explicit expression for the optimal policy $t^*$
2. Frequentist asymptotics of $t^*$:
   - asymptotically normal, at a rate slower than $\sqrt{n}$
   - distribution driven by the distribution of $\hat u'(t^*)$
   - confidence sets
3. Optimal experimental design: the design density $f(x)$ is increasing in (but less than proportionately)
   - the density of $t^*$
   - the expected inverse of the curvature $u''(t)$
   An algorithm to choose the design based on the prior and the objective function.

Thanks for your time!