Factor model shrinkage for linear instrumental variable analysis with many instruments

1 Factor model shrinkage for linear instrumental variable analysis with many instruments. P. Richard Hahn, Booth School of Business, University of Chicago. Joint work with Hedibert Lopes.

2 My goals for this talk:
1) Describe the endogenous predictor problem and the instrumental variables approach.
2) Explain some challenges posed by this standard approach.
3) Argue that some recent advances in factor modeling can help overcome these challenges.

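3 Instrumental variable analysis is a popular technique for dealing with endogenous predictors.
(I) $y_i = x_i \beta + \varepsilon_i$, $i = 1, \dots, n$
But if $\operatorname{cov}(x_i, \varepsilon_i) = \alpha \neq 0$:
(II) $y_i = x_i \beta + \alpha x_i + \tilde{\varepsilon}_i = (\beta + \alpha) x_i + \tilde{\varepsilon}_i$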

4 Example: consider looking at data on families where $y_i$ = incidence of cancer and $x_i$ = percentage of smokers. But smoking may be associated with other attributes which are themselves linked to cancer, e.g. biological or socioeconomic factors. This inter-relatedness causes an interpretation problem: we want $\beta$, but our wild-type inferences give us $\beta + \alpha$.
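A quick simulation can make the $\beta + \alpha$ problem concrete. This is a minimal sketch, not taken from the talk: it assumes $\operatorname{var}(x_i) = 1$ so that $\operatorname{cov}(x_i, \varepsilon_i) = \alpha$ exactly, borrows $\beta = 2$ and $\alpha = 1/4$ from the later example, and the function name `ols_slope_under_endogeneity` is hypothetical.

```python
import numpy as np

def ols_slope_under_endogeneity(n=200_000, beta=2.0, alpha=0.25, seed=0):
    """Simulate y = x*beta + eps with cov(x, eps) = alpha and var(x) = 1,
    then return the no-intercept least-squares slope of y on x."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                 # assumption: var(x) = 1
    eps = alpha * x + rng.standard_normal(n)   # error correlated with x
    y = x * beta + eps
    return float(x @ y / (x @ x))

slope = ols_slope_under_endogeneity()
print(f"least-squares slope: {slope:.3f}")
```

With these settings the slope should land near $\beta + \alpha = 2.25$ rather than $\beta = 2$.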

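5 [Figure: path diagram with nodes $\varepsilon_x$, $x$, $\varepsilon_{y|x}$, $y$ and edge labels $\alpha$, $\beta$.] $y_i = x_i \beta + \varepsilon_y$

6 [Figure: path diagram with nodes $\varepsilon_x$, $x$, $\varepsilon_{y|x}$, $y$ and edge labels $\alpha$, $\beta$.] $y_i = x_i \beta + \alpha \varepsilon_x + \varepsilon_{y|x}$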

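7 [Figure: path diagram with nodes $z$, $\varepsilon_x$, $x$, $\varepsilon_{y|x}$, $y$ and edge labels $\gamma$, $\alpha$, $\beta$.] $x_i = z_i \gamma + \varepsilon_x$, $y_i = x_i \beta + \alpha \varepsilon_x + \varepsilon_{y|x}$. Example: $z$ = tax rate on cigarettes. We assume that the instrument cannot cause cancer except through its effect on smoking.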

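8 [Figure: path diagram with nodes $z$, $\varepsilon_x$, $x$, $\varepsilon_{y|x}$, $y$ and edge labels $\gamma$, $\alpha$, $\beta$.] Operationally, the trick is that the instrument renders $\varepsilon_x = x_i - z_i \gamma$ quasi-observable via $\gamma$:
$x_i \mid z_i \sim N(z_i \gamma, \sigma_x^2)$
$y_i \mid z_i, x_i \sim N(x_i \beta + \alpha (x_i - z_i \gamma), \sigma_{y|x}^2)$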
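To see why the instrument helps, here is a small simulation of the model $x_i = z_i \gamma + \varepsilon_x$, $y_i = x_i \beta + \alpha \varepsilon_x + \varepsilon_{y|x}$. It is only a sketch: it uses the classical ratio (Wald/IV) estimator rather than the Bayesian treatment discussed in the talk, and the single scalar instrument, the unit variances, $\gamma = 1$, and the function name `ols_and_iv` are all illustrative assumptions.

```python
import numpy as np

def ols_and_iv(n=200_000, beta=2.0, alpha=0.25, gamma=1.0, seed=0):
    """Simulate the two-equation model and return (least-squares slope, IV ratio)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)                  # instrument, independent of the errors
    eps_x = rng.standard_normal(n)              # first-stage error
    x = z * gamma + eps_x
    y = x * beta + alpha * eps_x + rng.standard_normal(n)
    ols = float(x @ y / (x @ x))                # contaminated by alpha
    iv = float(z @ y / (z @ x))                 # uses only the variation in x driven by z
    return ols, iv

ols, iv = ols_and_iv()
print(f"OLS: {ols:.3f}, IV: {iv:.3f}")
```

Under these assumptions the least-squares slope drifts towards $\beta + \alpha \sigma_x^2 / \operatorname{var}(x) = 2.125$, while the IV ratio should sit near $\beta = 2$.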

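9 This yields a joint normal distribution for $(x_i, y_i \mid z_i)$ with
$\mu_i = \begin{pmatrix} \gamma z_i \\ \beta \gamma z_i \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_x^2 (\alpha + \beta) \\ \sigma_x^2 (\alpha + \beta) & \sigma_x^2 (\alpha + \beta)^2 + \sigma_{y|x}^2 \end{pmatrix}$
Now $\beta$ is identified.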
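The moments of this joint distribution follow by substituting the first-stage equation into the second. A short derivation under the slide-8 model, writing $\varepsilon_{x,i} = x_i - z_i \gamma$ (the subscripted error notation is introduced here for illustration):

```latex
\begin{aligned}
x_i &= z_i \gamma + \varepsilon_{x,i}, \\
y_i &= x_i \beta + \alpha (x_i - z_i \gamma) + \varepsilon_{y|x,i}
     = z_i \gamma \beta + (\alpha + \beta)\, \varepsilon_{x,i} + \varepsilon_{y|x,i}, \\
\operatorname{var}(x_i \mid z_i) &= \sigma_x^2, \qquad
\operatorname{cov}(x_i, y_i \mid z_i) = (\alpha + \beta)\, \sigma_x^2, \\
\operatorname{var}(y_i \mid z_i) &= (\alpha + \beta)^2 \sigma_x^2 + \sigma_{y|x}^2 .
\end{aligned}
```

The mean of $x_i$ pins down $\gamma$, the mean of $y_i$ then pins down $\beta$ (provided $\gamma \neq 0$), and the covariance term gives $\alpha + \beta$ and hence $\alpha$.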

10 Bayesian linear IV has been treated by (among others):
Lindley, D. V. and G. M. El Sayyad (1968). The Bayesian estimation of a linear functional relationship.
Dreze, J. H. (1976). Bayesian limited information analysis of the simultaneous equations model.
Geweke, J. (1996). Bayesian reduced rank regression.
Chao, J. C. and P. C. B. Phillips (1998). Posterior distribution in limited information analysis of the simultaneous equations model using Jeffreys prior.
Kleibergen, F. and E. Zivot (2003). Bayesian and Classical approaches to Instrumental Variable regression.
Lopes, H. and Polson, N. (2012). Bayesian instrumental variables: likelihoods and priors.

11 Some general references on causal modeling:
Judea Pearl's book Causality
Phil Dawid's notes Fundamentals of Statistical Causality
Chris Sims's notes on Bayesian IV
Rossi, Allenby and McCulloch's book Bayesian Statistics and Marketing, Chapter 7
Haavelmo, The Statistical Implications of Simultaneous Equations (esp. section 4, "Equations of predictions versus equations of theory")

12 Our work considers the many instruments case. Slogan: If you want to know $\beta$ you gotta know $\gamma$. If $\dim(z_i) = p$ is large relative to the sample size, all the usual estimation difficulties remain, even if the causal effect of interest is a scalar. Fortunately, we have a strategy for this scenario: factor models.

13 A Gaussian factor model can be formulated as a particular decomposition of a covariance matrix: $\Sigma = B B^t + \Psi$, where $B$ is p-by-k and $\Psi$ is p-by-p diagonal. This assumption immediately reduces the number of covariance parameters to be estimated. Why does this benefit the regression case? West, M. (2003). Bayesian factor regression models in the large p, small n paradigm.
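The size of the reduction is easy to tabulate. The sketch below is a raw parameter count only (it ignores the rotational redundancy in $B$), and the function name `covariance_param_counts` is hypothetical; the values p = 60 and k = 3 are the ones used in the example later in the talk.

```python
def covariance_param_counts(p, k):
    """Free parameters in an unrestricted p-by-p covariance matrix versus
    the factor form B B^t + Psi (raw count, ignoring rotation of B)."""
    unrestricted = p * (p + 1) // 2   # symmetric matrix: upper triangle incl. diagonal
    factor = p * k + p                # p*k loadings plus p idiosyncratic variances
    return unrestricted, factor

print(covariance_param_counts(60, 3))  # (1830, 240)
```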

14 If the regression coefficients $\gamma$ are assumed to adhere to the same factor form, we write them as $\gamma^t = \theta B^t (B B^t + \Psi)^{-1}$, where $\theta$ is 1-by-k. Now $B$ and $\Psi$ depend upon the matrix $z$: $E(x_i \mid z, \theta) = \theta E\{B^t (B B^t + \Psi)^{-1} \mid z\} z_i$. At one extreme, the RHS expectation concentrates and inferring $\gamma$ reduces entirely to inferring $\theta$. One can use unlabeled data.

15 The true $\gamma$ may not lie in the column space of $(B B^t + \Psi)^{-1} B$. Assuming otherwise can lead one to incorrectly conclude that nothing has been learned about the causal effect. To protect against this possibility, we merely shrink back towards factor structure with a hierarchical prior: $\pi(\gamma \mid B, \theta, \Psi) = N([B B^t + \Psi]^{-1} B \theta^t, vI)$. Hahn, Carvalho, Mukherjee (2012). Partial factor modeling: predictor dependent shrinkage for linear regression.
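The prior centre $[B B^t + \Psi]^{-1} B \theta^t$ and a draw around it can be computed in a few lines. This is a sketch under made-up inputs: the loadings, idiosyncratic variances, $\theta$, and prior variance $v$ below are random or arbitrary placeholders rather than values from the talk, and `factor_implied_gamma` is a hypothetical helper name.

```python
import numpy as np

def factor_implied_gamma(B, psi, theta):
    """Return (B B^t + Psi)^{-1} B theta^t, with Psi = diag(psi)."""
    Sigma = B @ B.T + np.diag(psi)
    return np.linalg.solve(Sigma, B @ theta)   # solve instead of forming the inverse

rng = np.random.default_rng(1)
p, k = 60, 3
B = rng.standard_normal((p, k))          # placeholder loadings
psi = rng.uniform(0.5, 1.5, size=p)      # placeholder idiosyncratic variances
theta = rng.standard_normal(k)           # placeholder factor-level coefficients
v = 0.01                                 # placeholder prior variance

gamma_center = factor_implied_gamma(B, psi, theta)
# One draw from N(gamma_center, v I): shrunk towards, not forced onto, the factor form.
gamma_draw = gamma_center + np.sqrt(v) * rng.standard_normal(p)
```

Letting $v$ go to zero recovers the pure factor regression of the previous slide; larger $v$ lets $\gamma$ move away from the factor form.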

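16 We can build an importance sampler which permits easy exact comparison between various priors on $\gamma$.
$y_i \mid z_i, x_i, \gamma, \alpha, \beta, \sigma_{y|x}^2 \sim N(x_i \beta + \alpha (x_i - z_i \gamma), \sigma_{y|x}^2)$
$y_i \mid z_i, x_i, \gamma \sim t(\mathrm{df} = n + \kappa, S = I_n + \tilde{x}^t \tilde{x})$, where $\tilde{x}_i \equiv (x_i, x_i - z_i \gamma)$
The Woodbury identity makes this efficient to evaluate.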
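The Woodbury remark can be illustrated as follows. This sketch assumes the scale matrix is the n-by-n matrix $I_n + X X^t$, with $X$ the n-by-2 matrix whose rows are the $\tilde{x}_i$; that orientation is my reading of the slide, not something it states. The helper name `woodbury_quad_logdet` is hypothetical, and the data are random placeholders.

```python
import numpy as np

def woodbury_quad_logdet(X, y):
    """Return y^t S^{-1} y and log det S for S = I_n + X X^t,
    using only a q-by-q solve (q = number of columns of X)."""
    q = X.shape[1]
    small = np.eye(q) + X.T @ X                       # q-by-q matrix, here 2-by-2
    Xty = X.T @ y
    quad = y @ y - Xty @ np.linalg.solve(small, Xty)  # Woodbury identity
    _, logdet = np.linalg.slogdet(small)              # det(I_n + X X^t) = det(I_q + X^t X)
    return quad, logdet

rng = np.random.default_rng(0)
n = 120
X = rng.standard_normal((n, 2))   # placeholder rows (x_i, x_i - z_i gamma)
y = rng.standard_normal(n)
quad, logdet = woodbury_quad_logdet(X, y)

# Dense check against the n-by-n computation.
S = np.eye(n) + X @ X.T
print(np.isclose(quad, y @ np.linalg.solve(S, y)),
      np.isclose(logdet, np.linalg.slogdet(S)[1]))
```

The quadratic form and log-determinant are the two ingredients of a multivariate t density, so each importance weight costs a 2-by-2 solve instead of an n-by-n one.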

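17 Modular posterior sampling of $\pi(\gamma, \alpha, \beta, \sigma_x^2, \sigma_{y|x}^2 \mid x, z, y)$:
1) Draw M samples of $\gamma$ from $\pi(\gamma \mid x, z)$.
2) Resample with weights $f(y \mid \gamma, x, z)$.
3) Sample $\alpha, \beta, \sigma_x^2, \sigma_{y|x}^2$ conditional on $\gamma$: $\pi(\alpha, \beta, \sigma_x^2, \sigma_{y|x}^2 \mid x, z, y, \gamma)$.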
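The resampling step can be sketched generically. The toy below is not the talk's model: a scalar stands in for $\gamma$, the "first-stage posterior" is a standard normal, and the weight function is a unit-variance normal likelihood for a single made-up observation. Those choices are assumptions picked so that the answer is known in closed form (the target is N(0.5, 0.5)). The helper name `importance_resample` is hypothetical.

```python
import numpy as np

def importance_resample(draws, log_weights, size, rng):
    """Resample `draws` with probability proportional to exp(log_weights)."""
    w = np.exp(log_weights - np.max(log_weights))   # stabilise before normalising
    w /= w.sum()
    idx = rng.choice(len(draws), size=size, replace=True, p=w)
    return draws[idx]

rng = np.random.default_rng(0)
M = 50_000
gamma_draws = rng.standard_normal(M)          # toy stand-in for draws from pi(gamma | x, z)
y_obs = 1.0                                   # toy observation
log_w = -0.5 * (y_obs - gamma_draws) ** 2     # toy log f(y | gamma), up to a constant
resampled = importance_resample(gamma_draws, log_w, size=M, rng=rng)
print(resampled.mean(), resampled.var())
```

Because the first-stage draws are reused, swapping in a different prior on $\gamma$ only changes step 1, which is what makes comparison across priors cheap.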

18 Example: p = 60, n = 120, $\alpha = 1/4$, $\beta = 2$, k = 3. First stage signal-to-noise ratio: 1/3. Second stage signal-to-noise ratio: 20/3.

19 The partial factor regression recovers a good approximation of the instrumental regression coefficients.

20 The straight regression model does not.

21 [Figure: two panels, "Factor posterior" and "Ideal posterior", with axes $\alpha$ and $\beta$.] Draws from the diffuse straight-regression posterior are superimposed.

22 Posterior 95% credible intervals: [Table: columns Model, 2.5%, 97.5%; rows Partial factor, Pure regression, $\gamma$ known; numeric entries not recoverable.] The straight-regression model punts on the sign of the causal effect.

23 Lessons:
Knowing $\gamma$ is only important via $z_i \gamma$ or, more generally, $E(x_i \mid z_i, \gamma)$.
Factor models are not just principal component regression.
Factor models shine when the relevant structure is hidden by idiosyncratic noise ($\psi_j$ big relative to $b_j b_j^t$).
The assumption of factor structure translates into tighter causal inferences.
The partial factor model weakens the assumption, but still gives increased precision.

24 It works with binary predictors, too. Hahn, Scott and Carvalho (2012). A sparse factor-analytic probit model for congressional voting patterns.
$z_i^* \sim N(m, \Sigma)$, with $z_{i,j} = 0$ if $z_{i,j}^* \le 0$ and $z_{i,j} = 1$ if $z_{i,j}^* > 0$.
Each of the $2^p$ binary vectors is associated with an orthant in $\mathbb{R}^p$ and assigned probability according to the corresponding multivariate Gaussian CDF.
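The orthant construction is easy to check by simulation in two dimensions. In this sketch the mean, the correlation of 0.5, and the sample size are arbitrary choices, not values from the talk. For a zero-mean bivariate normal with correlation $\rho$, the positive orthant has probability $1/4 + \arcsin(\rho) / (2\pi)$, which equals 1/3 when $\rho = 0.5$.

```python
import numpy as np

rng = np.random.default_rng(0)
m = np.zeros(2)                               # arbitrary latent mean
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])                # arbitrary latent covariance, rho = 0.5

z_latent = rng.multivariate_normal(m, Sigma, size=200_000)
z = (z_latent > 0).astype(int)                # threshold the latent Gaussian at zero

# Monte Carlo probability of the binary vector (1, 1), i.e. the positive orthant.
p11 = np.mean((z[:, 0] == 1) & (z[:, 1] == 1))
exact = 0.25 + np.arcsin(0.5) / (2 * np.pi)   # closed form for the bivariate case
print(p11, exact)
```

In higher dimensions there is no such closed form, which is why the probabilities are described in terms of the multivariate Gaussian CDF.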

25 We consider the same example as before. [Figure: true versus estimated $E(x_i \mid z)$; panels "PF Probit" and "Pure regression".]

26 [Table: columns Model, 2.5%, 97.5%; rows Partial factor, Pure regression, $\gamma$ known; numeric entries not recoverable.]

27 The idea can also be extended to the multinomial probit model. Burgette, L. and Hahn, P.R. (2012). Symmetric Bayesian multinomial probit models. This extension will allow us to tackle the compulsory returns-to-schooling data in a completely new way.

28 0) Causal inference from observational data is an important, interesting problem.
1) Predictive fidelity is necessary but not always sufficient for sound inferences.
2) Factor models prove useful in this context by leveraging marginal covariance information (from potentially unlabeled data).
3) Borrowing information towards predicting a low-dimensional quantity suggests merely shrinking over likelihood-borrowing.
