Simple Linear Regression

Size: px

Start display at page:

Download "Simple Linear Regression"

Noel Pitts
5 years ago
Views:

1 Simple Linear Regression Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18

2 Linear Regression and Causality Regression conditional expectation function of Y given X E(Y X) = f (X) = β X Question: When can we interpret coefficients as causal effects? Causal model Structural equation model Y i (t) = α + βt + ɛ i for t = 0, 1, where E(ɛ i ) = 0 1 No interference between units 2 α = E(Y i (0)) 3 β = Y i (1) Y i (0) for all i Constant additive unit causal effect Heterogenous treatment effect model: Y i (t) = α + β i t + ɛ i = α + βt + ɛ i (t) where E(ɛ i ) = 0 and ɛ i(t) = ɛ i + (β i β)t for t = 0, 1 ATE: β = E(β i ) = E(Y i (1) Y i (0)) E(ɛ i (t)) = 0 for t = 0, 1 and α = E(Y i (0)) Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18

3 Identification Assumption Strict exogeneity assumption: E(ɛ i T) = E(ɛ i ) = 0 where T = (T 1, T 2,..., T n ) Under this assumption, least squares estimate ˆβ is unbiased for β Randomization of treatment: {Y i (1), Y i (0)} n T E(Y i (t) T) = E(Y i (t)) E(ɛ i (t) T) = E(ɛ i (t)) = 0 E(Y i (t)) = E(Y i T) = α + βt i Random sampling of units: {Y i (1), Y i (0)} {Y j (1), Y j (0)} for any i j {ɛ i (1), ɛ i (0)} {ɛ j (1), ɛ j (0)} for any i j Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18

4 Unbiasedness of Least Squares Estimator Let (ˆα, ˆβ) be the least squares estimators When the treatment is binary, ˆα = 1 n (1 T i )Y i n 0 ˆβ = 1 n T i Y i 1 n 1 n 0 n (1 T i )Y i So, ˆα and ˆβ are unbiased for E(Y (0)) and E(Y i (1) Y i (0)) from the randomization inference perspective The same conclusion holds if T is categorical When T is continuous, the model assumes Y i (t) is linear in T Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18

5 Homoskedasticity Homoskedastic error: V(ɛ T) = σ 2 I n 1 Random sampling of units implies ɛ i ɛ j 2 Equal variance of potential outcomes V(ɛ i (t)) = V(ɛ i (t) T) = V(Y i (t) T) = V(Y i (t)) = σ 2 t for t = 0, 1 Under the homoskedasticity assumption, model-based variance is, V( ˆβ T) = σ 2 n (T i T ) 2 standard model-based variance estimator is, V( ˆβ T ) = ˆσ 2 n (T i T ) 2 where ˆσ 2 = 1 n 2 Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18 n ˆɛ 2 i

6 Violation of the Homoskedasticity Assumption Homoskedasticity assumption may be unrealistic: σ1 2 σ2 0 ( ) ( ) ˆσ 2 σ1 2 Bias = E n (T i T ) 2 Bias is zero when }{{} under complete randomization = (n 1 n 0 )(n 1) (σ1 2 n 1 n 0 (n 2) σ2 0 ) 1 homoskedasticity assumption holds: σ 2 1 = σ2 0 2 design is balanced: n 1 = n 0 Bias can be negative or positive + σ2 0 n 1 n 0 }{{} true variance Bias is typically small but does not go away asymptotically Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18

7 Heteroskedasticity-Robust Variance Estimator Eicker-Huber-White (EHW) robust variance estimator: V EHW ((ˆα, ˆβ) X) }{{} sandwich = (X X) 1 (X diag(ˆɛ 2 i }{{}}{{ )X) (X X) 1 }}{{} = bread meat where X i = (1, T i ) and X = (X 1,..., X n ) When the treatment is binary, bread ( n ) 1 ( n ) ( n X i X i ˆɛ 2 i X ix i X i X i V EHW ( ˆβ T) = σ2 1 n 1 + σ2 0 n 0 where σ 2 t = 1 n t ( ) Bias = E( V( σ1 ˆβ 2 T)) + σ2 0 n 1 n 0 ) 1 n 1{T i = t}(y i Y t ) 2 = ( ) σ1 2 n1 2 + σ2 0 n0 2 Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18

8 Improved Robust Variance Estimators HC2 heteroskedasticity-robust variance estimator: ( ) } V HC2 ((ˆα, ˆβ) X) = (X X) {X 1 ˆɛ 2 i diag X (X X) 1 1 p i where p i = X i (X X) 1 X i is the leverage When the treatment is binary, V HC2 ( ˆβ T) = ˆσ2 1 n 1 + ˆσ2 0 n 0 Behrens-Fisher problem: even under normality, ( ˆβ β)/ VHC2 ( ˆβ T) does not follow the Student s t-distribution Degrees of freedom adjustments by Welch (1951) and Bell and McCaffery (2002) to improve approximation Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18

9 Reducing Transphobia (Brookman and Kalla Science) Can perspective-taking imagining the world from another s perspective reduce transphobia? Treatment group (n 1 = 912): canvasser asking people to think about a time when they were judged unfairly and were guided to translate that experience to a transgender individual s experience Control group (n 0 = 913): canvasser asking people to recycle Outcome variable: feeling thermometer towards transgender people (0 100) 1 baseline 2 3 day 3 3 week 4 6 week 5 3 month Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18

10 Durable Effects 1 3 day effect n 1 = 291, n 0 = 309, ˆσ 2 1 = , ˆσ2 0 = Neyman: est = 6.76, s.e. = 2.32, 95% CI = [2.23, 11.28] Regression: est = 6.76, s.e. = 2.30, 2.31 (HC), 2.32 (HC2) 2 3 month effect n 1 = 179, n 0 = 206, ˆσ 2 1 = , ˆσ2 0 = Neyman: est = 5.30, s.e. = 2.78, 95% CI = [ 0.15, 10.75] Regression: est = 5.30, s.e. = 2.78, 2.78 (HC), 2.78 (HC2) For simplicity, we ignored attrition, which can be problematic The treatment group has more attrition and a larger variance Relatively balanced sample homoskedasticity assumption matters little Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18

11 Cluster Randomized Experiments Setup: Clusters of units: j = 1, 2,..., J Number of treated (control) clusters: J 1 (J 0 ) Units within cluster j: i = 1, 2,..., m j Total sample size: n = J j=1 m j Treatment assignment at cluster level: T j {0, 1} Outcome: Y ij = Y ij (T j ) Assumptions: 1 Random assignment: {Y ij (1), Y ij (0)} T j and Pr(T j = 1) = J 1 /J often easy to assign to each cluster 2 No interference across clusters useful when interference within a cluster is likely Causal quantities of interest: SATE 1 n m J j (Y ij (1) Y ij (0)) j=1 PATE E(Y ij (1) Y ij (0)) Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18

12 Randomization Inference Assume equal cluster size for simplicity, i.e., m j = m for all j Unbiased estimator: ( ˆτ cluster = 1 J 1 T j J 1 m j=1 ) m Y ij 1 J 0 ( ) J 1 m (1 T j ) Y ij m Easy to show E(ˆτ cluster O) = SATE and thus E(ˆτ cluster ) = PATE j=1 Population variance (random sampling of clusters): V(ˆτ cluster O) V(ˆτ cluster ) = V(Y j(1)) J 1 + V(Y j(0)) J 0 where for t = 0, 1 Y j (t) = 1 m m Y ij (t) Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18

13 Intracluster Correlation Coefficient (ICC) Comparison with the standard variance: V(Y j (t)) J t = = V(ˆτ) = σ2 1 J 1 m + σ2 0 J 0 m ( m ) 2 1 J t m 2 E Y ij (t) m µ t 1 m m J t m 2 E (Y ij (t) µ t ) 2 + (Y ij (t) µ t )(Y i j(t) µ t ) = σ2 t J t m {1 + (m 1)ρ t} typically i =1 i i σ 2 t J t m Suppose m = 101 and ρ t = 0.5 more than 50 times greater Random effect model: η j + ɛ ij where ρ = V(η j )/{V(η j ) + V(ɛ ij )} 0 Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18

14 Cluster Robust Variance Estimator Cluster robust variance estimator (Liang and Zeger Biometrika): 1 V J J J cluster ((ˆα, ˆβ) X) = X j X j X j ˆɛ j ˆɛ j X j X j X j j=1 j=1 where X j = (1 m j, T j ) and ˆɛ j = (ˆɛ 1j,..., ˆɛ mj j) Consistent as the number of clusters goes to infinity Bias adjustment a.k.a. CR2 (Bell and McCaffrey Surv. Methodol.); meat = j=1 J X j (I mj P Xj ) 1/2ˆɛ j ˆɛ j (I mj P Xj ) 1/2 X j j=1 where P Xj = X j (X X) 1 X j Degree of freedom adjustments are also available Implication: cluster standard errors by the unit of treatment assignment Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18 1

15 Effects of Clientalistic Campaign ( Wantchekon World Politics) Does clientalism affect voting behavior? Experiment involving 4 main candidates during the 2001 presidential election in Benin 8 non-competitive districts (out of 84) are selected Random assignment within each district: 1 one village: clientalistic campaign 2 one village: public policy campaign 3 the other villages: control group Many experimental villages were at least 25 miles apart: The risk of contagion was thereby minimized Post-election survey of 2344 registered voters to measure vote choice Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18

16 Stratified Cluster Randomized Design Reg. Sample Sample Population District Candidate Exp. Villages Voters Size Mean Mean Kandi Kerekou clientelist (0) 0.81 public policy (.50) 0.60 control (.18) 0.75 Nikki Kerekou clientelist (.21) 0.90 public policy (.24) 0.85 control (.20) 0.82 Bembereke Lafia clientelist (.26) 0.94 public policy (.30) 0.93 control (.28) 0.74 Perere Lafia clientelist (.42) 0.81 public policy (.33) 0.25 control (.40) 0.58 Abomey Soglo clientelist (.13) 0.91 public policy (.13) 0.90 control (.15) 0.86 Ouidah Soglo clientelist (.25) 0.86 public policy (.26) 0.72 control (0.44) 0.64 Aplahoue Amoussou clientelist (.13) 0.87 public policy (.28) 0.77 control (.20) 0.72 Dogbo Amoussou clientelist (.48) 0.65 public policy (.50) 0.47 control (0.44) 0.84 Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18

17 Empirical Results Neyman public policy est s.e cluster assignment No No Yes Yes stratification No Yes No Yes clientalism est s.e cluster assignment No No Yes Yes stratification No Yes No Yes Regression public policy clientalism HC2 CR2 HC2 CR2 est s.e Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18

18 Summary Simple linear regression Difference-in-means estimator Homoskedasticity implies the equal variance of potential outcomes Heteroskedasticity-robust variance estimator relaxes this assumption Cluster randomized experiments: 1 permits interference within cluster 2 less powerful when intracluster correlation is high 3 use cluster-robust standard variance estimator Suggested readings: 1 ANGRIST AND PISCHKE, Chapter 2 2 IMBENS AND RUBIN, Chapter 7 Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall / 18

Instrumental Variables

Instrumental Variables Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Noncompliance in Experiments Stat186/Gov2002 Fall 2018 1 / 18 Instrumental Variables