A Practitioner s Guide to Cluster-Robust Inference

Similar documents
Casuality and Programme Evaluation

Wild Bootstrap Inference for Wildly Dierent Cluster Sizes

Bootstrap-Based Improvements for Inference with Clustered Errors

Cluster-Robust Inference

Review of Econometrics

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Reliability of inference (1 of 2 lectures)

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

Multiple Regression Analysis: Heteroskedasticity

11. Bootstrap Methods

Heteroskedasticity. Part VII. Heteroskedasticity

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors

Econometrics of Panel Data

Inference Based on the Wild Bootstrap

Inference for Clustered Data

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors

Least Squares Estimation-Finite-Sample Properties

Applied Statistics and Econometrics

BOOTSTRAPPING DIFFERENCES-IN-DIFFERENCES ESTIMATES

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

Small sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models

QED. Queen s Economics Department Working Paper No. 1355

Heteroskedasticity-Robust Inference in Finite Samples

Introductory Econometrics

Small sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models

Econ 836 Final Exam. 2 w N 2 u N 2. 2 v N

WILD CLUSTER BOOTSTRAP CONFIDENCE INTERVALS*

What s New in Econometrics? Lecture 14 Quantile Methods

Robust Standard Errors in Small Samples: Some Practical Advice

Contest Quiz 3. Question Sheet. In this quiz we will review concepts of linear regression covered in lecture 2.

A better way to bootstrap pairs

Wild Cluster Bootstrap Confidence Intervals

Linear Regression with 1 Regressor. Introduction to Econometrics Spring 2012 Ken Simons

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Heteroskedasticity and Autocorrelation

Rewrap ECON November 18, () Rewrap ECON 4135 November 18, / 35

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments

Clustering as a Design Problem

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Multiple Equation GMM with Common Coefficients: Panel Data

QED. Queen s Economics Department Working Paper No Bootstrap and Asymptotic Inference with Multiway Clustering

Inference in difference-in-differences approaches

Economics 582 Random Effects Estimation

Econometrics. Week 6. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Homoskedasticity. Var (u X) = σ 2. (23)

Econ 582 Fixed Effects Estimation of Panel Data

Ordinary Least Squares Regression

Introductory Econometrics

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

FinQuiz Notes

Econometrics of Panel Data

Small-sample cluster-robust variance estimators for two-stage least squares models

LECTURE 10: MORE ON RANDOM PROCESSES

Applied Statistics and Econometrics

A Practitioner s Guide to Cluster-Robust Inference

Dr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines)

DOCUMENTS DE TRAVAIL CEMOI / CEMOI WORKING PAPERS

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

Multiple Regression Analysis

Lab 11 - Heteroskedasticity

Econometrics of Panel Data

Intermediate Econometrics

Iris Wang.

NBER WORKING PAPER SERIES ROBUST STANDARD ERRORS IN SMALL SAMPLES: SOME PRACTICAL ADVICE. Guido W. Imbens Michal Kolesar

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

We can relax the assumption that observations are independent over i = firms or plants which operate in the same industries/sectors

Recitation 5. Inference and Power Calculations. Yiqing Xu. March 7, 2014 MIT

matrix-free Hypothesis Testing in the Regression Model Introduction Kurt Schmidheiny Unversität Basel Short Guides to Microeconometrics Fall 2018

the error term could vary over the observations, in ways that are related

10 Panel Data. Andrius Buteikis,

Weak Instruments and the First-Stage Robust F-statistic in an IV regression with heteroskedastic errors

mrw.dat is used in Section 14.2 to illustrate heteroskedasticity-robust tests of linear restrictions.

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

2. Linear regression with multiple regressors

Multiple Regression Analysis

Recall that a measure of fit is the sum of squared residuals: where. The F-test statistic may be written as:

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation

Econometric Analysis of Cross Section and Panel Data

Motivation for multiple regression

Linear Panel Data Models

A Guide to Modern Econometric:

The Wild Bootstrap, Tamed at Last. Russell Davidson. Emmanuel Flachaire

Bootstrap Testing in Econometrics

Median Cross-Validation

Multiple Linear Regression

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics - 30C00200

New Developments in Econometrics Lecture 16: Quantile Estimation

ECON Introductory Econometrics. Lecture 16: Instrumental variables

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

Lecture 1: OLS derivations and inference

Bootstrapping heteroskedastic regression models: wild bootstrap vs. pairs bootstrap

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

EMERGING MARKETS - Lecture 2: Methodology refresher

8. Instrumental variables regression

Statistical Inference with Regression Analysis

Introductory Econometrics

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid

Transcription:

A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20

In the previous episode Variance inflation τ k = 1 + ρ xk ρ u ( N g 1 ) (1) Cluster-Robust estimate of Variance Matrix (CRVE) V [ ˆβ] = ( X X ) 1 Bclu ( X X ) 1 B clu = G g =1 X g E [u g u g X g ] X g Assumption E [ u i g u j g x i g, x j g ] = 0 unless g = g Consistent as G Bias-Variance Trade-off Larger Clusters have less bias but higher variability Cameron Miller Cluster Clinic II March 4, 2015 2 / 20

In the previous episode Variance inflation τ k = 1 + ρ xk ρ u ( N g 1 ) (1) Cluster-Robust estimate of Variance Matrix (CRVE) V [ ˆβ] = ( X X ) 1 Bclu ( X X ) 1 B clu = G g =1 X g E [u g u g X g ] X g Assumption E [ u i g u j g x i g, x j g ] = 0 unless g = g Consistent as G Bias-Variance Trade-off Larger Clusters have less bias but higher variability Problems: Clusters not nested Few clusters Cameron Miller Cluster Clinic II March 4, 2015 2 / 20

Multi-way clustering Example: state-year panel clustering both within years and states Not possible to cluster at intersection of two groupings Solutions: Include sufficient regressors to eliminate error correlation in all but one dimension, and then do CRVE for remaining dimension Multi-way CRVE Cameron Miller Cluster Clinic II March 4, 2015 3 / 20

Multi-way CRVE Model Model: y i = x i β + u i Assumption: E [ u i u j x i, x j ] = 0 unless observations i and j share any cluster dimension Multi-way CRVE V [ ( ˆβ] = X X ) 1 ˆB ( X X ) 1 : ˆB = N N x i x j uˆ i uˆ j 1 [ i,j share any cluster ] i=1 j =1 Asymptotics relies on number of clusters of the dimension with the fewest number of clusters Cameron Miller Cluster Clinic II March 4, 2015 4 / 20

Multi-way CRVE, cont d Implementation Identify two ways of clustering and unique "group 1 by group 2 combinations" Estimate the model clustering on "group 1" ˆV 1 Estimate the model clustering on "group 2" ˆV 2 Estimate the model clustering on "group 1 by group 2" ˆV 1 2 ˆV 2way = ˆV 1 + ˆV 2 ˆV 1 2 S.e. are the square root of the diagonal elements of ˆV 2way Problems Perfectly multicollinear regressors ˆV 2way [ ˆβ] not guaranteed to be positive semi-definite Cameron Miller Cluster Clinic II March 4, 2015 5 / 20

Few clusters Wald t-statistics w = ˆβ β 0 s ˆβ If G, then w N(0,1) under H 0 : β = β 0 Finite G: unknown distribution of w. Common to use w T (G 1) Cameron Miller Cluster Clinic II March 4, 2015 6 / 20

Few clusters Wald t-statistics w = ˆβ β 0 s ˆβ If G, then w N(0,1) under H 0 : β = β 0 Finite G: unknown distribution of w. Common to use w T (G 1) Problem with Few Clusters Despite reasonable precision in estimating β, ˆV clu ( ˆβ) can be downwards-biased OLS leads to "overfitting", with estimated residuals too close to zero to be compared with true errors Use wrong critical values from N(0,1) or T (G 1) estimate of ˆB leads to over-rejection Cameron Miller Cluster Clinic II March 4, 2015 6 / 20

How few is "few"? No clear-cut definition of "few". Current consensus: Balanced clusters: from less than 20 to less than 50 clusters Unbalanced clusters: even more clusters Cameron Miller Cluster Clinic II March 4, 2015 7 / 20

Solutions Independent homoskedastic normally distributed errors Unbiased variance matrix of error variance s 2 = w T (N K ) û û N K Cameron Miller Cluster Clinic II March 4, 2015 8 / 20

Solutions Independent homoskedastic normally distributed errors Unbiased variance matrix of error variance s 2 = w T (N K ) û û N K Independent heteroskedastic normally distributed errors White s estimate is no longer unbiased No way to obtain exact T distribution result for w Partial solutions Bias-Corrected CRVE Cluster Bootstrap Improved Critical values using a T-distribution Cameron Miller Cluster Clinic II March 4, 2015 8 / 20

Solution 1: Bias-Corrected CRVE [ ] [ ] Need corrected residuals: E û g û g E u g u g ũ g = G G 1ûg ũ g = [ ] 1/2 ( I Ng H g g ûg, where H g g = X g X X ) 1 X g [ ] 1 ũ g = INg H g g ûg G 1 G Reduce but not eliminate over-rejection when there are few clusters Cameron Miller Cluster Clinic II March 4, 2015 9 / 20

Solution 2: Cluster Bootstrap Idea True size Wald test is 0.05 +O(G 1 ) Appropriate bootstrap with asymptotic refinement can obtain true size of 0.05 +O(G 3/2 ) Asymptotic refinement Achieved bootstrapping a statistics that is asymptotically pivotal: asymptotic distribution does not depend on any unknown parameter ˆβ not a.p. Wald t-statistics is a.p. Cameron Miller Cluster Clinic II March 4, 2015 10 / 20

Solution 2: Percentile-t Bootstrap Obtain B (999) draws, w1,..., w B from the distribution of the Wald t-statistics as follows: 1 Obtain G clusters {( y1, X 1 ) (,..., y G, X G)} by one cluster method (e.g. pairs cluster resampling) 2 Compute OLS estimate ˆβ b 3 Calculate the Wald test statistics w b = ˆβ b ˆβ s ˆβ b Cameron, Gelbach, and Miller (2008): Not eliminate test over-rejection Cameron Miller Cluster Clinic II March 4, 2015 11 / 20

Solution 2: Wild Cluster Bootstrap Idea Estimate main model imposing null hypothesis that you wish to test to give estimate of β H0 Example: test statistical significance of one OLS regressor. Regress y i g on all components of x i g except regressor with coefficient 0 Obtain residuals ũ i g = y i g x i g β H0 Cameron Miller Cluster Clinic II March 4, 2015 12 / 20

Solution 2: Wild Cluster Bootstrap, cont d Implementation 1 Obtain b th resample 1 Obtain ũ i g = y i g x β i g H0 { 1 w.p. 0.5 2 Randomly assign cluster g the weight d g = 1 w.p. 0.5 3 Generate new pseudo-residuals u i g = d g ũ i g and new outcome variables y i g = x i g β H0 + u i g 2 Compute OLS estimate ˆβ b 3 Calculate the Wald test statistics w b = ˆβ b ˆβ s ˆβ b Essentially Replacing y g in each resample with y g = X β H0 + ũ g or y g = X β H0 ũ g Obtain 2 G unique values of w 1,..., w B Cameron Miller Cluster Clinic II March 4, 2015 13 / 20

Solution 2: Wild Cluster Bootstrap, cont d w1,..., w B cannot be used directly to form critical values for a C.I. Used to provide a p-value for testing a hypothesis To form C.I. need to invert sequence of tests over a range of candidate H 0. 95 % C.I. is set of β 0 for whichp 0.05 Limitation G=5 16 possible values of w 1,..., w B Cameron Miller Cluster Clinic II March 4, 2015 14 / 20

Solution 2: Examine distribution of bootstrapped values Examination Summary statistics Sample size Largest and smallest value Histogram Missing values Examples If one cluster is an outlier: histogram might have a big mass that sits separately Dummy variable in RHS: obtain missing values for w because of low variation in treatment in some samples Huge ˆβ might be caused by bootstrapping drawing nearly multicollinear samples Cameron Miller Cluster Clinic II March 4, 2015 15 / 20

Solution 3: Improved critical values using a T-distribution At a minimum Use T(G-1) instead of N(0,1) For models with L regressors invariant within cluster: use T(G-L) Cameron Miller Cluster Clinic II March 4, 2015 16 / 20

Solution 3: Data-determined Degrees of Freedom Distribution w approximated by T (v ( ) ) v such that first two moments of v s equal the first two 2ˆβ/σ 2ˆβ moments of the χ(v ) v = ( ) 2 G λ j j =1 ( G λ 2 j j =1 λ j are eigenvalues of G ΩG G has g th column (I N H) g A g X g ( X X ) 1 ek H = X ( X X ) 1 X ) 1/2 A g = (I Ng H g g ) (2) e k is a vector of zeros aside from 1 in the k th position if ˆβ = ˆβ k Cameron Miller Cluster Clinic II March 4, 2015 17 / 20

Solution 3: Approximate w using effective number of clusters Distribution w approximated by T (G ) δ measures cluster heterogeneity G = G 1 + δ (3) δ = 1 G γ g = e k G g =1 [ (γg γ) 2 γ 2 ] ( X X ) 1 Xg Ω g X g ( X X ) 1 ek Cameron Miller Cluster Clinic II March 4, 2015 18 / 20

Empirical example Individual-level cross section data Random "policy" variable generated: equal to one for one-half of states and zero for other half Placebo treatment should be statistically insignificant in 95 % of tests performed at significance 0.05 Montecarlo 1000 replications Generate a dataset by sampling with replacement states Draw states from 3 % sample of individuals within each state. Average of 40 observations per cluster Explore effect of the number of clusters Cameron Miller Cluster Clinic II March 4, 2015 19 / 20

Montecarlo exercise Cameron Miller Cluster Clinic II March 4, 2015 20 / 20

Montecarlo exercise G=50, ignoring clustering leads to great over-rejection but less variable Cameron Miller Cluster Clinic II March 4, 2015 20 / 20

Montecarlo exercise Use different critical values: improvement but stil over-rejection Cameron Miller Cluster Clinic II March 4, 2015 20 / 20

Montecarlo exercise Residual bias adjustment: perform best with fewer clusters Cameron Miller Cluster Clinic II March 4, 2015 20 / 20

Montecarlo exercise Data-determined df: improvement with more clusters Cameron Miller Cluster Clinic II March 4, 2015 20 / 20

Montecarlo exercise Bootstrap without asymptotic refinement: as 2-4 Cameron Miller Cluster Clinic II March 4, 2015 20 / 20

Montecarlo exercise Bootstrap with asymptotic refinement: mild over-rejection Cameron Miller Cluster Clinic II March 4, 2015 20 / 20