Weak Instruments and the First-Stage Robust F-statistic in an IV regression with heteroskedastic errors


Uijin Kim
u.kim@bristol.ac.uk
Department of Economics, University of Bristol
January 2016. Preliminary, please do not quote.

Abstract

We consider heteroskedastic errors in an IV regression with one endogenous variable and analyze their effect on the relationship between the robust first-stage F-statistic and the bias of the 2SLS estimator, relative to that of the OLS estimator, and the Wald test size. We follow the definitions of weak instruments and the testing procedure of Stock and Yogo (2005). We show that under heteroskedasticity, the weak instruments limiting distribution of the robust F-statistic is a noncentral chi-square distribution, like that of the standard F-statistic derived in Staiger and Stock (1997). However, the relative IV bias depends on the correlation of the structural and first-stage equation errors, unlike in the homoskedastic case considered by Stock and Yogo (2005). We introduce specific forms of heteroskedasticity, investigate the associated magnitudes of the relative biases and compare them to those from Stock and Yogo, given the strength of the instruments. We find that, depending on the specific type of heteroskedasticity, the relative bias can be larger than that under homoskedasticity. This means that in order to attain the same level of relative bias, stronger instruments are needed under that form of heteroskedasticity, and hence the critical values for the F-test are larger. This implies that the approach of testing for weak instruments by comparing the robust F-statistic to the Stock-Yogo critical values can result in the null of weak instruments being rejected too frequently. An alternative F-test procedure has been developed by Olea and Pflueger (2013), specifically designed to test for weak instruments under general forms of heteroskedasticity. This test is based on a maximum Nagar bias criterion. Our results show that the Olea-Pflueger test can reject the null of weak instruments even when there is a substantial relative bias of the IV estimator. This happens when the Nagar bias is a poor approximation to the bias of the 2SLS estimator.

Acknowledgement: I would like to thank my supervisor Frank Windmeijer for helpful comments and suggestions.

1 Introduction

In this paper, we consider an IV regression model with one endogenous regressor and $k_z$ instruments. The structural and first-stage reduced-form equations are given by
$$ y_i = x_i \beta + u_i, \qquad x_i = z_i'\pi + v_i \qquad (1) $$
for $i = 1, \dots, n$ with sample size $n$. $\beta$ is the structural parameter of interest and $\pi$ is the first-stage parameter vector of order $k_z$. It is assumed that there is endogeneity, $E(u_i x_i) \neq 0$. A common approach to counter the endogeneity problem in a regression is to use instrumental variables. In this application, there is a well-known issue of weak instruments. Loosely speaking, the weak instruments problem arises when the included endogenous variable is only weakly correlated with the instruments considered. In the presence of weak instruments, inference based on the 2SLS estimator can be unreliable, as the standard error of the 2SLS estimator is small relative to its bias (Staiger and Stock, 1997) and its limiting distribution is non-normal (Stock, Wright and Yogo, 2002). In addition, the problem is not confined to small samples; it can arise even in very large samples, see Bound et al. (1995). Conventional asymptotic approximations to finite-sample distributions, which are obtained in a fixed model as the sample size goes to infinity, can be poor in the presence of weak instruments, even in a large sample (Stock, Wright and Yogo, 2002). To obtain good approximations for IV estimators such as the 2SLS estimator, alternative asymptotic methods are used. One such alternative, weak instrument asymptotics, is introduced by Staiger and Stock (1997) and treats the first-stage parameter as local to zero. Under weak instrument asymptotics, Staiger and Stock derive the limiting distribution of the first-stage F-statistic, which follows a noncentral chi-square distribution. Based on their findings, Stock and Yogo (2005) propose a weak instrument test using the first-stage F-statistic, with definitions of weakness of instruments based on the bias of the 2SLS estimator relative to that of the OLS estimator and on Wald test size distortions. This weak instruments test is commonly used to detect whether the instruments considered are weak or not.

All the findings and the weak instrument test of Stock and Yogo mentioned above rest on the assumption of homoskedastic errors. If the homoskedastic error assumption is violated, they are no longer valid. The weak instrument test of Stock and Yogo is nevertheless still commonly implemented under heteroskedasticity, serial correlation and clustering, reporting a robust first-stage F-statistic

instead of the standard first-stage F-statistic and comparing it with the critical values compiled by Stock and Yogo, because there has not been much research analyzing the effect of heteroskedasticity on the weak instruments test. In a paper dealing with heteroskedasticity, Bun and De Haan (2010) analyze the standard and robust versions of the first-stage F-statistic and derive approximations to the bias of the 2SLS and OLS estimators under a non-scalar covariance structure of clustered or serially correlated data. In addition, a test for weak instruments robust to heteroskedasticity, serial correlation and clustering is proposed by Olea and Pflueger (2013). Their test is based on an Effective F-statistic, a scaled version of the non-robust F-statistic with effective degrees of freedom.

In this paper, we analyze the weak instruments test under heteroskedasticity in the model (1). Heteroskedasticity is specified in three cases: in the first case, the conditional variance of the first-stage equation errors depends on a linear combination of squared instruments; in the second, it is an exponential function of a linear combination of instruments; and in the last, the conditional covariance structure of the structural and first-stage equation errors is a linear combination of instruments. For the first two cases, heteroskedasticity is imposed only on the first-stage errors and the instruments are assumed to be continuous and normally distributed; for the last case, a discrete, categorical instrument is considered, as in Andrews (2014), represented by a group of mutually exclusive dummy variables.

We investigate the limiting bias of the 2SLS estimator, relative to that of the OLS estimator, the limiting Wald test statistic, and the heteroskedasticity robust first-stage F-statistic with a general form of heteroskedasticity in the structural and first-stage equation errors under weak instrument asymptotics. We show that the limiting distribution of the heteroskedasticity robust F-statistic is a noncentral chi-square distribution, like that of the standard F-statistic derived by Staiger and Stock (1997). For the heteroskedasticity cases, we specify the conditional covariance structures according to each heteroskedasticity type mentioned above and in turn derive specific forms for the limiting distributions of the relative bias, the heteroskedasticity robust first-stage F-statistic and the Wald test statistic. In Stock and Yogo, the relative bias is determined regardless of the correlation between the errors, while it depends on the correlation between the errors when heteroskedasticity is introduced, regardless of which of the above types is considered. For the relative bias or the Wald size to equal a certain value, say 10%, we calculate the critical values for a test of weak instruments using the fact that the limiting

distribution of the robust F-statistic is a noncentral chi-square distribution. For the two cases in which heteroskedasticity is imposed only on the first-stage errors, the corresponding critical values are smaller than those of Stock and Yogo across the numbers of instruments considered, meaning that if the critical values from Stock and Yogo are used for the weak instruments test with the robust F-statistic under these heteroskedasticity types, the test results are conservative in the sense that the null of weak instruments is rejected too infrequently. On the other hand, when a categorical instrument is considered and it is assumed that the conditional covariance structure depends on a linear combination of instruments, the critical values are likely to be larger than those from Stock and Yogo, indicating that under this specific type of heteroskedasticity the test results for weak instruments using the robust first-stage F-statistic and the critical values of Stock and Yogo can be misleading about the strength of the instruments, in that the null of weak instruments is rejected too frequently. In consequence, heteroskedasticity can affect a test for weak instruments, and we should be cautious about the practical approach of following the Stock and Yogo testing procedure under heteroskedasticity with the standard F-statistic simply replaced by the heteroskedasticity robust one. Depending on how heteroskedasticity is specified, there is a possibility that the strength of the instruments is overestimated, as the null of weak instruments is rejected too frequently.

With regard to the Olea and Pflueger (2013) weak instruments test, the test results are quite conservative, in the sense that the null of weak instruments is rejected too infrequently compared to the weak instruments test following the Stock and Yogo testing procedure, under heteroskedasticity imposed only on the first-stage errors. A comparison of the Nagar bias of the 2SLS estimator, relative to the benchmark, with the relative bias of the estimator as defined in Stock and Yogo shows that the Nagar bias can be a good approximation to the bias of the estimator, as the two have a similar pattern across the concentration parameter under this kind of heteroskedasticity. However, when a categorical instrument is considered, as in Andrews (2014), and the conditional covariance structure of the errors is a function of the instruments, the approximation of the Nagar bias to the bias of the estimator can be poor, as the Nagar bias is far smaller than the relative bias when the relative bias is large. This shows up in the Olea and Pflueger test results: there are cases where the null of weak instruments is rejected too frequently compared to the weak instruments test following the Stock and Yogo testing procedure under this kind of heteroskedasticity, meaning that the strength of the instruments can be overestimated.

This paper is organised as follows. In Section 2, we review the weak instrument test of Stock and Yogo (2005) under homoskedastic errors. Section 3 discusses the weak instrument test under heteroskedastic errors, including the relative bias, the heteroskedasticity robust first-stage F-statistic and the Wald test statistic. In Section 4, three types of heteroskedastic errors are introduced; we derive specific forms of the limiting distributions of the elements of the weak instruments test above and calculate the critical values for a test of weak instruments for each type of heteroskedasticity considered. We also provide simulation results for the weak instruments test based on the theoretical results. Finally, Section 5 concludes.

2 Weak Instrument Measure under Homoskedastic Errors

Stock and Yogo (2005) propose a weak instrument test in an IV regression model with multiple endogenous variables under homoskedastic errors. They define weak instruments using the relative bias of IV estimators and the Wald size distortions. The test is implemented by comparing a test statistic based on the Cragg and Donald (1993) statistic with critical values calculated using the fact that the limiting distribution of the Cragg-Donald statistic is a noncentral Wishart distribution under the weak instrument asymptotics introduced by Staiger and Stock (1997), for a given value of the relative bias or the Wald size.

To illustrate, we consider the model (1). Stacking observations we have
$$ y = x\beta + u, \qquad x = Z\pi + v $$
where $y$, $x$, $u$ and $v$ are $n \times 1$ vectors and $Z$ is an $n \times k_z$ matrix. For $u$ and $v$, it is assumed that
$$ \mathrm{Var}\begin{pmatrix} u \\ v \end{pmatrix} = \Sigma \otimes I_n, \qquad \Sigma = \begin{pmatrix} \sigma_u^2 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{pmatrix}. $$
The concentration parameter, a measure of the strength of the instruments, is given by
$$ \mu^2 = \frac{\pi' Z' Z \pi}{\sigma_v^2}, $$

see Rothenberg (1984). When it is evaluated using OLS, its estimator divided by $k_z$ is the first-stage F-statistic for testing the null $H_0: \pi = 0$, which is given by
$$ F = \frac{1}{k_z}\frac{\hat\pi' Z' Z \hat\pi}{\hat\sigma_v^2} = \frac{\hat\mu^2}{k_z} $$
where $\hat\pi = (Z'Z)^{-1}Z'x$ and $\hat\sigma_v^2 = (x - Z\hat\pi)'(x - Z\hat\pi)/n$.

Stock and Yogo (2005) define weak instruments in two ways: one is that a group of instruments is weak when the bias of an IV estimator such as the 2SLS estimator, relative to that of the OLS estimator, exceeds some value, say 10%. The other is that a set of instruments is weak if the Wald test size at an $\alpha\%$ nominal level using IV estimators is larger than a certain threshold, say 10%.

The weak instrument asymptotics, local to zero, is introduced by Staiger and Stock (1997) and ensures that the concentration parameter does not increase as the sample size increases. It is given by
$$ \pi = c/\sqrt{n} $$
where $c$ is a $k_z$-dimensional vector of constants. Under the weak instrument asymptotics, the concentration parameter satisfies
$$ \mu^2 = \frac{\pi' Z' Z \pi}{\sigma_v^2} \to_p \frac{c' Q_{ZZ} c}{\sigma_v^2} $$
where $Q_{ZZ} = \mathrm{plim}(Z'Z/n) = E(z_i z_i')$.

To derive the limiting distribution of the bias of the 2SLS estimator relative to that of the OLS estimator under weak instrument asymptotics, it is assumed that conditions are satisfied such that
$$ \frac{1}{\sqrt{n}}\begin{pmatrix} Z'u \\ Z'v \end{pmatrix} \to_d \begin{pmatrix} \psi_{Zu} \\ \psi_{Zv} \end{pmatrix} \sim N(0, \Sigma \otimes Q_{ZZ}). $$
Under weak instrument asymptotics, Staiger and Stock (1997) show that
$$ \hat\beta_{2SLS} - \beta = \frac{x' P_Z u}{x' P_Z x} \to_d \frac{\sigma_u}{\sigma_v}\,\frac{(\lambda + z_v)' z_u}{(\lambda + z_v)'(\lambda + z_v)}, \qquad \hat\beta_{OLS} - \beta = \frac{x' u}{x' x} \to_p \frac{\sigma_{uv}}{\sigma_v^2} $$
where
$$ P_Z = Z(Z'Z)^{-1}Z'; \quad \lambda = \sigma_v^{-1} Q_{ZZ}^{1/2} c; \quad z_u = \sigma_u^{-1} Q_{ZZ}^{-1/2}\psi_{Zu}; \quad z_v = \sigma_v^{-1} Q_{ZZ}^{-1/2}\psi_{Zv}. $$
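As a simple illustration of these quantities, the following sketch (not the paper's code; the data-generating values are illustrative) computes the non-robust first-stage F-statistic for data generated with a local-to-zero first stage $\pi = c/\sqrt{n}$.

```python
# Minimal sketch (illustrative): the non-robust first-stage F-statistic
# F = pihat' Z'Z pihat / (kz * sigmahat_v^2) under a weak first stage.
import numpy as np

def first_stage_F(x, Z):
    n, kz = Z.shape
    pihat = np.linalg.solve(Z.T @ Z, Z.T @ x)   # first-stage OLS coefficients
    vhat = x - Z @ pihat                        # first-stage residuals
    sigma2_v = vhat @ vhat / n                  # error variance estimate
    return float(pihat @ (Z.T @ Z) @ pihat) / (kz * sigma2_v)

rng = np.random.default_rng(0)
n, kz, rho = 100_000, 4, 0.5
Z = rng.standard_normal((n, kz))                # instruments, Q_ZZ = I
c = np.full(kz, 3.0)                            # local-to-zero constants (illustrative)
u = rng.standard_normal(n)
v = rho * u + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
x = Z @ (c / np.sqrt(n)) + v                    # weak first stage pi = c / sqrt(n)
print(first_stage_F(x, Z))                      # roughly (c'c + kz) / kz on average
```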

Based on the derivations above, the squared relative bias is given by
$$ B_n^2 = \left(\frac{E(\hat\beta_{2SLS} - \beta)}{E(\hat\beta_{OLS} - \beta)}\right)^2 = \left( E\left[\frac{(\lambda + z_v)' z_v}{(\lambda + z_v)'(\lambda + z_v)}\right] \right)^2 $$
as $E(z_u \mid z_v) = \frac{\sigma_{uv}}{\sigma_u\sigma_v} z_v$. In absolute terms, we have
$$ B_n = \frac{\big|E(\hat\beta_{2SLS} - \beta)\big|}{\big|E(\hat\beta_{OLS} - \beta)\big|} = \left| E\left[\frac{(\lambda + z_v)' z_v}{(\lambda + z_v)'(\lambda + z_v)}\right] \right| $$
which is constant for all values of the correlation between the errors. As $z_v \sim N(0, I_{k_z})$, it depends on $\lambda$ and $k_z$ only.

Under the weak instrument asymptotics, the noncentrality of the F-statistic can be expressed as a function of $\lambda$:
$$ \frac{\mu^2}{k_z} = \frac{1}{k_z}\frac{\pi' Z' Z \pi}{\sigma_v^2} \to_p \frac{1}{k_z}\frac{c' Q_{ZZ} c}{\sigma_v^2} = \frac{\lambda'\lambda}{k_z}. $$
The magnitude of the relative bias can therefore be assessed by the first-stage F-statistic under the weak instrument asymptotics (Stock and Yogo, 2005): a small F-statistic indicates a small concentration parameter and hence a large relative bias of the 2SLS estimator, meaning that the instruments considered are weak. Staiger and Stock (1997) show that the limiting distribution of the first-stage F-statistic under the weak instrument asymptotics is a noncentral chi-squared distribution,
$$ F \to_d \chi^2_{k_z}(\lambda'\lambda)/k_z, $$
where $\chi^2_a(b)$ denotes a noncentral chi-squared distribution with $a$ degrees of freedom and noncentrality parameter $b$. Thus, the F-statistic can be used to test the hypothesis
$$ H_0: \mu^2/k_z \leq \Lambda_b \quad \text{vs} \quad H_1: \mu^2/k_z > \Lambda_b $$
where $\Lambda_b$ is the value of $\mu^2/k_z$ at which $B_n$ equals some value, say 10%; the critical values are obtained from the fact that the limiting distribution of the F-statistic is a noncentral chi-squared distribution. When the null is rejected, it can be said that the instruments in the model are not weak.
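To make the mechanics concrete, the sketch below (an illustration, not the paper's procedure; in practice $\Lambda_b$ must first be solved from the condition $B_n = 10\%$, and the value used here is a placeholder) backs out a critical value from the noncentral chi-squared limit with scipy.

```python
# Sketch: the level-alpha critical value for H0: mu^2/kz <= Lambda against
# H1: mu^2/kz > Lambda, using F ->d chi2_{kz}(kz * Lambda) / kz at the boundary.
from scipy.stats import ncx2

def weak_iv_critical_value(kz, Lambda, alpha=0.05):
    # 1 - alpha quantile of the noncentral chi-squared, rescaled by kz
    return ncx2.ppf(1 - alpha, df=kz, nc=kz * Lambda) / kz

# Lambda = 5 is purely illustrative; reject "weak instruments" if F exceeds this value
print(weak_iv_critical_value(kz=4, Lambda=5.0))
```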

The Wald test statistic for testing the null $H_0: \beta = \beta_0$ is given by
$$ W = \frac{(\hat\beta_{2SLS} - \beta_0)^2\,(x' P_Z x)}{\hat\sigma_u^2} $$
where $\hat\sigma_u^2 = (y - x\hat\beta_{2SLS})'(y - x\hat\beta_{2SLS})/n$. Staiger and Stock (1997) show that
$$ W \to_d \frac{\nu_2^2/\nu_1}{1 - 2\frac{\sigma_{uv}}{\sigma_u\sigma_v}(\nu_2/\nu_1) + (\nu_2/\nu_1)^2} $$
where $\nu_1 = (\lambda + z_v)'(\lambda + z_v)$ and $\nu_2 = (\lambda + z_v)' z_u$. The Wald size is maximized when the correlation between the errors is equal to 1. Stock and Yogo (2005) calculate critical values such that the maximal Wald size is equal to a certain value, say 10%, which are used to test the hypothesis
$$ H_0: \mu^2/k_z \leq \Lambda_w \quad \text{vs} \quad H_1: \mu^2/k_z > \Lambda_w $$
where $\Lambda_w$ is the value of $\mu^2/k_z$ at which the Wald size equals some value, say 10%.

3 Weak Instrument Measure under Heteroskedastic Errors

In this section we allow for heteroskedastic errors in the model (1). Following the definition of weak instruments from Stock and Yogo (2005), we explore the effect of the introduction of heteroskedastic errors on the relative bias of the 2SLS estimator and the Wald test size distortions. Heteroskedastic errors are specified in three ways: for continuous instruments, the conditional variance of the first-stage errors is a linear combination of squared instruments or an exponential function of a linear combination of instruments. In addition, for a categorical instrument as in Andrews (2014), the conditional covariance structure of the errors is a linear combination of the dummy variables that represent the categorical instrument. The relative bias under these specific error types varies with the covariance structure of the errors, while the bias under homoskedastic errors in Stock and Yogo (2005) is not a function of the correlation between them.

We consider heteroskedastic errors in model (1). Suppose that $(y_i, x_i)$ is the $i$-th observation of an i.i.d. sample of size $n$. For $u_i$ and $v_i$, we assume that
$$ \begin{pmatrix} u_i \\ v_i \end{pmatrix} \Big|\, z_i \sim \big(0, \Omega\big), \qquad \Omega = \begin{pmatrix} \sigma_u^2(z_i) & \sigma_{uv}(z_i) \\ \sigma_{uv}(z_i) & \sigma_v^2(z_i) \end{pmatrix} $$

in the sense that heteroskedasticity of $u_i$ and $v_i$, conditional on $z_i$, is allowed. To derive the limiting bias of the 2SLS estimator relative to that of the OLS estimator, it is assumed that conditions are satisfied such that
$$ \frac{1}{\sqrt{n}}\begin{pmatrix} Z'u \\ Z'v \end{pmatrix} \to_d \begin{pmatrix} \psi_{Zu} \\ \psi_{Zv} \end{pmatrix} \sim N(0, \Omega_Z), \qquad \Omega_Z = \begin{pmatrix} \Omega_{z,u} & \Omega_{z,uv} \\ \Omega_{z,uv} & \Omega_{z,v} \end{pmatrix} = \begin{pmatrix} E(u_i^2 z_i z_i') & E(u_i v_i z_i z_i') \\ E(u_i v_i z_i z_i') & E(v_i^2 z_i z_i') \end{pmatrix} \qquad (2) $$
where $z_i$ is the $k_z$-dimensional vector of instruments for observation $i$.

3.1 Robust First-Stage F-statistic

Under heteroskedastic errors, the heteroskedasticity robust first-stage F-statistic, which tests the hypothesis $H_0: \pi = 0$, is given by
$$ F_r = \hat\pi' \widehat{\mathrm{var}}(\hat\pi)^{-1}\hat\pi/k_z = x'Z\Big(\sum_{i=1}^n \hat v_i^2 z_i z_i'\Big)^{-1} Z'x/k_z = (Z\pi + v)'Z\Big(\sum_{i=1}^n \hat v_i^2 z_i z_i'\Big)^{-1} Z'(Z\pi + v)/k_z $$
$$ = \big(n^{-1/2}Z'Z\pi + n^{-1/2}Z'v\big)'\Big(n^{-1}\sum_{i=1}^n \hat v_i^2 z_i z_i'\Big)^{-1}\big(n^{-1/2}Z'Z\pi + n^{-1/2}Z'v\big)/k_z. $$
Under weak instrument asymptotics, it follows that
$$ F_r = \big(n^{-1}Z'Zc + n^{-1/2}Z'v\big)'\Big(n^{-1}\sum_{i=1}^n \hat v_i^2 z_i z_i'\Big)^{-1}\big(n^{-1}Z'Zc + n^{-1/2}Z'v\big)/k_z \to_d (Q_{ZZ}c + \psi_{Zv})'\Omega_{z,v}^{-1}(Q_{ZZ}c + \psi_{Zv})/k_z = (\lambda + z_v)'(\lambda + z_v)/k_z \qquad (3) $$
where $\Sigma_{z,v} = Q_{ZZ}^{-1/2}\Omega_{z,v}Q_{ZZ}^{-1/2}$, $\lambda = \Sigma_{z,v}^{-1/2}Q_{ZZ}^{1/2}c$ and $z_v = \Sigma_{z,v}^{-1/2}Q_{ZZ}^{-1/2}\psi_{Zv}$. See the Appendix for the details of the derivations. As $z_v$ is normally distributed with mean 0 and covariance $I_{k_z}$, $(\lambda + z_v)'(\lambda + z_v)$ follows a noncentral chi-squared distribution with noncentrality parameter $\lambda'\lambda$. The limiting distribution of the robust F-statistic is thus given by
$$ F_r \to_d \chi^2_{k_z}(\lambda'\lambda)/k_z. $$
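A minimal sketch of the robust F-statistic as defined above (illustrative; the variable names are not from the paper) is:

```python
# Sketch: heteroskedasticity-robust first-stage F-statistic
# F_r = pihat' Varhat(pihat)^{-1} pihat / kz, with the White-type variance
# Varhat(pihat) = (Z'Z)^{-1} (sum_i vhat_i^2 z_i z_i') (Z'Z)^{-1};
# equivalently F_r = x'Z (sum_i vhat_i^2 z_i z_i')^{-1} Z'x / kz.
import numpy as np

def robust_first_stage_F(x, Z):
    n, kz = Z.shape
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    pihat = ZtZ_inv @ (Z.T @ x)
    vhat = x - Z @ pihat
    meat = (Z * vhat[:, None] ** 2).T @ Z        # sum_i vhat_i^2 z_i z_i'
    var_pihat = ZtZ_inv @ meat @ ZtZ_inv         # robust variance of pihat
    return float(pihat @ np.linalg.solve(var_pihat, pihat)) / kz
```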

3.2 Relative Bias

The 2SLS estimator of $\beta$ is given by
$$ \hat\beta_{2SLS} = \frac{x' P_Z y}{x' P_Z x} = \beta + \frac{x' P_Z u}{x' P_Z x} $$
where $P_Z = Z(Z'Z)^{-1}Z'$. Under heteroskedastic errors and using weak instrument asymptotics, the limiting distribution of the deviation of the 2SLS estimator from $\beta$ can be written as
$$ \hat\beta_{2SLS} - \beta = \frac{x' P_Z u}{x' P_Z x} = \frac{(Z\pi + v)'Z(Z'Z)^{-1}Z'u}{(Z\pi + v)'Z(Z'Z)^{-1}Z'(Z\pi + v)} = \frac{(n^{-1}Z'Zc + n^{-1/2}Z'v)'(n^{-1}Z'Z)^{-1}(n^{-1/2}Z'u)}{(n^{-1}Z'Zc + n^{-1/2}Z'v)'(n^{-1}Z'Z)^{-1}(n^{-1}Z'Zc + n^{-1/2}Z'v)} \to_d \frac{(\lambda + z_v)'\Sigma_{z,v}^{1/2}\Sigma_{z,u}^{1/2} z_u}{(\lambda + z_v)'\Sigma_{z,v}(\lambda + z_v)} $$
where
$$ \Sigma_{z,v} = Q_{ZZ}^{-1/2}\Omega_{z,v}Q_{ZZ}^{-1/2}; \quad \Sigma_{z,u} = Q_{ZZ}^{-1/2}\Omega_{z,u}Q_{ZZ}^{-1/2}; \quad \lambda = \Sigma_{z,v}^{-1/2}Q_{ZZ}^{1/2}c; \quad z_v = \Sigma_{z,v}^{-1/2}Q_{ZZ}^{-1/2}\psi_{Zv}; \quad z_u = \Sigma_{z,u}^{-1/2}Q_{ZZ}^{-1/2}\psi_{Zu}. $$
For the details of the derivations, see the Appendix. If $u$ is homoskedastic, $\Sigma_{z,u}$ simplifies to $\sigma_u^2 I_{k_z}$. Similar to those in Stock and Yogo (2005), $z_u$ and $z_v$ are standard multivariate normal vectors.

The deviation of the OLS estimator from its parameter under weak instrument asymptotics (Staiger and Stock, 1997) is given by
$$ \hat\beta_{OLS} - \beta = \frac{x'u}{x'x} \to_p \frac{\sigma_{uv}}{\sigma_v^2} $$
where $\sigma_{uv} = E(u_i v_i)$ and $\sigma_v^2 = E(v_i^2)$.

Thus, the bias of the 2SLS estimator, relative to that of the OLS estimator, under heteroskedasticity of $u$ and $v$ is given by
$$ B_n = \frac{\big|E(\hat\beta_{2SLS} - \beta)\big|}{\big|E(\hat\beta_{OLS} - \beta)\big|} = \left| E\left[\frac{\sigma_v^2 (\lambda + z_v)'\Sigma_{z,v}^{1/2}\Sigma_{z,u}^{1/2} z_u}{\sigma_{uv}(\lambda + z_v)'\Sigma_{z,v}(\lambda + z_v)}\right] \right| \qquad (4) $$

3.3 Wald Size

Under weak instrument asymptotics, the limiting distribution of $x' P_Z u$ is given by
$$ x' P_Z u \to_d (\lambda + z_v)'\Sigma_{z,v}^{1/2}\Sigma_{z,u}^{1/2} z_u $$
as derived in Section 3.2, with the same notation as above. Regarding $\widehat{\mathrm{var}}(\hat\beta_{2SLS})$, it is given by
$$ \widehat{\mathrm{var}}(\hat\beta_{2SLS}) = \frac{x'Z(Z'Z)^{-1}\big(\sum_{i=1}^n \hat u_i^2 z_i z_i'\big)(Z'Z)^{-1}Z'x}{(x' P_Z x)^2}. $$
Looking first at $n^{-1/2}Z'x$, we have under weak instrument asymptotics
$$ n^{-1/2}Z'x = n^{-1/2}Z'(Z\pi + v) = n^{-1}Z'Zc + n^{-1/2}Z'v \to_d Q_{ZZ}c + \psi_{Zv} $$
where $Q_{ZZ} = \mathrm{plim}(n^{-1}Z'Z) = E(z_i z_i')$. Thus
$$ x'Z(Z'Z)^{-1}\Big(\sum_{i=1}^n \hat u_i^2 z_i z_i'\Big)(Z'Z)^{-1}Z'x = (n^{-1/2}x'Z)(n^{-1}Z'Z)^{-1}\Big(n^{-1}\sum_{i=1}^n \hat u_i^2 z_i z_i'\Big)(n^{-1}Z'Z)^{-1}(n^{-1/2}Z'x) $$
$$ \to_d (Q_{ZZ}c + \psi_{Zv})'Q_{ZZ}^{-1}\Omega_{z,u}Q_{ZZ}^{-1}(Q_{ZZ}c + \psi_{Zv}) = (\lambda + z_v)'\Sigma_{z,v}^{1/2}\Sigma_{z,u}\Sigma_{z,v}^{1/2}(\lambda + z_v). $$
See the Appendix for the detailed derivations. Let $v_1^H = \Sigma_{z,v}^{1/2}(\lambda + z_v)$ and $v_2^H = (\lambda + z_v)'\Sigma_{z,v}^{1/2}\Sigma_{z,u}^{1/2} z_u$. Based on the derivations above, the Wald test statistic for testing $H_0: \beta = \beta_0$ is given by
$$ W = \frac{(\hat\beta_{2SLS} - \beta_0)^2}{\widehat{\mathrm{var}}(\hat\beta_{2SLS})} = \frac{(x' P_Z u/x' P_Z x)^2}{x'Z(Z'Z)^{-1}\big(\sum_i \hat u_i^2 z_i z_i'\big)(Z'Z)^{-1}Z'x/(x' P_Z x)^2} = \frac{(x' P_Z u)^2}{x'Z(Z'Z)^{-1}\big(\sum_i \hat u_i^2 z_i z_i'\big)(Z'Z)^{-1}Z'x} \to_d \frac{(v_2^H)^2}{(v_1^H)'\Sigma_{z,u} v_1^H} \qquad (5) $$
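For reference, a sketch (illustrative, not the paper's code) of the 2SLS estimator and the heteroskedasticity-robust Wald statistic corresponding to equation (5), for a single endogenous regressor:

```python
# Sketch: 2SLS estimate and robust Wald statistic W = (betahat - beta0)^2 / varhat,
# with varhat = x'Z(Z'Z)^{-1}(sum_i uhat_i^2 z_i z_i')(Z'Z)^{-1}Z'x / (x'P_Z x)^2.
import numpy as np

def tsls_robust_wald(y, x, Z, beta0=0.0):
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    Px_x = Z @ (ZtZ_inv @ (Z.T @ x))             # P_Z x
    xPx = float(x @ Px_x)                        # x' P_Z x
    beta_2sls = float(Px_x @ y) / xPx
    uhat = y - x * beta_2sls                     # structural residuals
    Zx = Z.T @ x
    meat = (Z * uhat[:, None] ** 2).T @ Z        # sum_i uhat_i^2 z_i z_i'
    var_beta = float(Zx @ ZtZ_inv @ meat @ ZtZ_inv @ Zx) / xPx ** 2
    return beta_2sls, (beta_2sls - beta0) ** 2 / var_beta
```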

For the case in which heteroskedasticity is imposed only on $v_i$, the limiting distributions of $x' P_Z u$ and $x' P_Z x$ under weak instrument asymptotics are given by
$$ x' P_Z u \to_d \sigma_u (\lambda + z_v)'\Sigma_{z,v}^{1/2} z_u, \qquad x' P_Z x \to_d (\lambda + z_v)'\Sigma_{z,v}(\lambda + z_v), $$
using the same notation as above. Let $v_1^h = (\lambda + z_v)'\Sigma_{z,v}(\lambda + z_v)$ and $v_2^h = (\lambda + z_v)'\Sigma_{z,v}^{1/2} z_u$. Thus $x' P_Z u \to_d \sigma_u v_2^h$, $x' P_Z x \to_d v_1^h$ and $\hat\beta_{2SLS} - \beta \to_d \sigma_u v_2^h/v_1^h$. Similarly, for the estimator of the variance of $u_i$, it can be shown that
$$ \hat\sigma_u^2 = (y - x\hat\beta_{2SLS})'(y - x\hat\beta_{2SLS})/n = \big(u - x(\hat\beta_{2SLS} - \beta)\big)'\big(u - x(\hat\beta_{2SLS} - \beta)\big)/n $$
$$ = n^{-1}u'u - 2(\hat\beta_{2SLS} - \beta)\big(n^{-1}\pi'Z'u + n^{-1}v'u\big) + (\hat\beta_{2SLS} - \beta)^2\big(n^{-1}\pi'Z'Z\pi + 2n^{-1}\pi'Z'v + n^{-1}v'v\big). $$
Under weak instrument asymptotics, it follows, using the definitions above, that
$$ \hat\sigma_u^2 = n^{-1}u'u - 2(\hat\beta_{2SLS} - \beta)\big(n^{-3/2}c'Z'u + n^{-1}v'u\big) + (\hat\beta_{2SLS} - \beta)^2\big(n^{-2}c'Z'Zc + 2n^{-3/2}c'Z'v + n^{-1}v'v\big) \to_d \sigma_u^2\left(1 - 2\frac{\sigma_{uv}}{\sigma_u}\frac{v_2^h}{v_1^h} + \sigma_v^2\Big(\frac{v_2^h}{v_1^h}\Big)^2\right). $$
Combining the derivations above for $x' P_Z u$, $x' P_Z x$ and $\hat\sigma_u^2$, the Wald test statistic for testing $H_0: \beta = \beta_0$ is given by
$$ W = \frac{(\hat\beta_{2SLS} - \beta_0)^2}{\widehat{\mathrm{var}}(\hat\beta_{2SLS})} = \frac{(x' P_Z u)^2/x' P_Z x}{\hat\sigma_u^2} \to_d \frac{(v_2^h)^2/v_1^h}{1 - 2(\sigma_{uv}/\sigma_u)(v_2^h/v_1^h) + \sigma_v^2(v_2^h/v_1^h)^2} \qquad (6) $$

4 Specific Heteroskedasticity Cases

We specify heteroskedastic errors in three cases. For the first two cases, we assume that the instruments are continuous and normally distributed,
$$ z_i \sim N(0, I_{k_z}). $$
Even though this is a strong assumption, it facilitates capturing the effect of the introduction of heteroskedasticity on the errors through the relatively simple moments of $z_i$ and access to

known distributions derived from a normal distribution. In addition, the first-stage errors $v_i$ are generated such that
$$ v_i = \rho u_i + f(z_i)\epsilon_i $$
where $u_i$ and $\epsilon_i$ are independently drawn from the standard normal distribution, $\rho$ is the covariance of $u_i$ and $v_i$, $z_i$ is the $k_z$-vector of instruments and $f(z_i)$ is a function of $z_i$. This generates heteroskedasticity only in $v_i$. For the first case, we specify $f(z_i)$ as
$$ f(z_i) = \Big(\sum_{j=1}^{k_z}\gamma_j z_{ij}^2\Big)^{1/2} $$
where $\gamma_j$, $j = 1,\dots,k_z$, is a non-negative constant, which ensures that the variance of $v_i$ is positive. In this setting, the conditional variance of $v_i$ is a linear combination of squared instruments. For the second case, we set $f(z_i)$ as
$$ f(z_i) = \exp(z_i'\gamma/2) $$
where $\gamma$ is a $k_z$-dimensional vector of constants whose entries are non-negative. With this structure, the conditional variance of $v_i$ is an exponential function of a linear combination of instruments.

As the last case, we consider a discrete, categorical instrument, as in Andrews (2014). The categorical instrument is represented by a group of mutually exclusive dummy variables such that $z_{ij} = 0$ or $1$ for all $j$, $\sum_j z_{ij} = 1$, and $z_{ij} = 1$ is realized with the same probability for all $j$, given by $p(z_{ij} = 1) = p = 1/k_z$. In addition, the structural and first-stage equation errors are generated such that, when the structural parameter of interest $\beta = 0$,
$$ \begin{pmatrix} u_i \\ v_i \end{pmatrix} = \Sigma_v(z_i)^{1/2}\begin{pmatrix} \epsilon_{1,i} \\ \epsilon_{2,i} \end{pmatrix} $$
and when $\beta \neq 0$,
$$ \begin{pmatrix} u_i \\ v_i \end{pmatrix} = \begin{pmatrix} 1 & -\beta \\ 0 & 1 \end{pmatrix}\Sigma_v(z_i)^{1/2}\begin{pmatrix} \epsilon_{1,i} \\ \epsilon_{2,i} \end{pmatrix} $$
where
$$ \Sigma_v(z_i) = \begin{pmatrix} z_i'\gamma & z_i'\alpha \\ z_i'\alpha & z_i'\delta \end{pmatrix}; \qquad \begin{pmatrix} \epsilon_{1,i} \\ \epsilon_{2,i} \end{pmatrix} \sim N(0, I_2) $$
with $I_2$ the identity matrix of order 2. This makes the conditional covariance structure of the errors a linear combination of this kind of instrument.
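The following sketch (illustrative) draws the errors for the first two cases; the coefficient values passed in would be placeholders, not the ones used in Section 4.4.

```python
# Sketch: generate (u_i, v_i) with heteroskedasticity only in v, where
# v_i = rho*u_i + f(z_i)*eps_i and f(z)^2 = sum_j gamma_j z_j^2 (case 1)
# or f(z) = exp(z'gamma / 2) (case 2).
import numpy as np

def draw_errors(Z, rho, gamma, case="squared", rng=None):
    rng = rng or np.random.default_rng()
    n = Z.shape[0]
    u = rng.standard_normal(n)
    eps = rng.standard_normal(n)
    if case == "squared":
        f = np.sqrt((Z ** 2) @ gamma)            # sqrt of linear comb. of squared z's
    else:
        f = np.exp(Z @ gamma / 2.0)              # exponential of linear combination
    v = rho * u + f * eps                        # Cov(u_i, v_i | z_i) = rho
    return u, v
```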

When heteroskedasticity is imposed only on $v_i$, the error structure simplifies to
$$ \mathrm{Var}\left[\begin{pmatrix} u_i \\ v_i \end{pmatrix}\Big|\, z_i\right] = \begin{pmatrix} 1 & \rho \\ \rho & E(v_i^2 \mid z_i) \end{pmatrix}; \qquad \mathrm{Var}\begin{pmatrix} u_i \\ v_i \end{pmatrix} = \begin{pmatrix} 1 & \rho \\ \rho & \sigma_v^2 \end{pmatrix} $$
where $\sigma_u^2 = 1$ for simplicity and $E(u_i v_i) = \rho$. In addition, this heteroskedasticity leads $\Omega_Z$ to be
$$ \Omega_Z = \begin{pmatrix} \Omega_{z,u} & \Omega_{z,uv} \\ \Omega_{z,uv} & \Omega_{z,v} \end{pmatrix} = \begin{pmatrix} I_{k_z} & \rho I_{k_z} \\ \rho I_{k_z} & \Omega_{z,v} \end{pmatrix} $$
using the normality assumption on $z_i$, where $Q_{ZZ} = E(z_i z_i') = I_{k_z}$. Referring to the previous definitions, the relative bias under this kind of heteroskedasticity can be rewritten as
$$ B_n = \left| E\left[\frac{\sigma_v^2(\lambda + z_v)'\Sigma_{z,v}^{1/2}\Sigma_{z,u}^{1/2} z_u}{\sigma_{uv}(\lambda + z_v)'\Sigma_{z,v}(\lambda + z_v)}\right] \right| = \left| E\left[\frac{\sigma_v^2(\lambda + z_v)'\Omega_{z,v}^{1/2} z_u}{\rho(\lambda + z_v)'\Omega_{z,v}(\lambda + z_v)}\right] \right|. $$
As $E(z_u \mid z_v) = \rho\,\Omega_{z,v}^{-1/2} z_v$, we have
$$ B_n = \left| E\left[\frac{\sigma_v^2(\lambda + z_v)' z_v}{(\lambda + z_v)'\Omega_{z,v}(\lambda + z_v)}\right] \right| \qquad (7) $$

4.1 Variance as Function of Squared Instruments

As mentioned before, we generate the errors in the model (1) such that
$$ v_i = \rho u_i + \Big(\sum_{j=1}^{k_z}\gamma_j z_{ij}^2\Big)^{1/2}\epsilon_i. $$
Due to the normality assumption on $z_i$, we have the unconditional variance of $v_i$ such that
$$ \sigma_v^2 = E(v_i^2) = E_Z\big[E(v_i^2 \mid z_i)\big] = E_Z\Big[E\Big(\big(\rho u_i + \big(\textstyle\sum_j\gamma_j z_{ij}^2\big)^{1/2}\epsilon_i\big)^2 \,\Big|\, z_i\Big)\Big] = E_Z\Big[\rho^2 + \sum_j\gamma_j z_{ij}^2\Big] = \rho^2 + \sum_j\gamma_j, $$
which is a function of the covariance between the structural and first-stage equation errors and of the coefficients generating heteroskedasticity.

Let $\Omega^q_{z,v}$ denote $\Omega_{z,v}$, referring to the previous definition, for this specific type of heteroskedasticity. Using the normality assumption on $z_i$ and the unconditional variance of $v_i$, $\Omega^q_{z,v}$ simplifies to
$$ \Omega^q_{z,v} = E\Big[\big(\rho^2 + \textstyle\sum_j\gamma_j z_{ij}^2\big) z_i z_i'\Big] = \rho^2 I_{k_z} + E_Z\Big[\big(\textstyle\sum_j\gamma_j z_{ij}^2\big) z_i z_i'\Big] = \big(\rho^2 + \textstyle\sum_j\gamma_j\big) I_{k_z} + 2L = \sigma_v^2 I_{k_z} + 2L \qquad (8) $$
where $L$ is a diagonal matrix with $j$-th diagonal element $\gamma_j$. See the Appendix for the details of the derivation.

Based on the derivations above, the relative bias is given by
$$ B_n = \left| E\left[\frac{\sigma_v^2(\lambda + z_v)' z_v}{(\lambda + z_v)'\Omega^q_{z,v}(\lambda + z_v)}\right] \right| \qquad (9) $$
where
$$ \sigma_v^2 = \rho^2 + \sum_j\gamma_j; \qquad \Omega^q_{z,v} = \sigma_v^2 I_{k_z} + 2L; \qquad \lambda = (\Omega^q_{z,v})^{-1/2}c; \qquad z_v = (\Omega^q_{z,v})^{-1/2}\psi_{Zv}. $$
The relative bias of Stock and Yogo (2005), which is derived under homoskedastic errors, is determined regardless of $\rho$, while $\rho$ plays a role in the relative bias under this type of heteroskedasticity through the fact that the unconditional variance of $v_i$ is a function of $\rho$ and of the coefficients $\gamma$. In addition, $B_n$ can be rewritten as
$$ B_n = \left| E\left[\frac{(\lambda + z_v)' z_v}{(\lambda + z_v)'(\Omega^q_{z,v}/\sigma_v^2)(\lambda + z_v)}\right] \right|. $$
Given $\lambda$ and $z_v$, if $\Omega^q_{z,v}/\sigma_v^2 - I_{k_z}$ is positive definite, the relative bias under this type of heteroskedasticity is smaller than that under homoskedastic errors. $\Omega^q_{z,v}/\sigma_v^2 - I_{k_z}$ is given by
$$ \frac{\Omega^q_{z,v}}{\sigma_v^2} - I_{k_z} = \frac{\sigma_v^2 I_{k_z} + 2L}{\sigma_v^2} - I_{k_z} = \frac{2}{\sigma_v^2}L. $$
All the eigenvalues of $L$ are non-negative because $L$ is a diagonal matrix with non-negative entries $\gamma_j$, and $\sigma_v^2$ is positive. $\Omega^q_{z,v}/\sigma_v^2 - I_{k_z}$ is thus positive definite unless all $\gamma_j$ for

$j = 1,\dots,k_z$ are zero, so the relative bias under this heteroskedasticity is smaller than that from Stock and Yogo (2005). To obtain the same level of the relative bias under this kind of heteroskedasticity as that from Stock and Yogo, weaker instruments are needed, which leads to smaller values of the heteroskedasticity robust first-stage F-statistic and smaller critical values for the weak instrument test than those of Stock and Yogo.

With regard to the Wald size, the Wald test statistic is given by
$$ W \to_d \frac{(v_2^h)^2/v_1^h}{1 - 2(\sigma_{uv}/\sigma_u)(v_2^h/v_1^h) + \sigma_v^2(v_2^h/v_1^h)^2} = \frac{(v_2^h)^2/v_1^h}{1 - 2\rho(v_2^h/v_1^h) + \sigma_v^2(v_2^h/v_1^h)^2} \qquad (10) $$
where $v_1^h = (\lambda + z_v)'\Omega^q_{z,v}(\lambda + z_v)$ and $v_2^h = (\lambda + z_v)'(\Omega^q_{z,v})^{1/2} z_u$. The limiting Wald size under weak instrument asymptotics is a function of the correlation between the structural and first-stage errors and increases as the correlation increases, as in Stock and Yogo (2005), but it is not maximized at a correlation equal to 1 because the correlation cannot be equal to 1 in this setting.

4.2 Variance as Exponential Function of Instruments

As discussed before, the errors in the model (1) under this kind of heteroskedasticity are generated as
$$ v_i = \rho u_i + \exp\Big(\frac{1}{2}z_i'\gamma\Big)\epsilon_i. $$
The normality assumption on $z_i$ gives the unconditional variance of $v_i$ such that
$$ \sigma_v^2 = E(v_i^2) = E_Z\big[E(v_i^2 \mid z_i)\big] = E_Z\Big[E\Big(\big(\rho u_i + \exp\big(\tfrac{1}{2}z_i'\gamma\big)\epsilon_i\big)^2\,\Big|\, z_i\Big)\Big] = E_Z\big[\rho^2 + \exp(z_i'\gamma)\big] = \rho^2 + \exp\Big(\frac{1}{2}\gamma'\gamma\Big). $$
$\sigma_v^2$ is a function of $\rho$, as in the previous case.

Let $\Omega^{exp}_{z,v}$ denote $\Omega_{z,v}$ for this specific case of heteroskedasticity. Under the normality assumption

on $z_i$ and the unconditional variance of $v_i$, we have $\Omega^{exp}_{z,v}$ such that
$$ \Omega^{exp}_{z,v} = E\big[\big(\rho^2 + \exp(z_i'\gamma)\big) z_i z_i'\big] = \rho^2 I_{k_z} + E_Z\big[\exp(z_i'\gamma) z_i z_i'\big] = \rho^2 I_{k_z} + \exp\Big(\frac{1}{2}\gamma'\gamma\Big)(I_{k_z} + \gamma\gamma') = \Big(\rho^2 + \exp\Big(\frac{1}{2}\gamma'\gamma\Big)\Big) I_{k_z} + \exp\Big(\frac{1}{2}\gamma'\gamma\Big)\gamma\gamma' = \sigma_v^2 I_{k_z} + \exp\Big(\frac{1}{2}\gamma'\gamma\Big)\gamma\gamma'. \qquad (11) $$
See the Appendix for the detailed derivations.

Based on the derivations above, the relative bias is given by
$$ B_n = \left| E\left[\frac{\sigma_v^2(\lambda + z_v)' z_v}{(\lambda + z_v)'\Omega^{exp}_{z,v}(\lambda + z_v)}\right] \right| \qquad (12) $$
where
$$ \sigma_v^2 = \rho^2 + \exp\Big(\frac{1}{2}\gamma'\gamma\Big); \qquad \Omega^{exp}_{z,v} = \sigma_v^2 I_{k_z} + \exp\Big(\frac{1}{2}\gamma'\gamma\Big)\gamma\gamma'; \qquad \lambda = (\Omega^{exp}_{z,v})^{-1/2}c; \qquad z_v = (\Omega^{exp}_{z,v})^{-1/2}\psi_{Zv}. $$
Like the case of heteroskedasticity in the previous section, $\rho$ plays a role in the relative bias through the fact that the unconditional variance of $v_i$ is a function of $\rho$. Furthermore, $B_n$ can again be rewritten as
$$ B_n = \left| E\left[\frac{(\lambda + z_v)' z_v}{(\lambda + z_v)'(\Omega^{exp}_{z,v}/\sigma_v^2)(\lambda + z_v)}\right] \right|. $$
Given $\lambda$ and $z_v$, it holds for every non-zero vector $q$ of order $k_z$ that
$$ q'\Big(\frac{\Omega^{exp}_{z,v}}{\sigma_v^2} - I_{k_z}\Big)q = q'\Big(\frac{\sigma_v^2 I_{k_z} + \exp(\gamma'\gamma/2)\gamma\gamma'}{\sigma_v^2} - I_{k_z}\Big)q = \frac{\exp(\gamma'\gamma/2)}{\sigma_v^2}\,q'\gamma\gamma'q = \frac{\exp(\gamma'\gamma/2)}{\sigma_v^2}(\gamma'q)^2 \geq 0. $$
Therefore, $\Omega^{exp}_{z,v}/\sigma_v^2 - I_{k_z}$ is positive semi-definite, and non-zero unless all $\gamma_j$ are zero for $j = 1,\dots,k_z$, so the relative bias under this heteroskedasticity is smaller than that from Stock and Yogo regardless of the non-negative values of $\gamma$, which leads to smaller critical values for the weak instrument test than those of Stock and Yogo.
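Because $\lambda$ and $z_v$ enter equations (9) and (12) in the same way, the limiting relative bias can be evaluated by simulating $z_v \sim N(0, I_{k_z})$ for a given $\Omega_{z,v}$. The sketch below is illustrative; the choices of $c$, $\gamma$ and $\rho$ are placeholders and it is not the paper's code.

```python
# Sketch: Monte Carlo evaluation of B_n = |E[ sigma_v^2 (lam+z)'z /
# ((lam+z)' Omega (lam+z)) ]| with z ~ N(0, I) and lam = Omega^{-1/2} c,
# for Omega = Omega_q (equation (9)) or Omega = Omega_exp (equation (12)).
import numpy as np

def limiting_relative_bias(c, Omega, sigma2_v, reps=200_000, seed=1):
    rng = np.random.default_rng(seed)
    kz = len(c)
    w, V = np.linalg.eigh(Omega)
    Omega_isqrt = V @ np.diag(w ** -0.5) @ V.T   # symmetric inverse square root
    lam = Omega_isqrt @ c
    z = rng.standard_normal((reps, kz))
    num = sigma2_v * np.einsum("ij,ij->i", lam + z, z)
    den = np.einsum("ij,jk,ik->i", lam + z, Omega, lam + z)
    return abs(np.mean(num / den))

kz, rho, c = 4, 0.5, np.full(4, 3.0)
gamma = np.full(kz, 0.2)                         # squared-instruments case, eq. (9)
s2_q = rho ** 2 + gamma.sum()
Omega_q = s2_q * np.eye(kz) + 2 * np.diag(gamma)
print(limiting_relative_bias(c, Omega_q, s2_q))

g = np.full(kz, 0.3)                             # exponential case, eq. (12)
s2_e = rho ** 2 + np.exp(g @ g / 2)
Omega_e = s2_e * np.eye(kz) + np.exp(g @ g / 2) * np.outer(g, g)
print(limiting_relative_bias(c, Omega_e, s2_e))
```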

Regarding the Wald size, the Wald test statistic is given by
$$ W \to_d \frac{(v_2^h)^2/v_1^h}{1 - 2(\sigma_{uv}/\sigma_u)(v_2^h/v_1^h) + \sigma_v^2(v_2^h/v_1^h)^2} = \frac{(v_2^h)^2/v_1^h}{1 - 2\rho(v_2^h/v_1^h) + \sigma_v^2(v_2^h/v_1^h)^2} \qquad (13) $$
where $v_1^h = (\lambda + z_v)'\Omega^{exp}_{z,v}(\lambda + z_v)$ and $v_2^h = (\lambda + z_v)'(\Omega^{exp}_{z,v})^{1/2} z_u$. As in the previous case, the limiting Wald size under weak instrument asymptotics is a function of the correlation between the structural and first-stage errors and increases as the correlation increases, as in Stock and Yogo (2005), but it is not maximized at a correlation equal to 1 because the correlation cannot be equal to 1 in this setting.

4.3 Variance as Function of a Categorical Instrument

Andrews (2014) considers an IV regression model under heteroskedastic errors with a categorical instrument, represented by a collection of dummy variables. The model with a single endogenous variable is given, in reduced form, by
$$ y = Z\pi\beta + v_1, \qquad x = Z\pi + v_2 $$
where $\beta$ is the structural parameter, $x$, $y$, $v_1$ and $v_2$ are $n$-dimensional vectors, $Z$ is an $n \times k_z$ matrix of instruments and $\pi$ is the $k_z$-dimensional first-stage parameter vector. It is assumed that the categorical instrument is represented by a set of mutually exclusive dummy variables such that $z_{ij} = 0$ or $1$ for all $j$, $\sum_j z_{ij} = 1$, and $z_{ij} = 1$ is realized with the same probability for all $j$, given by $p(z_{ij} = 1) = p = 1/k_z$. As heteroskedasticity, the following conditional covariance structure of the errors given the instruments is considered:
$$ \begin{pmatrix} v_{1,i} \\ v_{2,i} \end{pmatrix}\Big|\, z_i \sim N(0, \Sigma_v(z_i)) $$
where the subscript $i$ indicates an individual observation.

In this setting, simulations of the coverage of confidence sets that use the first-stage F-statistic under heteroskedastic errors as an indicator of the strength of the instruments are performed with

$\beta = 0$ using different cut-offs, such as the critical values of Stock and Yogo (2005) and the value 10, the rule of thumb for the strength of instruments proposed by Staiger and Stock (1997). The simulations show that under heteroskedastic errors following the covariance structure above, the coverage distortions are far larger than the nominal level of 5%, which comes from the F-statistic taking larger values than in the Stock and Yogo (2005) setting. This means that, because the relative bias under these heteroskedastic errors is larger than that from Stock and Yogo for a given strength of instruments, stronger instruments are needed to attain the same level of the relative bias under this kind of heteroskedastic errors.

Following the setting of Andrews (2014) with a categorical instrument, we also include constant terms in both equations and convert all the variables in the model into deviations from their means. For simplicity, we set the constant terms in both equations equal to 0, which leads $\bar y$ and $\bar x$ to be zero, so $y$ and $x$ are the same as before the conversion. The model considered is given by
$$ y = x\beta + u, \qquad x = Z^*\pi^* + v $$
where $y$, $x$, $u$ and $v$ are $n$-dimensional vectors, $Z^*$ is an $n \times (k_z - 1)$ matrix of instruments and $\pi^*$ is a $(k_z - 1) \times 1$ first-stage parameter vector. We drop the last dummy variable as the reference group, and the entries of $Z^*$ are
$$ z^*_{ij} = z_{ij} - E(z_{ij}) = z_{ij} - p(z_{ij} = 1) = z_{ij} - p $$
for $i = 1,\dots,n$ and $j = 1,\dots,k_z - 1$. As the last dummy variable is dropped as the reference group, the number of instruments taken into account is reduced from $k_z$ to $k_z - 1$.

We specify the conditional covariance structure of the errors in reduced form, that is, the conditional variances and the covariance between the errors, as a linear combination of instruments:
$$ \Sigma_v(z_i) = \begin{pmatrix} \sigma_1^2(z_i) & \sigma_{12}(z_i) \\ \sigma_{12}(z_i) & \sigma_2^2(z_i) \end{pmatrix} = \begin{pmatrix} z_i'\gamma & z_i'\alpha \\ z_i'\alpha & z_i'\delta \end{pmatrix} $$
where $z_i$ is the $k_z$-dimensional vector of instruments for the $i$-th observation and the entries of $\gamma$, $\delta$ and $\alpha$ are uniformly distributed over the intervals $(0,1)$, $(0,1)$ and $(-2,2)$, respectively, except for the first entry in each vector, subject to $|\sigma_{12}(z_i)| < \sigma_1(z_i)\sigma_2(z_i)$, which makes sure the covariance matrix is positive definite. To ensure that the variance of $v_i$ is positive, we use the original $z_i$, consisting of a constant and the dummy variables with the last dummy dropped as the reference group.

When $\beta = 0$, as in Andrews (2014), the structural equation errors $u_i$ and the reduced-form errors $v_i$ are equal to $v_{1,i}$ and $v_{2,i}$, respectively, whereas when $\beta \neq 0$, $u_i$ and $v_i$ are related to $v_{1,i}$ and $v_{2,i}$ by
$$ \begin{pmatrix} v_{1,i} \\ v_{2,i} \end{pmatrix} = \begin{pmatrix} 1 & \beta \\ 0 & 1 \end{pmatrix}\begin{pmatrix} u_i \\ v_i \end{pmatrix}, \qquad \text{and hence} \qquad \begin{pmatrix} u_i \\ v_i \end{pmatrix} = \begin{pmatrix} 1 & -\beta \\ 0 & 1 \end{pmatrix}\begin{pmatrix} v_{1,i} \\ v_{2,i} \end{pmatrix}. $$
Suppose that the errors in the model are generated satisfying the conditional covariance structure of $v_{1,i}$ and $v_{2,i}$ above; that is, when $\beta = 0$,
$$ \begin{pmatrix} u_i \\ v_i \end{pmatrix} = \Sigma_v(z_i)^{1/2}\begin{pmatrix} \epsilon_{1,i} \\ \epsilon_{2,i} \end{pmatrix} $$
and when $\beta \neq 0$,
$$ \begin{pmatrix} u_i \\ v_i \end{pmatrix} = \begin{pmatrix} 1 & -\beta \\ 0 & 1 \end{pmatrix}\Sigma_v(z_i)^{1/2}\begin{pmatrix} \epsilon_{1,i} \\ \epsilon_{2,i} \end{pmatrix} $$
where $(\epsilon_{1,i}, \epsilon_{2,i})' \sim N(0, I_2)$, with $I_2$ the identity matrix of order 2.

From the conditional covariance structure above, we have, when $\beta = 0$,
$$ E(v_i^2 \mid z_i) = z_i'\delta, \qquad E(u_i v_i \mid z_i) = z_i'\alpha $$
and when $\beta \neq 0$,
$$ E(v_i^2 \mid z_i) = z_i'\delta, \qquad E(u_i v_i \mid z_i) = z_i'\alpha - \beta z_i'\delta. $$
Using the assumption on $z_i$, which consists of a constant and a collection of $k_z - 1$ mutually exclusive dummy variables (the original $k_z$ dummies, each 0 or 1 and summing to one, with the last dropped), we have the unconditional variances and covariance of the errors: when $\beta = 0$,
$$ \sigma_v^2 = E(v_i^2) = E_Z\big[E(v_i^2 \mid z_i)\big] = E_Z(z_i'\delta) = \delta_1 + p\sum_{j=2}^{k_z}\delta_j, \qquad \sigma_{uv} = E(u_i v_i) = E_Z\big[E(u_i v_i \mid z_i)\big] = E_Z(z_i'\alpha) = \alpha_1 + p\sum_{j=2}^{k_z}\alpha_j $$

and when $\beta \neq 0$,
$$ \sigma_v^2 = E(v_i^2) = E_Z\big[E(v_i^2 \mid z_i)\big] = E_Z(z_i'\delta) = \delta_1 + p\sum_{j=2}^{k_z}\delta_j, \qquad \sigma_{uv} = E(u_i v_i) = E_Z\big[E(u_i v_i \mid z_i)\big] = E_Z(z_i'\alpha - \beta z_i'\delta) = (\alpha_1 - \beta\delta_1) + p\sum_{j=2}^{k_z}(\alpha_j - \beta\delta_j). $$
Using the same assumptions on the instruments, we have $Q_{ZZ}$, which is given by
$$ Q_{ZZ} = E(z_i^* z_i^{*\prime}) = p\big(I_{k_z-1} - pJ_{k_z-1}J_{k_z-1}'\big) $$
where $I_{k_z-1}$ is the identity matrix of order $k_z - 1$ and $J_{k_z-1}$ is a $(k_z-1)$-dimensional vector of ones.

Let $\Omega^c_{z,v}$ and $\Omega^c_{z,uv}$ denote $\Omega_{z,v}$ and $\Omega_{z,uv}$ for this specific type of heteroskedasticity. Using the same assumption on the instruments, $\Omega^c_{z,v}$ and $\Omega^c_{z,uv}$ are given by, when $\beta = 0$,
$$ \Omega^{c,\beta=0}_{z,v} = E(v_i^2 z_i^* z_i^{*\prime}) = E_Z\big[E(v_i^2 \mid z_i) z_i^* z_i^{*\prime}\big] = E_Z\big[(z_i'\delta) z_i^* z_i^{*\prime}\big] = \delta_1 Q_{ZZ} + pD_{k_z-1} - p^2\big(\delta_{k_z-1}J_{k_z-1}' + J_{k_z-1}\delta_{k_z-1}'\big) + p^3\big(\delta_{k_z-1}'J_{k_z-1}\big)J_{k_z-1}J_{k_z-1}' $$
$$ \Omega^{c,\beta=0}_{z,uv} = E(u_i v_i z_i^* z_i^{*\prime}) = E_Z\big[E(u_i v_i \mid z_i) z_i^* z_i^{*\prime}\big] = E_Z\big[(z_i'\alpha) z_i^* z_i^{*\prime}\big] = \alpha_1 Q_{ZZ} + pA_{k_z-1} - p^2\big(\alpha_{k_z-1}J_{k_z-1}' + J_{k_z-1}\alpha_{k_z-1}'\big) + p^3\big(\alpha_{k_z-1}'J_{k_z-1}\big)J_{k_z-1}J_{k_z-1}' \qquad (14) $$
and when $\beta \neq 0$,
$$ \Omega^{c,\beta\neq0}_{z,v} = E(v_i^2 z_i^* z_i^{*\prime}) = E_Z\big[(z_i'\delta) z_i^* z_i^{*\prime}\big] = \Omega^{c,\beta=0}_{z,v}, \qquad \Omega^{c,\beta\neq0}_{z,uv} = E(u_i v_i z_i^* z_i^{*\prime}) = E_Z\big[(z_i'\alpha - \beta z_i'\delta) z_i^* z_i^{*\prime}\big] = \Omega^{c,\beta=0}_{z,uv} - \beta\,\Omega^{c,\beta=0}_{z,v} \qquad (15) $$
where $\delta_1$ and $\alpha_1$ are the first elements of $\delta$ and $\alpha$, respectively, $D_{k_z-1}$ and $A_{k_z-1}$ are diagonal matrices with $j$-th diagonal elements $\delta_j$ and $\alpha_j$ for $j = 2,\dots,k_z$, respectively, $\delta_{k_z-1}$ and $\alpha_{k_z-1}$ are vectors of order $k_z - 1$ containing the elements of $\delta$ and $\alpha$ from the second onwards, $I_{k_z-1}$ is the identity matrix of order $k_z - 1$ and $J_{k_z-1}$ is a $(k_z-1)$-dimensional vector of ones. See the Appendix for the detailed derivations above.

Based on the derivations above, the relative bias under this kind of heteroskedasticity is given by
$$ B_n = \left| E\left[\frac{\sigma_v^2(\lambda + z_v)'\,\Sigma_{z,v}^{c\,1/2}\,\Sigma^c_{z,uv}\,\Sigma_{z,v}^{c\,-1/2}\, z_v}{\sigma_{uv}(\lambda + z_v)'\Sigma^c_{z,v}(\lambda + z_v)}\right] \right| \qquad (16) $$

where, when $\beta = 0$,
$$ \Sigma^{c,\beta=0}_{z,v} = Q_{ZZ}^{-1/2}\Omega^{c,\beta=0}_{z,v}Q_{ZZ}^{-1/2}; \quad \Sigma^{c,\beta=0}_{z,uv} = Q_{ZZ}^{-1/2}\Omega^{c,\beta=0}_{z,uv}Q_{ZZ}^{-1/2}; \quad \lambda = (\Sigma^{c,\beta=0}_{z,v})^{-1/2}Q_{ZZ}^{1/2}c; \quad z_v = (\Sigma^{c,\beta=0}_{z,v})^{-1/2}Q_{ZZ}^{-1/2}\psi_{Zv} $$
and when $\beta \neq 0$,
$$ \Sigma^{c,\beta\neq0}_{z,v} = Q_{ZZ}^{-1/2}\Omega^{c,\beta\neq0}_{z,v}Q_{ZZ}^{-1/2}; \quad \Sigma^{c,\beta\neq0}_{z,uv} = Q_{ZZ}^{-1/2}\Omega^{c,\beta\neq0}_{z,uv}Q_{ZZ}^{-1/2} = \Sigma^{c,\beta=0}_{z,uv} - \beta\,\Sigma^{c,\beta=0}_{z,v}; \quad \lambda = (\Sigma^{c,\beta\neq0}_{z,v})^{-1/2}Q_{ZZ}^{1/2}c; \quad z_v = (\Sigma^{c,\beta\neq0}_{z,v})^{-1/2}Q_{ZZ}^{-1/2}\psi_{Zv} $$
with the same expressions for $\Omega^{c,\beta=0}_{z,v}$, $\Omega^{c,\beta=0}_{z,uv}$, $\Omega^{c,\beta\neq0}_{z,v}$ and $\Omega^{c,\beta\neq0}_{z,uv}$ as above.

The relative bias under this type of heteroskedasticity depends on the coefficients $\alpha$ and $\delta$ of the linear combination of instruments. Unlike the cases discussed before, its magnitude relative to that from Stock and Yogo, given the strength of the instruments, is not clear-cut. However, exploring a variety of combinations of the coefficients, it tends to be larger than that under homoskedasticity, and it increases as the sum of the entries in $\alpha$ becomes similar in absolute value to the sum of the entries in $\delta$ while having the opposite sign, when the average of the elements in $\delta$ is held fixed.

Table 1: Ratio of $\sum_{j=2}^{k_z}\alpha_j$ to $\sum_{j=2}^{k_z}\delta_j$ by range of the relative bias

k_z | Relative bias: up to 10% | 10%-20% | over 20% | Average
(a) β = 0
4:  -0.2479  -0.5909  0.2187
8:  -0.1060  -1.1115  0.1904
12: -0.1448  -1.0162  0.1852
(b) β ≠ 0
4:  1.2290  0.5891  -1.3607  0.1856
8:  -0.2625  -0.1798
12: -  -0.2972  -0.1797

Notes: The first entries of α and δ are excluded from the calculation of the ratio. The relative bias is evaluated at the strength of instruments for which the Stock and Yogo relative bias equals 10%, with sample size 10,000. k_z is the number of instruments. Each cell is the average of the ratio of $\sum_{j=2}^{k_z}\alpha_j$ to $\sum_{j=2}^{k_z}\delta_j$ over the combinations of the coefficients falling in the given range of the relative bias; the cells under "Average" give the average of the relative bias for the given k_z across the combinations of the coefficients. A "-" in a cell means that no combination of α and δ leads to the given range of the relative bias. α and δ are drawn from uniform distributions over the intervals (-2, 2) and (0, 1), respectively, and for δ the average of all elements is controlled to be 0.5.
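For concreteness, a sketch (illustrative, not the paper's code) of how the categorical instrument and the errors of this subsection can be generated; a Cholesky-type factorisation is used in place of the symmetric square root $\Sigma_v(z_i)^{1/2}$, which gives the same conditional distribution, and the coefficient vectors are assumed to contain the constant's coefficient as their first entry and to satisfy the positive-definiteness condition in the text.

```python
# Sketch: categorical instrument with kz mutually exclusive dummies and errors
# with conditional covariance Sigma_v(z_i) = [[z'gamma, z'alpha], [z'alpha, z'delta]];
# u_i = v1_i - beta * v2_i and v_i = v2_i as in the text.
import numpy as np

def draw_categorical_data(n, kz, gamma, alpha, delta, beta, rng=None):
    rng = rng or np.random.default_rng(2)
    cat = rng.integers(kz, size=n)               # equal-probability categories
    D = np.eye(kz)[cat]                          # n x kz dummies, rows sum to 1
    Zc = np.column_stack([np.ones(n), D[:, 1:]]) # constant + dummies, last dropped
    s11, s12, s22 = Zc @ gamma, Zc @ alpha, Zc @ delta
    e1, e2 = rng.standard_normal(n), rng.standard_normal(n)
    v1 = np.sqrt(s11) * e1                       # (v1, v2) ~ N(0, Sigma_v(z_i))
    v2 = (s12 / np.sqrt(s11)) * e1 + np.sqrt(s22 - s12 ** 2 / s11) * e2
    u, v = v1 - beta * v2, v2                    # structural and first-stage errors
    Zstar = D[:, :-1] - 1.0 / kz                 # demeaned instruments z*_ij = z_ij - p
    return u, v, Zstar
```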

4.4 Simulation Results

We report simulation results for the model (1) under the forms of heteroskedasticity specified in the previous sections. We consider $k_z = 4$, 8 and 12, and $u_i$ and $v_i$ are generated according to the error covariance structures discussed above. We set $\pi = c/\sqrt{n}$, where $c$ is a $k_z \times 1$ vector of constants chosen such that the large-sample relative bias $B_n$ as defined above is equal to 10%. For the cases where heteroskedasticity is imposed only on $v_i$, the instruments in $z_i$ are independently drawn from the standard multivariate normal distribution and we assume that all the coefficients applying to the instruments are the same, $\gamma_j = \gamma_s$ for all $j = 1,\dots,k_z$. Also, $\gamma$ is set such that $k_z\gamma_s$ and $\exp(k_z\gamma_s/2)$ are invariant to $k_z$, so that the bias of the OLS estimator does not become too small. Furthermore, we set $\beta = 1$. When a categorical instrument is considered, the coefficients $\gamma$, $\alpha$ and $\delta$, except for the first entry in each vector, are drawn from a uniform distribution over the interval $(0,1)$ for $\gamma$ and $\delta$, and $(-2,2)$ for $\alpha$, subject to $(\alpha_1 + \alpha_j)^2 < (\gamma_1 + \gamma_j)(\delta_1 + \delta_j)$ for $j = 2,\dots,k_z$, to ensure that the conditional covariance structure of the errors is positive definite. $\gamma_1$, $\alpha_1$ and $\delta_1$ are set to 0.5, 0.2 and 0.5, respectively; these apply to the constant term in the original $z_i$ when generating heteroskedasticity. In addition, the average of all elements in $\delta$ is controlled to be 0.5. The results are shown in Tables 2, 3, 4 and 5 for a sample size of 100,000 and 10,000 Monte Carlo replications.

When heteroskedasticity is introduced only on $v_i$, the critical values for the weak instrument test depend on the covariance $\rho$, moving in the same direction as $\rho$ regardless of whether weak instruments are defined through the relative bias or the Wald size, which is in line with the analytical results above. In addition, they are smaller than those from Stock and Yogo (2005) regardless of $\rho$, meaning that when the critical values from Stock and Yogo are used under these specific kinds of heteroskedastic errors, the weak instrument test results are conservative in the sense that the null of weak instruments is rejected too infrequently, for both definitions of weak instruments. With regard to the heteroskedasticity robust first-stage F-statistic and the critical values, they are more sensitive to the coefficients $\gamma$ in the exponential-function case than in the squared-instruments case, because an exponential function magnifies the effect of heteroskedasticity more than a linear function does. For the case where a categorical instrument is considered, the critical values are larger than those of Stock and Yogo for most combinations

of the coefficients, which means that when the weak instruments test is implemented using the Stock and Yogo critical values under this specific type of heteroskedasticity, the test results can be misleading about the strength of the instruments considered, in that the null of weak instruments is rejected too frequently; there is thus a possibility that the strength of the instruments is overestimated under this kind of heteroskedasticity. Judging from all the cases discussed above, the practical approach to the weak instruments test via a comparison of the robust F-statistic with the Stock and Yogo critical values runs the risk of misrepresenting the strength of the instruments for some types of heteroskedasticity, such as the categorical instrument case, indicating that heteroskedasticity can affect a test for weak instruments when the Stock and Yogo testing procedure is followed with the standard F-statistic simply replaced by the robust F-statistic. The observed relative bias and Wald size are around 10%, and the rejection frequencies using the newly calculated critical values are around 5% at the 5% nominal level, as anticipated.

With regard to the Olea and Pflueger (2013) weak instruments test, the test results are quite conservative, in the sense that the null of weak instruments is rejected too infrequently compared to the weak instruments test following the Stock and Yogo testing procedure, in most of the heteroskedasticity cases discussed above. However, there are some test results that are not conservative in the categorical instrument case, meaning that there is a possibility that the strength of the instruments is overestimated under this specific kind of heteroskedasticity.
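A compressed sketch of the Monte Carlo design just described, for the squared-instruments case (illustrative; the cut-off cv and the scale of c are placeholders, not the critical values or calibrated constants reported in the tables):

```python
# Sketch: one Monte Carlo experiment; returns the simulated relative bias of
# 2SLS versus OLS and the frequency with which the robust first-stage
# F-statistic exceeds a given cut-off.
import numpy as np

def mc_weak_iv(n=100_000, kz=4, rho=0.5, gamma_s=0.2, c_scale=3.0,
               beta=1.0, cv=8.0, reps=200, seed=0):
    rng = np.random.default_rng(seed)
    b2, bo, rej = [], [], []
    for _ in range(reps):
        Z = rng.standard_normal((n, kz))
        u = rng.standard_normal(n)
        v = rho * u + np.sqrt((Z ** 2) @ np.full(kz, gamma_s)) * rng.standard_normal(n)
        x = Z @ (np.full(kz, c_scale) / np.sqrt(n)) + v    # weak first stage
        y = x * beta + u
        ZtZ_inv = np.linalg.inv(Z.T @ Z)
        pihat = ZtZ_inv @ (Z.T @ x)
        vhat = x - Z @ pihat
        meat = (Z * vhat[:, None] ** 2).T @ Z
        Fr = float(pihat @ np.linalg.solve(ZtZ_inv @ meat @ ZtZ_inv, pihat)) / kz
        Px_x = Z @ (ZtZ_inv @ (Z.T @ x))
        b2.append(float(Px_x @ y) / float(x @ Px_x))       # 2SLS estimate
        bo.append(float(x @ y) / float(x @ x))             # OLS estimate
        rej.append(Fr > cv)
    rel_bias = abs(np.mean(b2) - beta) / abs(np.mean(bo) - beta)
    return rel_bias, float(np.mean(rej))

print(mc_weak_iv())
```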

Table 2: Estimation and critical values under heteroskedasticity (squared instruments)

Columns: ρ | γ | β̂_2SLS | β̂_OLS | Rel. bias | F_r | F_eff | critical values H_sq, SY, H_OP | rejection frequencies H_sq, SY, H_OP

(a) k_z = 4
0.9 | 0.2 | 1.0542 | 1.5589 | 0.0969 | 4.9491 | 4.9225 | 8.8069 | 10.27 | 10.6709 | 0.0479 | 0.0171 | 0.0128
0.9 | 0.5 | 1.0344 | 1.3202 | 0.1073 | 4.7342 | 4.6883 | 8.4274 | 10.27 | 10.7094 | 0.0535 | 0.0131 | 0.0086
0.9 | 0.8 | 1.0242 | 1.2244 | 0.1078 | 4.5622 | 4.5851 | 8.2335 | 10.27 | 10.7289 | 0.0488 | 0.0092 | 0.0072
0.5 | 0.2 | 1.0470 | 1.4761 | 0.0988 | 4.6437 | 4.5936 | 8.3134 | 10.27 | 10.7193 | 0.0546 | 0.0094 | 0.0072
0.5 | 0.5 | 1.0227 | 1.2222 | 0.1021 | 4.4695 | 4.4253 | 8.1060 | 10.27 | 10.7427 | 0.0486 | 0.0101 | 0.0054
0.5 | 0.8 | 1.0136 | 1.1449 | 0.0940 | 4.4047 | 4.3694 | 8.0126 | 10.27 | 10.7475 | 0.0467 | 0.0072 | 0.0048

(b) k_z = 8
0.9 | 0.1000 | 1.0554 | 1.5587 | 0.0991 | 7.1461 | 7.1317 | 10.3189 | 11.39 | 12.3387 | 0.0518 | 0.0161 | 0.0062
0.9 | 0.2500 | 1.0314 | 1.3201 | 0.0982 | 6.8167 | 6.8302 | 9.9575 | 11.39 | 12.3636 | 0.0502 | 0.0104 | 0.0036
0.9 | 0.4000 | 1.0232 | 1.2243 | 0.1032 | 6.7169 | 6.7568 | 9.8439 | 11.39 | 12.3728 | 0.0471 | 0.0104 | 0.0022
0.5 | 0.1000 | 1.0472 | 1.4759 | 0.0992 | 6.7983 | 6.8221 | 9.9195 | 11.39 | 12.3672 | 0.0524 | 0.0106 | 0.0038
0.5 | 0.2500 | 1.0224 | 1.2221 | 0.1009 | 6.6124 | 6.5954 | 9.6877 | 11.39 | 12.3809 | 0.0497 | 0.0079 | 0.0024
0.5 | 0.4000 | 1.0138 | 1.1449 | 0.0950 | 6.5728 | 6.5871 | 9.6250 | 11.39 | 12.3860 | 0.0527 | 0.0098 | 0.0018

(c) k_z = 12
0.9 | 0.0667 | 1.0553 | 1.5584 | 0.0990 | 8.0130 | 8.0319 | 10.7518 | 11.52 | 12.6028 | 0.0486 | 0.0194 | 0.0044
0.9 | 0.1667 | 1.0324 | 1.3199 | 0.1014 | 7.7634 | 7.7507 | 10.4512 | 11.52 | 12.6183 | 0.0500 | 0.0126 | 0.0020
0.9 | 0.2667 | 1.0219 | 1.2242 | 0.0977 | 7.6673 | 7.6570 | 10.3517 | 11.52 | 12.6258 | 0.0525 | 0.0133 | 0.0030
(Continues)

Table 2 Continued

ρ | γ | β̂_2SLS | β̂_OLS | Rel. bias | F_r | F_eff | critical values H_sq, SY, H_OP | rejection frequencies H_sq, SY, H_OP
0.5 | 0.0667 | 1.0468 | 1.4756 | 0.0984 | 7.7268 | 7.7511 | 10.3954 | 11.52 | 12.6229 | 0.0494 | 0.0121 | 0.0044
0.5 | 0.1667 | 1.0213 | 1.2220 | 0.0962 | 7.5765 | 7.5753 | 10.2298 | 11.52 | 12.6340 | 0.0506 | 0.0099 | 0.0034
0.5 | 0.2667 | 1.0150 | 1.1448 | 0.1034 | 7.5658 | 7.5265 | 10.2027 | 11.52 | 12.6347 | 0.0494 | 0.0097 | 0.0014

Notes: Sample size 100,000; 10,000 Monte Carlo replications; β = 1. ρ is the covariance between the errors; γ is the coefficient in the linear combination of squared instruments. F_r is the heteroskedasticity robust first-stage F-statistic; F_eff is the effective F-statistic from Olea and Pflueger (2013). Rel. bias is the bias of the 2SLS estimator relative to that of the OLS estimator. The H_sq critical values are used for the robust F-test at a 5% nominal level for a 10% relative bias, which yields the H_sq rejection frequencies; the SY rejection frequencies use the 5% critical values from Stock and Yogo (2005) for the robust F-test for a 10% relative bias. The H_OP critical values are used for the Olea and Pflueger weak instrument test at a 5% nominal level for a 10% Nagar bias relative to the benchmark bias, which yields the H_OP rejection frequencies.

Table 3: Estimation and critical values under heteroskedasticity (exponential function)

Columns: ρ | γ | β̂_2SLS | β̂_OLS | Rel. bias | F_r | F_eff | critical values H_exp, SY, H_OP | rejection frequencies H_exp, SY, H_OP

(a) k_z = 4
0.9 | 0.2 | 1.0482 | 1.4753 | 0.1014 | 5.6164 | 5.9469 | 9.7204 | 10.27 | 10.7674 | 0.0487 | 0.0317 | 0.0364
0.9 | 0.5 | 1.0370 | 1.3660 | 0.1011 | 4.2692 | 5.6953 | 7.8851 | 10.27 | 12.3927 | 0.0501 | 0.0068 | 0.0212
0.9 | 0.8 | 1.0206 | 1.2043 | 0.1007 | 3.1646 | 5.4929 | 6.2166 | 10.27 | 15.1623 | 0.0491 | 0.0007 | 0.0092
0.5 | 0.2 | 1.0395 | 1.3750 | 0.1054 | 5.4984 | 5.9448 | 9.5694 | 10.27 | 10.8808 | 0.0489 | 0.0289 | 0.0378
0.5 | 0.5 | 1.0276 | 1.2633 | 0.1049 | 4.0565 | 5.6340 | 7.5081 | 10.27 | 12.8577 | 0.0521 | 0.0047 | 0.0170
0.5 | 0.8 | 1.0138 | 1.1300 | 0.1059 | 3.0918 | 5.4373 | 6.0294 | 10.27 | 15.5241 | 0.0496 | 0.0011 | 0.0094

(b) k_z = 8
0.9 | 0.1414 | 1.0482 | 1.4751 | 0.1014 | 7.4094 | 7.9146 | 10.6835 | 11.39 | 12.3800 | 0.0469 | 0.0240 | 0.0188
0.9 | 0.3536 | 1.0359 | 1.3658 | 0.0983 | 5.3352 | 7.6838 | 8.0812 | 11.39 | 13.3405 | 0.0489 | 0.0003 | 0.0172
0.9 | 0.5657 | 1.0199 | 1.2042 | 0.0974 | 3.5351 | 7.1068 | 5.6946 | 11.39 | 15.1619 | 0.0554 | 0.0000 | 0.0110
0.5 | 0.1414 | 1.0378 | 1.3748 | 0.1010 | 7.2637 | 7.9406 | 10.4732 | 11.39 | 12.4472 | 0.0534 | 0.0232 | 0.0256
0.5 | 0.3536 | 1.0274 | 1.2632 | 0.1042 | 4.9345 | 7.5738 | 7.5362 | 11.39 | 13.6466 | 0.0492 | 0.0001 | 0.0166
0.5 | 0.5657 | 1.0123 | 1.1299 | 0.0950 | 3.3618 | 7.1022 | 5.4492 | 11.39 | 15.4541 | 0.0551 | 0.0000 | 0.0072

(c) k_z = 12
0.9 | 0.1155 | 1.0480 | 1.4749 | 0.1010 | 8.0755 | 8.7238 | 10.7975 | 11.52 | 12.6302 | 0.0521 | 0.0225 | 0.0196
0.9 | 0.2887 | 1.0372 | 1.3657 | 0.1016 | 5.6961 | 8.3566 | 7.9473 | 11.52 | 13.3917 | 0.0544 | 0.0000 | 0.0118
0.9 | 0.4619 | 1.0202 | 1.2041 | 0.0992 | 3.6681 | 7.9827 | 5.4364 | 11.52 | 14.8893 | 0.0550 | 0.0000 | 0.0106
(Continues)

Table 3 Continued

ρ | γ | β̂_2SLS | β̂_OLS | Rel. bias | F_r | F_eff | critical values H_exp, SY, H_OP | rejection frequencies H_exp, SY, H_OP
0.5 | 0.1155 | 1.0391 | 1.3746 | 0.1044 | 7.8316 | 8.6253 | 10.5112 | 11.52 | 12.6809 | 0.0538 | 0.0182 | 0.0190
0.5 | 0.2887 | 1.0252 | 1.2631 | 0.0958 | 5.1888 | 8.2849 | 7.3731 | 11.52 | 13.6400 | 0.0499 | 0.0000 | 0.0124
0.5 | 0.4619 | 1.0122 | 1.1298 | 0.0938 | 3.4426 | 7.8547 | 5.1638 | 11.52 | 15.1694 | 0.0538 | 0.0000 | 0.0114

Notes: Sample size 100,000; 10,000 Monte Carlo replications; β = 1. ρ is the covariance between the errors; γ is the coefficient in the linear combination of instruments entering the exponential function. F_r is the heteroskedasticity robust first-stage F-statistic; F_eff is the effective F-statistic from Olea and Pflueger (2013). Rel. bias is the bias of the 2SLS estimator relative to that of the OLS estimator. The H_exp critical values are used for the robust F-test at a 5% nominal level for a 10% relative bias, which yields the H_exp rejection frequencies; the SY rejection frequencies use the 5% critical values from Stock and Yogo (2005) for the robust F-test for a 10% relative bias. The H_OP critical values are used for the Olea and Pflueger weak instrument test at a 5% nominal level for a 10% Nagar bias relative to the benchmark bias, which yields the H_OP rejection frequencies.

Table 4: Estimation and critical values under heteroskedasticity (categorical instrument)

Columns: γ, α & δ case | β̂_2SLS | β̂_OLS | Rel. bias | F_r | F_eff | critical values H_ci, SY, H_OP | rejection frequencies H_ci, SY, H_OP

(a) k_z = 4, β = 0
case 1 | -0.0208 | -0.2025 | 0.1025 | 18.1095 | 13.1733 | 26.6239 | 9.08 | 13.4897 | 0.0500 | 0.9836 | 0.4360
case 2 | 0.0584 | 0.5376 | 0.1088 | 8.8947 | 6.4065 | 14.8819 | 9.08 | 13.3432 | 0.0510 | 0.4392 | 0.0090

(b) k_z = 4, β = 1
case 1 | 0.8352 | -0.5915 | 0.1035 | 10.1325 | 7.2424 | 16.4822 | 9.08 | 13.9722 | 0.0519 | 0.5845 | 0.0210
case 2 | 0.9506 | 0.5170 | 0.1024 | 7.2082 | 5.3948 | 12.6357 | 9.08 | 12.8155 | 0.0468 | 0.2451 | 0.0060
case 3 | 0.9792 | 0.7946 | 0.1013 | 3.7227 | 2.9478 | 7.5873 | 9.08 | 12.1295 | 0.0505 | 0.0166 | 0.0000

(c) k_z = 8, β = 0
case 1 | -0.0206 | -0.2204 | 0.0934 | 19.9564 | 12.0323 | 25.5864 | 11.29 | 14.3145 | 0.0518 | 0.9983 | 0.1330
case 2 | 0.0477 | 0.4864 | 0.0980 | 13.9288 | 8.7365 | 18.7238 | 11.29 | 14.1142 | 0.0491 | 0.8306 | 0.0060

(d) k_z = 8, β = 1
case 1 | 0.8363 | -0.6416 | 0.0997 | 14.9824 | 9.2875 | 19.9236 | 11.29 | 12.1295 | 0.0488 | 0.9083 | 0.0080

(e) k_z = 12, β = 0
case 1 | -0.0262 | -0.2602 | 0.0984 | 19.0150 | 11.0863 | 23.4652 | 11.51 | 14.0026 | 0.0490 | 0.9987 | 0.0350
case 2 | 0.0487 | 0.4843 | 0.1006 | 15.6648 | 9.3058 | 19.6703 | 11.51 | 13.7852 | 0.0579 | 0.9670 | 0.0040
(Continues)