Robust Standard Errors to spatial and time dependence when neither N nor T are very large. Lucciano Villacorta Gonzales. February 15, 2014.


Abstract

In this article, I study two different approaches (two-way clustering and spatial-time autoregressive models) to consistently estimating the variance of the OLS estimator in panel models in a way that is robust to spatial and time dependence. The speed of convergence of the two-way cluster estimator is $\min(\sqrt{N}, \sqrt{T})$. More importantly, the variance of the cluster estimator is also affected by the two forms of dependence. As a consequence, when T and N are not large enough, inference based on the two-way cluster estimator may be misleading. I also show that the two-way cluster estimator can be expressed in terms of a fully flexible spatial-time autoregressive model. As such, if we have a prior idea of the dependence structure, we can gain a lot in terms of precision and reduce the probability of type I error. Finally, I study the implications of both types of dependence in a state-year panel of wage inequality and the minimum wage in the US. When both types of error dependence are considered in the variance estimator, the marginal effect of the minimum wage on wage inequality is no longer significant.

Keywords: Panel, spatial dependence, time dependence, two way cluster, spatial and time autoregressive model

JEL codes: C230

* I am deeply thankful to Manuel Arellano and Stephane Bonhomme for their guidance and extensive discussions, which gave me a better understanding of econometrics. Errors and omissions are exclusively my own responsibility. Support from the European Research Council/ERC grant "Estimation of non linear panel models with unobserved heterogeneity" (grant agreement n ) is gratefully acknowledged. CEMFI, Casado del Alisal 5, E-28014, Madrid, Spain. luccianovillacorta@gmail.com, luccianovillacorta@cemfi.es

1 Introduction

Panel data models have been increasingly used in applied econometrics. They offer the possibility of controlling for some kinds of endogeneity without the need for exogenous instruments, and they allow richer models to be considered, including dynamics, time effects and factor structures. The classical literature on panel data has allowed for time series dependence, but has generally assumed that the observations are cross-sectionally independent. However, there are several settings where cross-sectional independence is not a suitable assumption. Examples are models where the cross-sectional units are a nonrandom sample of states, countries or industries, as these units are likely to be connected through neighboring effects, trade, production linkages, etc. [1] In setups like this, where the observable and the unobservable variables of the model present time and spatial dependence, the variance of the OLS estimator of the model parameters will be a function of these forms of dependence. [2]

Although it is well understood that failing to take the error dependence into account invalidates statistical inference [3], computing a variance estimator that is robust to both types of dependence is not common in applied econometrics. In particular, Bertrand et al. (2004), Hoeschle (2007) and Drummond et al. (2009) provide evidence of a wide range of empirical applications where the error term of a model could exhibit one or more forms of dependence which are usually not taken into account or not corrected in the right way. Corrections for the two types of dependence are now available in the literature, but they suffer from a number of limitations.

In this article, I study different approaches to consistently estimating the variance of the OLS estimator that are robust to spatial and time dependence in panels in which neither the cross-sectional sample size N nor the time series sample size T is large enough. Popular examples are state-year, industry-year or country-year panels, where the number of states, industries or countries is limited and the available time series are not long enough. [4]

[1] Examples of these are Barrios et al. (2011) and Watson et al. (2011), among others.
[2] As well as the variances of other estimators such as IV, GMM, etc.
[3] In most situations the correlation is positive; therefore, ignoring the correlation leads to an underestimation of standard errors, dramatically increasing the probability of type I error. Cameron et al. (2011) show examples where standard errors increased more than threefold when both types of error dependence were taken into account.
[4] In this kind of setup, where the number of observations is limited, the variance is a first-order concern.

By far the most common approach put forward in the literature to estimate a variance that is robust to both types of dependence is the estimator proposed by Cameron et al. (2011) and Thompson (2011). This estimator is fully nonparametric and extends the one-way cluster approach of Arellano (1987) to multiway clustering that takes into account error dependence in more than one dimension. In the panel data setup with time and spatial dependence it is possible to apply a two-way clustering procedure, decomposing the total variance into two terms: (i) one that only contains the spatial correlation, which is estimated by grouping the data in time clusters, and (ii) one that only contains the time correlation, which is estimated by grouping the data in spatial clusters. The main advantage of this approach is that it does not require any specification of the variance covariance matrix, allowing for general forms of dependence. Nevertheless, in order to have an accurate approximation, this variance estimator needs a sufficiently large number of both types of clusters, since its asymptotic properties hold when the number of clusters goes to infinity. Moreover, the rates at which each cluster estimator (spatial cluster and time cluster) converges to its asymptotic normal distribution are $\sqrt{N}$ and $\sqrt{T}$ respectively, instead of the OLS rate of $\sqrt{NT}$. As a consequence, tests based on the two-way clustering estimator perform poorly when N and T are not large enough.

In settings where N and T are not large enough, the gain from imposing some structure on the error dependence could be important. For instance, if we limit the dependence with some mixing assumption we can consider alternative estimators with better small sample properties than the two-way cluster estimator. For example, it is possible to consider a double Heteroskedasticity and Autocorrelation Consistent (HAC) variance estimator. With a mixing assumption in the time series, we can use the HAC estimator of Newey and West to take the time dependence into account; analogously, with a mixing assumption in the cross section we can estimate a spatial HAC version as in Conley (1999) or Kelejian and Prucha (2003). Nevertheless, this semiparametric approach still needs a sufficiently large sample in order to perform accurately.

Another possibility, which could be more convenient in small samples, is to impose more structure by modeling the dependence in a parametric way. This approach seems to be less common in empirical applications, since specifying a spatial model is not straightforward (it requires a number of specifics, such as a weighting matrix) and, if the model is misspecified, the properties of these procedures are in general unknown. It is well known that there is a trade-off between the flexibility but poor small sample properties of the nonparametric approach and the better small sample behavior but sensitivity to model specification of the parametric one. Nevertheless, in some applications there exists some prior knowledge about the dependence that might be useful for computing more precise estimates of the variance.

The purpose of this paper is threefold. First, I study the properties of the two-way clustering estimator, both analytically and numerically. The speed of convergence of the two-way cluster estimator is $\min(\sqrt{N}, \sqrt{T})$. More importantly, the variance of the cluster estimator (the variance of the variance estimator) is also affected by the two forms of dependence. As a consequence, when T and N are not large enough, inference based on the two-way cluster estimator may be misleading.

Second, I make a connection between the cluster estimator and spatial autoregressive (SAR) models. The two-way clustering approach can be expressed in terms of a fully flexible spatial and time autoregressive model. Thus, estimating the OLS variance with a parametric model for the error term can be seen as a particular case of the cluster estimator. As such, if we have a prior about the dependence structure we can gain a lot in terms of precision and reduce the probability of type I error. In addition, I propose a nonlinear least squares estimation of the spatial and time autoregressive model that is consistent and computationally efficient in the presence of fixed and time effects in panels in which neither the cross-sectional sample size N nor the time series sample size T is large enough.

Third, I study the implications of error dependence in a US state-year panel of wage inequality and the minimum wage with sample sizes N = 50 and T = 30. I also use this empirical application to study the small sample behavior of the nonparametric and the parametric approaches. Using Monte Carlo simulations, I find a probability of type I error of around 70%, instead of the nominal size of 5%, when error dependence is ignored. The two-way cluster approach has a 16% probability of type I error when the nominal size of the test is 5%. On the other hand, using standard errors that incorporate the dependence in a parametric way attains the nominal size of the test.

The standard error of the OLS estimator of the marginal effect of the effective minimum wage on inequality increases by 200% when I use one-way clustering (either by time or by state) and by 300% when I use two-way clustering, relative to no clustering. Despite this, the estimated marginal effect remains significant at the 5% level. However, when I use the parametric variance to control for the dependence, the marginal effect of the minimum wage on US state inequality loses its significance. Finally, the standard error of the average marginal effect estimated by 2SLS increases by 100% and by 200% when I use the two-way cluster approach and the parametric approach, respectively. Using either the two-way clustering or the parametric variance to account for the dependence, the 2SLS estimate of the marginal effect of the minimum wage on US state inequality is no longer significant.

2 A panel model with spatial and time dependence

Consider a linear panel regression model defined by:

$$y_{it} = x_{it}'\beta + \alpha_i + \delta_t + u_{it} \qquad (2.1)$$

where $i = 1, \dots, N$ indexes individuals, $t = 1, \dots, T$ indexes time, $x_{it}$ is a vector of observable covariates, $\alpha_i$ is an individual fixed effect that is constant over time, $\delta_t$ is a time effect that is common across individuals and $u_{it}$ is an unobservable error component. For simplicity, as in Hansen (2007), I work with a transformation of the variables in order to remove the nuisance parameters $\alpha_i$ and $\delta_t$:

$$y_{it} = x_{it}'\beta + u_{it} \qquad (2.2)$$

where the variables in (2.2) are the variables in (2.1) deviated with respect to their own individual sample mean and time sample mean.

A1: $E(u_{it} \mid x_{it}) = 0$

The OLS estimator of $\beta$ from equation (2.2) is defined as

$$\hat\beta = \Big(\sum_{i}^{N}\sum_{t}^{T} x_{it}x_{it}'\Big)^{-1}\Big(\sum_{i}^{N}\sum_{t}^{T} x_{it}y_{it}\Big)$$

Under Assumption 1 and regularity conditions, $d_{NT}(\hat\beta - \beta) \to_d N(0, V)$, where $V$ is the asymptotic variance of the OLS estimator and is equal to $Q^{-1}WQ^{-1}$, where

$$Q = \lim_{d_{NT}\to\infty} \frac{1}{NT}\sum_{i}^{N}\sum_{t}^{T} x_{it}x_{it}' \quad \text{and} \quad W = \lim_{d_{NT}\to\infty}\Big[\frac{1}{NT}\,VAR\Big(\sum_{i}^{N}\sum_{t}^{T} x_{it}u_{it}\Big)\Big]$$

Up to this point I have not established the correlation structure of $u_{it}$. In fact, the speed of convergence of the OLS estimator $d_{NT}$ and the form of $W$ and $V$ will depend on the correlation structure in $u_{it}$ and $x_{it}$. In the most general case of dependence:

$$W = \frac{1}{NT}\,VAR\Big(\sum_{i}^{N}\sum_{t}^{T} x_{it}u_{it}\Big) = \frac{1}{NT}\Big[\sum_{i=1}^{N} VAR\Big(\sum_{t}^{T} x_{it}u_{it}\Big) + 2\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} COV\Big(\sum_{t}^{T} x_{it}u_{it},\ \sum_{t}^{T} x_{jt}u_{jt}\Big)\Big]$$
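The transformation behind (2.2) and the OLS formula can be illustrated with a minimal sketch. It assumes a balanced panel and a scalar regressor, and it adds back the overall mean after subtracting the individual and time means, which is the usual way to remove both effects in a balanced panel; that detail, the function names `two_way_demean` and `ols_scalar`, and the noise-free data are assumptions made for the illustration, not taken from the paper.

```python
# Minimal sketch: two-way within transformation and OLS for a balanced panel
# with a scalar regressor. Data are stored as nested lists a[i][t].

def two_way_demean(a):
    """Return a_it - mean_i - mean_t + overall mean for a balanced N x T panel."""
    n, t = len(a), len(a[0])
    row_mean = [sum(row) / t for row in a]
    col_mean = [sum(a[i][s] for i in range(n)) / n for s in range(t)]
    grand = sum(row_mean) / n
    return [[a[i][s] - row_mean[i] - col_mean[s] + grand for s in range(t)]
            for i in range(n)]

def ols_scalar(x, y):
    """OLS slope (sum x_it^2)^(-1) * (sum x_it y_it) on transformed data."""
    sxx = sum(v * v for row in x for v in row)
    sxy = sum(xv * yv for xr, yr in zip(x, y) for xv, yv in zip(xr, yr))
    return sxy / sxx

# Hypothetical noise-free example: y_it = 2 * x_it + alpha_i + delta_t
N, T, beta = 4, 3, 2.0
x = [[float((i + 1) * (t + 1) ** 2) for t in range(T)] for i in range(N)]
alpha = [10.0 * i for i in range(N)]   # individual effects
delta = [-3.0 * t for t in range(T)]   # time effects
y = [[beta * x[i][t] + alpha[i] + delta[t] for t in range(T)] for i in range(N)]

# Both effects drop out after the transformation, so the slope is recovered.
beta_hat = ols_scalar(two_way_demean(x), two_way_demean(y))
```

Because the example has no error term, `beta_hat` equals the true slope up to floating-point error, which shows that the transformation removes both nuisance effects.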

A2: $E(u_{it}u_{jt}) \neq 0$, $E(u_{it}u_{is}) \neq 0$, $E(u_{it}u_{js}) = 0$

Assumption 2 implies spatial dependence across individuals in the same period and time dependence for each individual, but rules out correlation between different individuals at different moments in time. Under Assumption 2 the expression for $W$ is the following:

$$W = \frac{1}{NT}\Big[\sum_{i=1}^{N} VAR\Big(\sum_{t}^{T} x_{it}u_{it}\Big) + 2\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} COV\Big(\sum_{t}^{T} x_{it}u_{it},\ \sum_{t}^{T} x_{jt}u_{jt}\Big)\Big]$$

By linearity of the covariance, and because only same-period cross terms are nonzero:

$$W = \frac{1}{NT}\Big[\sum_{i=1}^{N} VAR\Big(\sum_{t}^{T} x_{it}u_{it}\Big) + 2\sum_{t}^{T}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} COV\big(x_{it}u_{it},\ x_{jt}u_{jt}\big)\Big]$$

If we add and subtract $\sum_{i}^{N}\sum_{t}^{T} VAR(x_{it}u_{it})$ we can express $W$ as:

$$W = \frac{1}{NT}\Big[\sum_{i=1}^{N} VAR\Big(\sum_{t}^{T} x_{it}u_{it}\Big) + \sum_{t}^{T} VAR\Big(\sum_{i}^{N} x_{it}u_{it}\Big) - \sum_{i}^{N}\sum_{t}^{T} VAR(x_{it}u_{it})\Big]$$

Let us define two different ways of grouping the data: $U_i = (u_{i1}, u_{i2}, \dots, u_{iT})'$ and $U_t = (u_{1t}, u_{2t}, \dots, u_{Nt})'$, with $X_i$ and $X_t$ the corresponding stacked regressors. Then

$$W = \frac{1}{NT}\sum_{i=1}^{N} E(X_i'U_iU_i'X_i) + \frac{1}{NT}\sum_{t}^{T} E(X_t'U_tU_t'X_t) - \frac{1}{NT}\sum_{i}^{N}\sum_{t}^{T} VAR(x_{it}u_{it})$$

I have separated $W$ into three parts: a first one that only takes into account the time dependence, a second one that only takes into account the spatial dependence, and a third one that only takes into account the variance of each observation:

$$W = \frac{1}{N}\sum_{i=1}^{N} E(X_i'\Omega_iX_i) + \frac{1}{T}\sum_{t}^{T} E(X_t'\Omega_tX_t) - \frac{1}{NT}\sum_{i}^{N}\sum_{t}^{T} VAR(x_{it}u_{it})$$

where $\Omega_i = \frac{1}{T}E[U_iU_i' \mid X_i]$ and

$$E(X_i'\Omega_iX_i) = \frac{1}{T}\sum_{t=1}^{T} E(x_{it}x_{it}'u_{it}^2) + 2\,\frac{1}{T}\sum_{t=1}^{T-1}\sum_{s=t+1}^{T} E(u_{it}u_{is}x_{it}x_{is}')$$

and $\Omega_t = \frac{1}{N}E[U_tU_t' \mid X_t]$ and

$$E(X_t'\Omega_tX_t) = \frac{1}{N}\sum_{i=1}^{N} E(x_{it}x_{it}'u_{it}^2) + 2\,\frac{1}{N}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} E(u_{it}u_{jt}x_{it}x_{jt}')$$

Notice that when $N \to \infty$, if we do not have weak dependence in the cross section, $W$ will not be finite: the term $\lim_{N\to\infty} 2\,\frac{1}{N}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} E(u_{it}u_{jt}x_{it}x_{jt}') \to \infty$, and we would not be able to define an asymptotic distribution for $\hat\beta$ unless we add some assumptions about the spatial dependence or we change the formula for the asymptotic variance. Therefore, if we do not assume any mixing condition on the dependence, we have to divide $W$ by $N$ in order to obtain a finite variance covariance matrix. The case when $T \to \infty$ and there is no mixing in the time series is exactly analogous: the term $\lim_{T\to\infty} 2\,\frac{1}{T}\sum_{t=1}^{T-1}\sum_{s=t+1}^{T} E(u_{it}u_{is}x_{it}x_{is}') \to \infty$ and we need to divide $W$ by $T$.

Before analyzing estimators of $W$, which is the main purpose of this section, I will discuss the properties of the OLS estimator of $\beta$ under different assumptions about the error dependence and different situations regarding the sample sizes of the cross section and the time series.

Remark 1: When $\{N, T\} \to \infty$ and there is mixing in both the spatial and the time dependence, the speed of convergence of the OLS estimator to its asymptotic distribution is $d_{NT} = \sqrt{NT}$.

Remark 2: When $\{N, T\} \to \infty$ and there is mixing in the cross section but not in the time series, the OLS estimator is $\sqrt{N}$ consistent rather than $\sqrt{NT}$ consistent. Intuitively, this slower rate of convergence arises because the time series data are less informative: we are allowing for a general form of time dependence and we are learning only from the cross section.

Remark 3: When $\{N, T\} \to \infty$ and there is mixing in the time series but not in the cross section, the OLS estimator is $\sqrt{T}$ consistent rather than $\sqrt{NT}$ consistent. Intuitively, this slower rate of convergence arises because the cross-sectional data are less informative: we are allowing for a general form of spatial dependence and we are learning only from the time series.

In the following sections, I will analyze the properties of different estimators of $W$ and $V$ when we have both spatial and time dependence.
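The three-part decomposition of $W$ used in this section rests on a counting identity that can be checked numerically. The sketch below is an illustration under stated assumptions: it works with a scalar $z_{it} = x_{it}u_{it}$, uses raw products in place of covariances, and the function names and the simulated values of `z` are hypothetical. It compares the "by individual plus by time minus diagonal" formula with a brute-force sum over all pairs of observations that share either the individual or the period, which are the only pairs allowed to be correlated under Assumption 2.

```python
import random

def w_three_parts(z):
    """(1/NT) * [sum_i (sum_t z_it)^2 + sum_t (sum_i z_it)^2 - sum_it z_it^2]."""
    n, t = len(z), len(z[0])
    by_unit = sum(sum(row) ** 2 for row in z)
    by_time = sum(sum(z[i][s] for i in range(n)) ** 2 for s in range(t))
    diag = sum(v * v for row in z for v in row)
    return (by_unit + by_time - diag) / (n * t)

def w_brute_force(z):
    """(1/NT) * sum of z_it * z_js over all pairs sharing the unit or the period."""
    n, t = len(z), len(z[0])
    total = 0.0
    for i in range(n):
        for s in range(t):
            for j in range(n):
                for r in range(t):
                    if i == j or s == r:
                        total += z[i][s] * z[j][r]
    return total / (n * t)

random.seed(0)
z = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(4)]  # N = 4, T = 5
gap = abs(w_three_parts(z) - w_brute_force(z))
```

The two computations agree up to rounding. The third term is subtracted because each own product $z_{it}^2$ is counted once in the individual grouping and once in the time grouping.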

3 Two way cluster estimator

The purpose of this section is to study the properties of the two-way cluster estimator of $V$ as a robust way of taking into account the spatial and time dependence in the error term, and to give some insight into its assumptions. For that purpose I extend the results in Hansen (2007), who analyzed the properties of the one-way cluster estimator in a panel with only time dependence in the errors, to the two-way cluster estimator in a framework with spatial and time dependence. In order to obtain a consistent estimator of $V$, we need a consistent estimator of $W$. Once we have $\hat W$ we are done:

$$\hat V = \Big(\frac{1}{NT}\sum_{i}^{N}\sum_{t}^{T} x_{it}x_{it}'\Big)^{-1}\hat W\Big(\frac{1}{NT}\sum_{i}^{N}\sum_{t}^{T} x_{it}x_{it}'\Big)^{-1}$$

The idea is to get a consistent estimator of

$$W = \frac{1}{NT}\sum_{i=1}^{N} E(X_i'U_iU_i'X_i) + \frac{1}{NT}\sum_{t}^{T} E(X_t'U_tU_t'X_t) - \frac{1}{NT}\sum_{i}^{N}\sum_{t}^{T} E(x_{it}x_{it}'u_{it}^2)$$

Therefore, we need to consistently estimate each of the three parts of $W$. The two-way clustering estimator proposed by Cameron et al. (2011) extends the one-way cluster estimator of Arellano (1987) to a setting with correlation in two dimensions. In fact, Cameron et al. (2011) propose estimating each of the first two parts of $W$ by a one-way cluster estimator. The first part of $W$ is estimated by grouping the data in individual clusters and the second part is estimated by grouping the data in time clusters. Notice that the third part of $W$ is the overall mean of the individual variances and can be consistently estimated using the White formula:

$$\frac{1}{NT}\sum_{i}^{N}\sum_{t}^{T} x_{it}x_{it}'\hat u_{it}^2$$

where $\hat u_{it}$ are the OLS residuals of (2.2).

Let me discuss the intuition behind this estimator. The first part of $W$ contains the time correlation of each individual: $\frac{1}{N}\sum_{i=1}^{N} E(X_i'\Omega_iX_i)$, where $\Omega_i$ is a $T \times T$ matrix that contains all the time dependence of individual $i$: $\Omega_i = \frac{1}{T}E[U_iU_i' \mid X_i]$. In the most general case of time dependence each $\Omega_i$ contains $T(T+1)/2$ different elements. Nevertheless, it is important to remark that we do not need to consistently estimate each of the $\Omega_i$; we only need to consistently estimate an average over these individual clusters: $\frac{1}{N}\sum_{i=1}^{N} E(X_i'U_iU_i'X_i)$. This is the same problem as when we work with heteroskedastic errors. Therefore we can extend the robust variance estimator proposed by White to the data grouped by individual:

$$\frac{1}{NT}\sum_{i=1}^{N} X_i'\hat U_i\hat U_i'X_i$$

In fact, this is the one-way cluster estimator for panel data proposed by Arellano (1987) and studied by Hansen (2007). In order to understand the conditions for consistency of this estimator we can write

$$X_i'U_iU_i'X_i = E(X_i'U_iU_i'X_i) + \epsilon_i, \qquad E[\epsilon_i] = 0$$

Then

$$\frac{1}{N}\sum_{i=1}^{N} X_i'U_iU_i'X_i = \frac{1}{N}\sum_{i=1}^{N} E(X_i'U_iU_i'X_i) + \frac{1}{N}\sum_{i=1}^{N}\epsilon_i$$

Therefore $\frac{1}{N}\sum_{i=1}^{N} X_i'U_iU_i'X_i$ will be a consistent estimator of $\frac{1}{N}\sum_{i=1}^{N} E(X_i'U_iU_i'X_i)$ as long as $\frac{1}{N}\sum_{i=1}^{N}\epsilon_i \to_p E[\epsilon_i]$. This will happen only when $N \to \infty$. Given Assumption 1, we can replace the unobserved vector $U_i$ by the consistent estimator $\hat U_i$ in the one-way cluster formula.

Remark 4: The estimator $\frac{1}{NT}\sum_{i=1}^{N} X_i'\hat U_i\hat U_i'X_i$ is a function of the weighted average of all the individual-cluster variances; therefore, in order to derive its asymptotics we need the number of clusters $N$ to go to infinity.

Remark 5: When $\{N, T\} \to \infty$ the speed of convergence of $\frac{1}{NT}\sum_{i=1}^{N} X_i'\hat U_i\hat U_i'X_i$ to its asymptotic distribution is $\sqrt{N}$ (see Theorems 2 and 3 of Hansen (2007)).

The second part of $W$ contains the spatial correlation of the error at each moment in time: $\frac{1}{T}\sum_{t=1}^{T} E(X_t'\Omega_tX_t)$, where $\Omega_t$ is an $N \times N$ matrix that contains all the spatial dependence in period $t$.
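The individual-cluster term and the White term described above can be compared in a small sketch. It assumes a scalar regressor, so that $X_i'\hat U_i\hat U_i'X_i$ reduces to $(\sum_t x_{it}\hat u_{it})^2$; the function names and the numbers are hypothetical, and the values in `u_hat` are illustrative stand-ins rather than residuals from an actual regression.

```python
def cluster_by_unit(x, u_hat):
    """(1/NT) * sum_i (sum_t x_it * u_it)^2: scalar version of X_i' U_i U_i' X_i."""
    n, t = len(x), len(x[0])
    return sum(sum(xv * uv for xv, uv in zip(xr, ur)) ** 2
               for xr, ur in zip(x, u_hat)) / (n * t)

def white_term(x, u_hat):
    """(1/NT) * sum_it x_it^2 * u_it^2: keeps own products only."""
    n, t = len(x), len(x[0])
    return sum((xv * uv) ** 2
               for xr, ur in zip(x, u_hat)
               for xv, uv in zip(xr, ur)) / (n * t)

# Hypothetical data: N = 3 units, T = 4 periods, errors constant within a unit
x = [[1.0] * 4 for _ in range(3)]
u_hat = [[1.0] * 4, [-1.0] * 4, [0.5] * 4]

w_cluster = cluster_by_unit(x, u_hat)  # (16 + 16 + 4) / 12
w_white = white_term(x, u_hat)         # (4 + 4 + 1) / 12
```

With errors that are perfectly persistent within each unit, the cluster term is $T$ times the White term in this example, which illustrates how much a formula that ignores time dependence can understate the variance.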

As in the previous case, we do not need a consistent estimator of each $\Omega_t$; we need a consistent estimator of $\frac{1}{T}\sum_{t=1}^{T} E(X_t'U_tU_t'X_t)$. Therefore we can group the data in time clusters and compute the following expression:

$$\frac{1}{T}\sum_{t}^{T} X_t'\hat U_t\hat U_t'X_t$$

This expression will be a consistent estimator of $\frac{1}{T}\sum_{t}^{T} E(X_t'U_tU_t'X_t)$ as long as $\frac{1}{T}\sum_{t=1}^{T}\epsilon_t \to_p E[\epsilon_t]$. This will happen only when $T \to \infty$.

Remark 6: The estimator $\frac{1}{NT}\sum_{t}^{T} X_t'\hat U_t\hat U_t'X_t$ is a function of the weighted average of all the time-cluster variances; therefore, in order to derive its asymptotics we need the number of clusters $T$ to go to infinity.

Remark 7: When $\{N, T\} \to \infty$, the speed of convergence of $\frac{1}{NT}\sum_{t}^{T} X_t'\hat U_t\hat U_t'X_t$ to its asymptotic distribution is $\sqrt{T}$.

The two-way cluster estimator is the following:

$$\hat W_{2WCLUSTER} = \frac{1}{NT}\sum_{i=1}^{N} X_i'\hat U_i\hat U_i'X_i + \frac{1}{NT}\sum_{t}^{T} X_t'\hat U_t\hat U_t'X_t - \frac{1}{NT}\sum_{i}^{N}\sum_{t}^{T} x_{it}x_{it}'\hat u_{it}^2$$

where $\hat U_i$ is a $T \times 1$ vector that groups the $T$ residuals for individual $i$ and $\hat U_t$ is an $N \times 1$ vector that groups the $N$ residuals in period $t$.

Theorem: Under weak dependence and suitable regularity conditions in the time series and the cross section, if A1 and A2 are satisfied:

1. The speed of convergence of $\hat W_{2WCLUSTER}$ is $\min(\sqrt{N}, \sqrt{T})$.

2. For $\Omega_t = X_t'U_tU_t'X_t$:

$$\sqrt{T}\Big[vec\Big(\hat W_{Cluster,t} - \frac{1}{NT}\sum_{t=1}^{T} E(\Omega_t)\Big)\Big] \to_d N\Big(0,\ \lim_{N\to\infty}\frac{1}{N^2T}\Big(\sum_{t=1}^{T} Var(\Omega_t) + 2\sum_{t=1}^{T-1}\sum_{s=t+1}^{T} Cov(\Omega_t, \Omega_s)\Big)\Big)$$

3. For $\Omega_i = X_i'U_iU_i'X_i$:

$$\sqrt{N}\Big[vec\Big(\hat W_{Cluster,i} - \frac{1}{NT}\sum_{i=1}^{N} E(\Omega_i)\Big)\Big] \to_d N\Big(0,\ \lim_{N\to\infty}\frac{1}{NT^2}\Big(\sum_{i=1}^{N} Var(\Omega_i) + 2\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} Cov(\Omega_i, \Omega_j)\Big)\Big)$$

Remark 8: The two-way cluster estimator needs both $N$ and $T$ to go to infinity, because the first and the second parts of $\hat W_{2WCLUSTER}$ are consistent when $N$ goes to infinity and when $T$ goes to infinity, respectively.

Remark 9a: $\frac{1}{NT}\sum_{t=1}^{T} E(\Omega_t)$ is not well estimated by $\hat W_{Cluster,t}$ when the time series presents dependence and $T$ is not very large.

Remark 9b: $\frac{1}{NT}\sum_{i=1}^{N} E(\Omega_i)$ is not well estimated by $\hat W_{Cluster,i}$ when the cross section presents dependence and $N$ is not large.

The speed of convergence of the two-way cluster estimator is $\min(\sqrt{N}, \sqrt{T})$. More importantly, the variance of the cluster estimator (the variance of the variance estimator) is also affected by the two forms of dependence. As a consequence, when $T$ and $N$ are not large enough, inference based on the two-way cluster estimator may be misleading.

It is true that the two-way clustering approach is the ideal way of estimating the variance, because it does not impose any restriction on the form of the dependence. Nevertheless, this freedom comes with the requirement of a large number of observations in both the cross section and the time series. In settings where $N$ and $T$ are not large enough, as in state-year or country-year panels, the gain from imposing some structure on the error dependence could be important. For example, a mixing assumption gives us the possibility of considering alternative estimators of the variance such as the double Heteroskedasticity and Autocorrelation Consistent (HAC) variance estimator. With a mixing assumption in the time series, we can use the HAC estimator of Newey and West to take the time dependence into account, and with a mixing assumption in the cross section we can estimate a spatial HAC version as in Conley (1999). The double HAC estimator should have better small sample properties than the two-way cluster estimator.

In a recent paper, Bester et al. (2011) propose a test based on the cluster estimator when the number of clusters is small. Rather than using the normal approximation, they derive a limiting distribution for the t statistic, treating the cluster estimator as a random variable. Nevertheless, this approach needs the number of elements in each cluster to be large relative to the number of clusters.

A third alternative, which could be more convenient when $N$ and $T$ are of the same size and are not large enough, is to impose more structure by modeling the dependence in a parametric way. The way we should model the dependence will depend on the specific application. In particular, Hansen (2007) remarks that the bias of simple parametric estimators is also typically smaller in the case where the parametric model is correct, making this approach preferable when the researcher is confident about the form of the error process.

4 Parametric Approach

4.1 The SAR model

An alternative approach to dealing with the error dependence is to model it in a parametric way, i.e. imposing some structure on the variance covariance matrix. In a small sample framework, this methodology behaves better than clustering, since the variance estimator [5] will be $\sqrt{NT}$ consistent. ARMA models are well known and widely used to model time dependence; models for spatial dependence, however, are less well known, because cross-sectional data do not have a natural order, as opposed to time series data. Nevertheless, the interest in and development of spatial econometrics have been increasing in the last 10 years as a consequence of the increasing use of cross-sectional data and panels with large $N$ in applied econometrics. Models for specifying, estimating and testing spatial dependence have been developed in leading and pioneering works such as Anselin (1988, 1999), Baltagi et al. (2003), Baltagi et al. (2010), Kelejian and Prucha (1998, 1999), Lee and Yu (2007, 2008), etc. All of these papers are based on the spatial autoregressive (SAR) model proposed by Cliff and Ord (1973, 1981), whose specification is similar to the time series autoregressive model. However, there are several important differences between them. In this section I will only comment on three of the most important ones.

[5] If the model is stationary.

The spatial autoregressive model of order one, SAR(1), for the error process $u_{it}$ of the model in Section 2 is the following:

$$u_{it} = \lambda\sum_{j=1}^{N} w_{ij}u_{jt} + \varepsilon_{it}$$

Or in matrix form:

$$U(t) = \lambda W_N U(t) + \epsilon(t) \qquad (4.1)$$

where $U(t)$ is an $N \times 1$ vector that contains each of the shocks $u_{it}$ for the $N$ observations in period $t$, $\lambda$ is the spatial parameter, $\epsilon(t)$ is an $N \times 1$ vector of innovations and $W_N$ is an $N \times N$ matrix that is determined by the researcher before the estimation. Each element $w_{ij}$ allows for direct correlation between two different cross-sectional units.

In the first place, because spatial units do not have an established order, there is no direct counterpart of the time series lag operator in the spatial domain. So, maintaining the idea that nearby realizations of a stochastic process are related, a spatial lag operator $w_{ij}$ is used to allow for correlation between units that are near in distance. In this way a spatial unit is a function of a weighted average of other realizations of the same stochastic process at neighboring locations. The weights $w_{ij}$, or the weighting matrix $W_N$ that contains all of them, therefore play an important role in modeling spatial dependence. This weighting matrix has the following features:

(i) Each $w_{ij}$ is nonzero when unit $i$ and unit $j$ are not far apart, but tends to zero as the distance between the units increases. This assumption is crucial for having a finite variance covariance matrix for the asymptotic distribution of the OLS estimator when $N$ goes to infinity, like an ergodicity property for a time series process.

(ii) $W_N$ is a nonstochastic matrix that has to be defined before the estimation, and all the elements of its diagonal are equal to zero: $w_{ij}$ is equal to zero if $i = j$. The first assumption of this point avoids identification problems (it is impossible to estimate the $N \times N$ elements of $W_N$ in a cross-sectional setting). The second assumption is a normalization of the model and has no implications for the estimation.

(iii) The matrix $I_N - \lambda W_N$ is nonsingular for all values of $\lambda$, which ensures that the process $U$ is invertible and can be expressed as a weighted average of the innovations $\epsilon(t)$.

(iv) In most cases the matrix $W_N$ is row standardized in such a way that the sum of each row (column) is equal to one. This is only for ease of interpretation.

In the second place, unlike in time series models, the fact that $\lambda$ is less than one in absolute value does not imply covariance stationarity:

$$Var(U) = \sigma^2\big[(I_N - \lambda W_N)'(I_N - \lambda W_N)\big]^{-1} \qquad (4.2)$$

The elements of the diagonal of $Var(U)$ in (4.2) are not necessarily equal and depend on the structure of $W_N$. Moreover, a $\lambda$ less than one is required to ensure the ergodicity of the process, which implies a finite variance of $U$, $\hat\lambda$ and $\hat\beta$.

The third and most important difference between the SAR(1) model in (4.1) and the AR(1) model is that the OLS estimator of the autoregressive coefficient in a spatial model is inconsistent. Unlike in time series models, the lag term in a spatial model is an endogenous variable that is correlated with the innovations in $\epsilon(t)$, regardless of whether the innovations are i.i.d. We can see this in the reduced form $U(t) = (I_N - \lambda W_N)^{-1}\epsilon(t)$, where each shock in the vector $U(t)$ is a combination of all the innovations in the vector $\epsilon(t)$. Therefore, the endogeneity and bias of OLS will depend on the structure of $W_N$ and the value of the spatial autoregressive parameter $\lambda$. An alternative way to understand this problem is to think of a spatial model as a system of $N$ equations, so that OLS will be biased and inconsistent due to simultaneity bias. As a consequence, the spatial lag variable has to be treated as an endogenous variable and the estimation method used for the spatial lag parameter has to exploit valid moment conditions.

One way to estimate the spatial autoregressive coefficient is via pseudo maximum likelihood, assuming that $\epsilon(t)$ is normally distributed. In this case we can treat $U(t)$ as a multivariate random process with an unconditional normal distribution $N\big(0, \sigma^2[(I_N - \lambda W_N)'(I_N - \lambda W_N)]^{-1}\big)$:

$$\ln L = -\frac{N}{2}\big(\ln(2\pi) + \ln(\sigma^2)\big) + \ln|I_N - \lambda W_N| - \frac{1}{2\sigma^2}\,U'(I_N - \lambda W_N)'(I_N - \lambda W_N)U$$

4.2 Cluster estimation as a flexible model of spatial dependence

Let us define $z_{it} = x_{it}u_{it}$ for a scalar $x_{it}$, for the sake of simplicity, and the $N \times 1$ vector $Z_t = [z_{1t}, z_{2t}, \dots, z_{Nt}]'$. Then

$$\hat W_{Cluster,t} = \frac{1}{NT}\sum_{t=1}^{T}\sum_{i=1}^{N}\sum_{j=1}^{N} z_{it}z_{jt}$$

that is, $1/N$ times the sum of all the elements of the $N \times N$ matrix $\frac{1}{T}\sum_{t=1}^{T} Z_tZ_t'$, whose entries are $\frac{1}{T}\sum_{t=1}^{T} z_{1t}^2, \dots, \frac{1}{T}\sum_{t=1}^{T} z_{1t}z_{Nt}, \dots, \frac{1}{T}\sum_{t=1}^{T} z_{N-1,t}z_{Nt}, \frac{1}{T}\sum_{t=1}^{T} z_{Nt}^2$.

The cluster estimator robust to spatial dependence can be seen as the sum of the sample analogue estimators of the $N(N+1)/2$ spatial covariances:

$$E(z_{1t}^2) = \rho_{1,1}, \quad E(z_{1t}z_{2t}) = \rho_{1,2}, \quad \dots, \quad E(z_{1t}z_{Nt}) = \rho_{1,N}, \quad \dots, \quad E(z_{Nt}^2) = \rho_{N,N}$$

In the cluster approach we are using the time series to estimate each of the previous moment conditions. (Remember that the speed of convergence of the cluster estimator robust to spatial dependence is $\sqrt{T}$.) Write

$$Z_t = \Gamma^{1/2}\epsilon_t$$

where $\Gamma$ is an $N \times N$ symmetric matrix that contains the $N(N+1)/2$ spatial correlations and $\epsilon_t$ is an $N \times 1$ vector of innovations with $E[\epsilon_t] = 0$ and $E[\epsilon_t\epsilon_t'] = I_N$. We can express the model as:

$$Z_t = \Lambda Z_t + \epsilon_t$$

where $\Lambda = I_N - \Gamma^{-1/2}$ is an $N \times N$ matrix. The cluster estimator can be expressed in terms of a particular fully flexible spatial autoregressive model of order $N(N+1)/2$:

$$Z_t = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\lambda_{ij}W_{ij}^{C}Z_t + \epsilon_t$$

where $\lambda_{ij}$ is the $ij$ element of $\Lambda$ and each $W_{ij}^{C}$ is an $N \times N$ matrix in which the element $ij$ is equal to one and the other elements are zero.

We can also capture the unrestricted dependence implied by the cluster model with other sequences of weighting matrices different from $W_{ij}^{C}$. We can define the following SAR(K) model:

$$Z_t = \sum_{k=1}^{K}\lambda_kW_kZ_t + \epsilon_t$$

where $K$ is a function of $N$ and $\{W_1, W_2, \dots, W_K\}$ is a sequence of weighting matrices. We can mimic the cluster model and capture all the dependence implied by the matrices $W_{ij}^{C}$ by saturating the SAR(K) with $K = N(N+1)/2$ and a suitable selection of $\{W_1, W_2, \dots, W_K\}$. If $K < N(N+1)/2$ but grows with $N$ as $N \to \infty$, this model will be asymptotically equivalent to the cluster model (badly estimated for $T$ not very large). If $K$ is fixed, the estimator of the variance will learn from both the time series and the cross section (the variance estimator will be $\sqrt{NT}$ consistent).

The SAR model of order one can be seen as a special case of the cluster estimator ($K = 1$):

$$Z_t = \lambda WZ_t + \epsilon_t$$

In terms of the cluster model, this implies that the elements of $\Lambda$ for states that do not share a border are zero and the elements of $\Lambda$ for states that share a border are equal. Note that all the elements in $Z_t$ could still be correlated: states that share a border interact directly, while states that do not share a border could be connected indirectly through other common states.

Suppose instead that we have a spatial model for the error term rather than a spatial model for $z_{it}$:

$$U_t = \lambda WU_t + \xi_t$$

where $U_t$ is the $N \times 1$ vector of errors and $\xi_t$ is an $N \times 1$ vector of innovations with $E(\xi_t) = 0$ and $E(\xi_t\xi_t') = \sigma_\xi I_N$. The time-cluster term is then $X_t'U_tU_t'X_t$, with

$$E(X_t'U_tU_t'X_t \mid X_t) = \sigma_\xi X_t'\big[(I - \lambda W)'(I - \lambda W)\big]^{-1}X_t$$

This implies a more restrictive model for $Z_t$ because we are assuming homoskedasticity, $E(U_tU_t' \mid X_t) = E(U_tU_t')$.
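The covariance implied by a SAR(1) error model, and the parametric counterpart of the time-cluster term, can be sketched as follows. This is an illustration under stated assumptions, not the paper's estimation procedure: the weighting matrix is a hypothetical row-standardized contiguity matrix for five units on a line, $\lambda$ is taken as known, and $(I - \lambda W)^{-1}$ is approximated by the truncated series $\sum_k \lambda^k W^k$, which converges here because $W$ is row standardized and $|\lambda| < 1$. All function names are invented for the example.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def path_weights(n):
    """Row-standardized contiguity matrix for n units on a line (zero diagonal)."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in nbrs:
            w[i][j] = 1.0 / len(nbrs)
    return w

def sar_covariance(w, lam, sigma2=1.0, terms=200):
    """sigma2 * M M' with M = (I - lam*W)^(-1) built as sum_k (lam*W)^k.

    M M' equals [(I - lam*W)'(I - lam*W)]^(-1), the matrix in (4.2).
    """
    n = len(w)
    lam_w = [[lam * v for v in row] for row in w]
    m = [[float(i == j) for j in range(n)] for i in range(n)]
    power = [row[:] for row in m]
    for _ in range(terms):
        power = matmul(power, lam_w)          # (lam*W)^k
        m = [[m[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    m_t = [list(col) for col in zip(*m)]
    return [[sigma2 * v for v in row] for row in matmul(m, m_t)]

def parametric_cluster_term(x_t, omega):
    """Scalar-regressor version of X_t' Omega X_t for one period."""
    n = len(x_t)
    return sum(x_t[i] * omega[i][j] * x_t[j] for i in range(n) for j in range(n))

omega = sar_covariance(path_weights(5), lam=0.5)
term = parametric_cluster_term([1.0, 0.5, -1.0, 2.0, 0.0], omega)
```

In this example the diagonal of `omega` is not constant (the end units and the middle unit have different variances even though $|\lambda| < 1$), and units 1 and 3, which are not neighbors, still have a positive covariance through their common neighbor. Both features match the discussion of (4.2) and of indirect connections above.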

4.3 A Parametric Model for Spatial and Time Dependence

In this subsection I present a model which allows for all kinds of dependence in a panel data model. This model is a generalization of the SAR model described above, with two additional regressors that control for time dependence and for spatial dependence at different moments in time:

$$U_t = \lambda (I_T \otimes W_N) U_t + \rho_1 U_{t-1} + \rho_2 (I_T \otimes W_N) U_{t-1} + \epsilon_t \qquad (4.3)$$

where $U_t$ is an $NT \times 1$ vector which contains each $u_{it}$ for the $N$ states and $T$ periods, and $U_{t-1}$ is the same vector lagged one period. As in the SAR model, the presence of the term $\lambda (I_T \otimes W_N) U_t$ in (4.3) generates an endogeneity problem, so the OLS estimators remain inconsistent. To estimate the unknown parameters of this model, $\theta = [\lambda, \rho_1, \rho_2, \sigma^2]$, we can use a conditional pseudo maximum likelihood approach, as if it were the reduced form of a VAR model.

4.3.1 Quasi Maximum Likelihood

Defining $B_N = I_N - \lambda W_N$, $A = \rho_1 I_N + \rho_2 W_N$ and $A_1 = B_N^{-1} A$, and assuming that $B_N$ is invertible, we can express (4.3) in reduced form:

$$U_t = (I_T \otimes A_1) U_{t-1} + (I_T \otimes B_N^{-1}) \epsilon_t \qquad (4.4)$$

We know the first two moments of the conditional distribution of $U_t$ given $U_{t-1}$:

$$E[U_t \mid U_{t-1}] = (I_T \otimes A_1) U_{t-1}, \qquad Var[U_t \mid U_{t-1}] = \sigma^2 \left( I_T \otimes (B_N' B_N)^{-1} \right)$$

Thus, in this case the conditional expectation of the score is zero, and the consistency and asymptotic normality of the estimator are satisfied, as shown in Yu, Jong and Lee (2007). The conditional log-likelihood is

$$\ln L(U(2), \ldots, U(T); \theta) = -\frac{NT}{2} \left( \ln(2\pi) + \ln(\sigma^2) \right) + (T-1) \ln |B_N| - \frac{1}{2\sigma^2} \left[ U_t - (I_{T-1} \otimes A_1) U_{t-1} \right]' \left( I_{T-1} \otimes (B_N' B_N)^{-1} \right)^{-1} \left[ U_t - (I_{T-1} \otimes A_1) U_{t-1} \right]$$

where $U_t$ is an $N(T-1) \times 1$ vector and $I_{T-1}$ is an identity matrix of order $T-1$.

4.3.2 Non-stationary case

One particularity of this spatial-time model is that, even when the autoregressive coefficient $\rho_1$ is less than one in absolute value, the vector $U(t)$ could contain unit roots, which implies that there may be non-stationary components in the data generating process. A non-stationary case occurs when some of the eigenvalues of $A_1$ are equal to one. As in Yu, Jong and Lee (2007), we can write the $i$-th eigenvalue of $A_1$ as

$$\psi_{iN} = \frac{\rho_1 + \rho_2 \varpi_{iN}}{1 - \lambda \varpi_{iN}}$$

where $\varpi_{iN}$ is the $i$-th eigenvalue of the matrix $W_N$. Moreover, if $W_N$ is row-normalized from a symmetric matrix, all its eigenvalues are less than or equal to one in absolute value, with the largest equal to one (see Ord, 1975), so we could have unit roots in $U(t)$ if $\lambda + \rho_1 + \rho_2 = 1$. Nevertheless, if the spatial weight matrix $W_N$ is row-normalized from a symmetric matrix, we can still obtain the consistency and asymptotic normality of the ML and QML estimators with the same rate of convergence as in the stationary case, as demonstrated in Yu, Jong and Lee (2007), although with a different variance covariance matrix.

4.3.3 Monte Carlo Simulations: OLS vs QML

In this subsection I simulate a spatial process of the form $U(t) = \lambda W_N U(t) + \epsilon(t)$ with $N = 50$ for different values of the spatial parameter $\lambda$. The purpose of the simulation is to evaluate the performance of OLS

relative to the conditional QML for a spatial model. Then I extend the simulation to a spatial-time model as in (4.3) for two different values of the parameters in $\theta$. I generate one thousand replications of a model with sample size $N = 50$ and $T = 31$, where the $\epsilon(t)$ are generated from independent standard normal distributions. The initial value $U(1)$ is generated as $N(0, I_N)$. For each set of generated sample observations I calculate the simulated sample mean and the simulated sample variance of the OLS and the conditional QML estimators. The sample mean is constructed as $\frac{1}{1000} \sum_{s=1}^{1000} \hat{\theta}_s$ and the sample variance as $\frac{1}{1000} \sum_{s=1}^{1000} \left( \hat{\theta}_s - \frac{1}{1000} \sum_{s=1}^{1000} \hat{\theta}_s \right)^2$. The results are summarized in Graph 1, Graph 2 and Table 1.
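The comparison described above can be sketched in a few lines for the pure spatial model $U(t) = \lambda W_N U(t) + \epsilon(t)$. This is only an illustration under stated assumptions: the weight matrix is a hypothetical ring in which each unit has two neighbours (not the state border matrix), the number of replications is cut to 50 to keep the run short, and the QML estimate is found by a grid search over $\lambda$ on the concentrated Gaussian log-likelihood rather than by a numerical optimizer.

```python
import numpy as np

def ring_weights(n):
    """Row-normalised weights for n units on a circle (hypothetical contiguity)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def simulate_sar(W, lam, T, rng):
    """Draw U(t) = (I - lam W)^{-1} eps(t) for t = 1..T; returns an N x T array."""
    n = W.shape[0]
    eps = rng.standard_normal((n, T))
    return np.linalg.solve(np.eye(n) - lam * W, eps)

def ols_lambda(U, W):
    """OLS of U on WU, ignoring that WU is endogenous."""
    WU = W @ U
    return np.sum(WU * U) / np.sum(WU * WU)

def qml_lambda(U, W, grid, logdet):
    """Maximise the concentrated Gaussian log-likelihood over a grid of lambda values."""
    n, T = U.shape
    WU = W @ U
    a, b, c = np.sum(U * U), np.sum(U * WU), np.sum(WU * WU)
    ssr = a - 2.0 * grid * b + grid**2 * c          # sum of squares of (I - lam W) U
    loglik = T * logdet - 0.5 * n * T * np.log(ssr / (n * T))
    return grid[np.argmax(loglik)]

rng = np.random.default_rng(0)
N, T, lam_true, reps = 50, 31, 0.4, 50
W = ring_weights(N)
grid = np.linspace(-0.95, 0.95, 191)
# log|I - lam W| for every grid point, computed once
logdet = np.array([np.linalg.slogdet(np.eye(N) - l * W)[1] for l in grid])

ols = np.empty(reps)
qml = np.empty(reps)
for s in range(reps):
    U = simulate_sar(W, lam_true, T, rng)
    ols[s] = ols_lambda(U, W)
    qml[s] = qml_lambda(U, W, grid, logdet)

print(f"true lambda = {lam_true}")
print(f"OLS mean = {ols.mean():.3f}, sd = {ols.std(ddof=1):.3f}")
print(f"QML mean = {qml.mean():.3f}, sd = {qml.std(ddof=1):.3f}")
```

The Jacobian term $T \ln |I - \lambda W|$ is what separates the QML criterion from least squares; dropping it gives back the OLS estimate, which treats $W_N U(t)$ as exogenous.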

Graph 1: $U(t) = \lambda W_N U(t) + \epsilon(t)$

The black line in the graph shows the true values of the spatial parameter, while the blue line shows the average of the OLS estimator over one thousand replications. The red dashed lines are the confidence intervals, calculated as two standard deviations around the mean. The graph shows that the bias of the OLS estimator increases with the value of the spatial parameter.

Graph 2: $U(t) = \lambda W_N U(t) + \epsilon(t)$

On the other hand, the mean of the conditional QML estimator is very close to the true values of the spatial parameter. In addition, the true values of the spatial parameter lie within the confidence interval of the estimator.

Table 1: $U_t = \lambda (I_T \otimes W_N) U_t + \rho_1 U_{t-1} + \rho_2 (I_T \otimes W_N) U_{t-1} + \epsilon_t$

From Table 1 we can observe that the mean of the OLS estimator is far from the true value of the parameters representing the spatial dependence. The OLS estimator has a bias of 0.11 (two standard deviations) in the spatial parameter, compared with a bias of 0.02 for the conditional QML (one standard deviation). This difference in the biases increases when the true value of the spatial parameter is higher: when the true value of the spatial parameter is 0.6 rather than 0.4, the OLS estimator bias is 0.21 (7 standard deviations) against 0.02 for the QML estimator (2 standard deviations). The average of the OLS estimator of the parameter that represents the time dependence is very close to the true parameter, as is the conditional QML estimator, since the time lag does not suffer from endogeneity. Finally, the OLS estimator of the parameter that measures the dependence between states at different moments in time is also biased, because the variable $W_N U_{t-1}$ is strongly correlated with the endogenous variable $W_N U_t$. Thus, the stronger the endogeneity of $W_N U_t$ (which depends on the structure of $W_N$ and the value of $\lambda$), the larger the bias of the OLS estimator of the parameter on $W_N U_{t-1}$, as shown in the table.

5 Monte Carlo Simulation

The purpose of this section is twofold. The first is to study the behavior of the two-way clustering approach and the parametric approach in a setup where we have spatial and time dependence in the error term of the model and $NT$ is large but $N$ and $T$ separately are not. The second is to study and model the error dependence in a specific application and analyze how the results and conclusions could change once we control for several forms of dependence. To address these two purposes, I focus on a recent paper that studies the causal effect of the minimum wage on US wage inequality, written by Autor, Manning and Smith (from now on, AMS). That research uses a state panel with $N = 50$ and $T = 30$. Using OLS and 2SLS estimators with fixed and time effects as controls, the authors conclude that the impact of the minimum wage on wage inequality over the sample period is highly significant. I argue that their inferences are not so reliable, since they do not control for error dependence in a setting where it is very likely. In fact, Barrios et al. (2010) show that yearly earnings have substantial correlation across states which

decreases when distance increases. This dependence could be explained by geographical or local labor market features. I first briefly explain the model in AMS. I then discuss how to estimate a spatial and time autoregressive model for the error term of the AMS model, and I propose a non-linear least squares estimation that is consistent and computationally efficient in the presence of fixed and time effects in medium-sized panels. Next, I estimate the parameters of the spatial and time autoregressive model for the error term of AMS. Finally, I generate replications of data with the same characteristics as the AMS error model in order to evaluate how often we make type one errors in a regression of a variable with the AMS error structure on the regressors of the AMS model (effective minimum wage and its square), for different estimators of the variance covariance matrix.

5.1 Autor, Manning and Smith (AMS)

In their paper, AMS use panel data covering 30 years and 50 states to measure the impact of the minimum wage on US wage inequality. Their conclusions are based on the estimation of the following regression model:

$$\log wage_{it}(10) - \log wage_{it}(50) = \log wage^{*}_{it}(10) - \log wage^{*}_{it}(50) + \beta_1 \left[ \log wage^{Min}_{it} - \log wage_{it}(50) \right] + \beta_2 \left[ \log wage^{Min}_{it} - \log wage_{it}(50) \right]^2 + u_{it} \qquad (5.1)$$

where $\log wage_{it}(10) - \log wage_{it}(50)$ is the log of the 10th percentile state wage relative to the log of the median state wage (a proxy for wage inequality), $\log wage^{*}_{it}(10) - \log wage^{*}_{it}(50)$ is the latent inequality, which is approximated by the sum of a fixed effect and a time effect (see Autor, Manning and Smith, 2010), and $[\log wage^{Min}_{it} - \log wage_{it}(50)]$ is a measure of the bindingness of the minimum wage for state $i$ in year $t$, from now on the effective minimum wage. Therefore, the estimated marginal effect of the minimum wage on wage inequality is given by the estimate of

$$\beta_1 + 2 \beta_2 \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \left[ \log wage^{Min}_{it} - \log wage_{it}(50) \right]$$

AMS remark that the estimation of

$\beta_1$ and $\beta_2$ in (5.1) could be affected by division bias because the variable $\log wage_{it}(50)$ appears on both sides of the regression, which induces an artificial positive correlation caused by sampling variation. For this reason, they use the maximum of the federal minimum regulatory wage and the state's minimum wage, from now on the statutory minimum, as an instrument. The idea is that the federal regulatory wage could directly affect $\log wage^{Min}_{it}$ but not $\log wage_{it}(50)$. It is reasonable to think that the federal regulatory wage is an exogenous source of variation that is not directly correlated with the state variation in wage inequality. To instrument the square of the effective minimum, they use the square of the predicted value from a regression of the effective minimum on the statutory minimum, using year and state dummies as controls.

In the AMS model, there may be some shocks that we are not controlling for (apart from the fixed and the time effects). For example, we could think of a productivity shock that affects inequality in each state and in each period. As in Beaudry et al. (2010), technological shocks could increase the return to high-skill workers relative to low-skill workers, increasing inequality in a specific state. These productivity shocks could create error dependence in the model in three different dimensions: (i) as in macro models, productivity shocks could follow an autoregressive process, inducing time dependence in the error term; (ii) state-specific productivity shocks could be propagated to other states through linkages across states (for example, productivity shocks could quickly propagate between neighboring states); and (iii) productivity shocks could generate spillovers between states at different points in time, i.e. a new technology is transmitted from one state to another over time.

5.2 Estimation of a Parametric Model for Spatial and Time Dependence in the residual of AMS's model

As a first step I estimated the AMS model in (5.1) by OLS and 2SLS, controlling for time and state effects, in order to obtain the residuals. As in Autor et al. (2010), to build the dependent variable I followed these steps: (i) First, I pooled all individual responses from the Current Population Survey Merged Outgoing Rotation Group (CPS MORG) for each year. (ii) I used the

reported hourly wage for those who reported being paid by the hour and, if the individual had no information on the hourly wage, I calculated this variable as weekly earnings divided by hours worked in the prior week. (iii) Then, I limited the sample to individuals older than 18 and younger than 64 and excluded self-employed individuals. (iv) Finally, in order to reduce the influence of outliers, I replaced the 98th and 99th percentiles of the wage distribution in each state and year with the 97th percentile value. Using these individual wage data, I computed the percentiles of the state wage distributions by sex over the sample period, weighting individual observations by their CPS sampling weights. For the minimum wage and the federal wage I used the data reported in Table 1 of the Appendix of Autor et al. (2010). The residual of the model is estimated by the following expression:

$$\hat{u}_{it} = \log wage_{it}(10) - \log wage_{it}(50) - \hat{\delta}_t - \hat{\delta}_i - \hat{\beta}_1 \left[ \log wage^{Min}_{it} - \log wage_{it}(50) \right] - \hat{\beta}_2 \left[ \log wage^{Min}_{it} - \log wage_{it}(50) \right]^2 \qquad (5.2)$$

Using these residuals I estimated a spatial-time model as in (4.3) in order to obtain estimates of the parameters which characterize the error dependence of the AMS model. $W_N$ is a row-normalized version of a symmetric matrix that takes the value one if two states share a border and zero otherwise. The parameters in $\theta = [\lambda, \rho_1, \rho_2, \sigma^2]$ for the proposed residual model are estimated both by OLS and by conditional QML (the residuals come from the OLS and 2SLS estimates of the AMS model), and are summarized in Table 2 and Table 3.
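The construction of a row-normalized contiguity matrix such as $W_N$ can be sketched as follows. The four-region border structure is hypothetical and only serves to show the mechanics; the paper's matrix is built from the borders of the US states.

```python
import numpy as np

# Hypothetical border structure for four regions (1 = the two regions share a border).
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

assert np.allclose(adj, adj.T)              # the raw contiguity matrix is symmetric
W = adj / adj.sum(axis=1, keepdims=True)    # divide each row by its number of neighbours

eig = np.linalg.eigvals(W)
print("row sums:", W.sum(axis=1))
print("largest |eigenvalue|:", np.abs(eig).max())
print("W symmetric after normalisation?", np.allclose(W, W.T))
```

Row normalization makes every row sum to one, so no eigenvalue of $W$ exceeds one in absolute value, but the normalized matrix is in general no longer symmetric because regions differ in their number of neighbours.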

Table 2

Table 3

As we can see from Table 2 and Table 3, the OLS and conditional QML estimators of the three parameters are highly significant, with a spatial parameter close to 0.2 (for the QML estimator). Moreover, the R-squared of the regressions is around 0.5, and the likelihood ratio test of the three forms of autocorrelation rejects the null hypothesis of absence of autocorrelation.

In this setting $N \times T$ is large, but $N$ and $T$ separately are not, so it is possible that the residuals estimated from the AMS model are biased, since they come from the estimation of the fixed effects and the time effects, which are biased for small $T$ and small $N$ respectively. To avoid the bias that the estimated fixed and time effects induce in the residual regression, I eliminate these effects by taking deviations of the variables of the AMS model from their time and state means. That is, I transform the data with $Q_1$ and $Q_2$ in order to eliminate the time and individual effects from the regression, where $Q_1 = I_N \otimes (I_T - \frac{1}{T} \iota_T \iota_T')$ is the matrix which takes deviations of the variables from their time mean, and $Q_2 = (I_N - \frac{1}{N} \iota_N \iota_N') \otimes I_T$ is the matrix which takes deviations of the variables from their state mean. Therefore, I should estimate a spatial-time model for the new transformed residuals:

$$\tilde{U}_t = \lambda [I_T \otimes W_N] \tilde{U}_t + \rho_1 \tilde{U}_{t-1} + \rho_2 [I_T \otimes W_N] \tilde{U}_{t-1} + \tilde{\epsilon}_t \qquad (5.3)$$

where $\tilde{U} = Q_2 Q_1 U$ and $\tilde{\epsilon} = Q_2 Q_1 \epsilon$. However, this transformation creates other sources of bias in the model, because $E[\tilde{\epsilon}_t \mid \tilde{U}_{t-1}]$ is not zero. Therefore, the moments used in the conditional QML for the reduced form of the model are no longer valid. One way to estimate this model is to use the unconditional likelihood of $\tilde{U}$:

$$\ln L = -\frac{NT}{2} \left( \ln(2\pi) + \ln(\sigma^2) \right) + \ln \left| Q_2 Q_1 \Upsilon Q_1 Q_2 \right|^{-1/2} - \frac{1}{2\sigma^2} \tilde{U}' \left( Q_2 Q_1 \Upsilon Q_1 Q_2 \right)^{-1} \tilde{U}$$

where $\Upsilon$ is the variance covariance matrix of the $NT$ vector $U$:

$$\Upsilon = E \begin{pmatrix}
u_{1t}u_{1t} & \cdots & u_{1t}u_{Nt} & \cdots & u_{1t}u_{1T} & \cdots & u_{1t}u_{NT} \\
\vdots & \ddots & \vdots & & \vdots & \ddots & \vdots \\
u_{Nt}u_{1t} & \cdots & u_{Nt}u_{Nt} & \cdots & u_{Nt}u_{1T} & \cdots & u_{Nt}u_{NT} \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
u_{1T}u_{1t} & \cdots & u_{1T}u_{Nt} & \cdots & u_{1T}u_{1T} & \cdots & u_{1T}u_{NT} \\
\vdots & \ddots & \vdots & & \vdots & \ddots & \vdots \\
u_{NT}u_{1t} & \cdots & u_{NT}u_{Nt} & \cdots & u_{NT}u_{1T} & \cdots & u_{NT}u_{NT}
\end{pmatrix}$$

A practical difficulty with this unconditional QML is that the estimation of the parameters in $\theta = [\lambda, \rho_1, \rho_2, \sigma^2]$ is computationally demanding, because evaluating the likelihood in (4.2) involves repeatedly computing the determinant of the $NT \times NT$ matrix $\Upsilon$. For this reason, I turn to another methodology which exploits valid moment conditions but is computationally simple.

5.2.1 Non Linear Least Squares

Another way to estimate the parameters in $\theta$ without imposing any distributional form on the vector $\epsilon(t)$ is to use all the moment conditions inside $\Upsilon = E(U U')$.
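The transformations $Q_1$ and $Q_2$ used above can be checked numerically. The sketch below assumes that observations are stacked state by state (all periods of state 1, then state 2, and so on), which is the ordering implied by the Kronecker structure of $Q_1$ and $Q_2$; the dimensions and data are made up for the illustration.

```python
import numpy as np

N, T = 4, 3
rng = np.random.default_rng(1)
X = rng.standard_normal((N, T))       # X[i, t]: state i, period t
x = X.reshape(-1)                     # stacked state by state: x_11..x_1T, x_21, ...

iota_T = np.ones((T, 1))
iota_N = np.ones((N, 1))
# Q1 removes, for each state, its mean over time
Q1 = np.kron(np.eye(N), np.eye(T) - iota_T @ iota_T.T / T)
# Q2 removes, for each period, its mean over states
Q2 = np.kron(np.eye(N) - iota_N @ iota_N.T / N, np.eye(T))

transformed = (Q2 @ Q1 @ x).reshape(N, T)
# Usual two-way within transformation written with means
two_way = (X - X.mean(axis=1, keepdims=True)
             - X.mean(axis=0, keepdims=True) + X.mean())

print("Q2 Q1 x equals two-way demeaning:", np.allclose(transformed, two_way))
print("Q1 and Q2 commute:", np.allclose(Q1 @ Q2, Q2 @ Q1))
```

In practice the demeaning would be done with means rather than by forming the $NT \times NT$ matrices; the explicit matrices are used here only to mirror the notation.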

For simplicity I will define the vector $U(t)$ as the vector of shocks for all the states at time $t$. Therefore, for each time $t$ the parametric model for the error term of AMS is the following:

$$U(t) = \lambda W_N U(t) + \rho_1 U(t-1) + \rho_2 W_N U(t-1) + \epsilon(t)$$

or in reduced form:

$$U(t) = A_1 U(t-1) + B_N^{-1} \epsilon(t) \qquad (5.4)$$

$$\Upsilon = E \begin{pmatrix}
U(t)U(t)' & U(t)U(t+1)' & \cdots & U(t)U(T)' \\
U(t+1)U(t)' & U(t+1)U(t+1)' & \cdots & U(t+1)U(T)' \\
\vdots & \vdots & \ddots & \vdots \\
U(T)U(t)' & U(T)U(t+1)' & \cdots & U(T)U(T)'
\end{pmatrix}$$

This $\Upsilon$ can be expressed as a function of the parameters in $\theta$. $A_1$ is not necessarily symmetric and, in particular, does not necessarily have all its eigenvalues less than one, even when $u_{it}$ is time-stationary. Therefore each sub-matrix of $\Upsilon$ could depend on the difference in time. We can express (5.4) in its MA form using the lag operator:

$$U(t) = B_N^{-1} \epsilon(t) + A_1 B_N^{-1} \epsilon(t-1) + A_1^2 B_N^{-1} \epsilon(t-2) + A_1^3 B_N^{-1} \epsilon(t-3) + \cdots$$

In this way, if we have the index $t = 1, 2, 3, \ldots, T$:

$$\begin{aligned}
E[U(1)U(1)'] &= (B_N' B_N)^{-1} \sigma^2 I_N, \\
E[U(1)U(2)'] &= (B_N' B_N)^{-1} \sigma^2 A_1', \\
E[U(1)U(3)'] &= (B_N' B_N)^{-1} \sigma^2 (A_1^2)', \\
&\;\;\vdots \\
E[U(1)U(T)'] &= (B_N' B_N)^{-1} \sigma^2 (A_1^{T-1})'
\end{aligned}$$

$$\begin{aligned}
E[U(2)U(1)'] &= (B_N' B_N)^{-1} \sigma^2 A_1, \\
E[U(2)U(2)'] &= (B_N' B_N)^{-1} \sigma^2 \left[ I_N + A_1 A_1' \right], \\
E[U(2)U(3)'] &= (B_N' B_N)^{-1} \sigma^2 \left[ A_1' + A_1 (A_1^2)' \right], \\
&\;\;\vdots \\
E[U(2)U(T)'] &= (B_N' B_N)^{-1} \sigma^2 \left[ (A_1^{T-2})' + A_1 (A_1^{T-1})' \right]
\end{aligned}$$

$$\begin{aligned}
E[U(3)U(1)'] &= (B_N' B_N)^{-1} \sigma^2 \left[ A_1^2 \right], \\
E[U(3)U(2)'] &= (B_N' B_N)^{-1} \sigma^2 \left[ A_1 + A_1^2 A_1' \right], \\
E[U(3)U(3)'] &= (B_N' B_N)^{-1} \sigma^2 \left[ I_N + A_1 A_1' + A_1^2 (A_1^2)' \right], \\
&\;\;\vdots \\
E[U(3)U(T)'] &= (B_N' B_N)^{-1} \sigma^2 \left[ (A_1^{T-3})' + A_1 (A_1^{T-2})' + A_1^2 (A_1^{T-1})' \right]
\end{aligned}$$

$$\begin{aligned}
E[U(T)U(1)'] &= (B_N' B_N)^{-1} \sigma^2 A_1^{T-1}, \\
E[U(T)U(2)'] &= (B_N' B_N)^{-1} \sigma^2 \left[ A_1^{T-2} + A_1^{T-1} A_1' \right], \\
E[U(T)U(3)'] &= (B_N' B_N)^{-1} \sigma^2 \left[ A_1^{T-3} + A_1^{T-2} A_1' + A_1^{T-1} (A_1^2)' \right], \\
&\;\;\vdots \\
E[U(T)U(T)'] &= (B_N' B_N)^{-1} \sigma^2 \left[ I_N + A_1 A_1' + \cdots + A_1^{T-1} (A_1^{T-1})' \right]
\end{aligned}$$

Using the residuals that come from the transformation of the AMS model, we can minimize the quadratic distance between all the moment conditions inside $Q_2 Q_1 \Upsilon(\theta) Q_1 Q_2$ and the sample counterpart of $E[\hat{U} \hat{U}']$. Defining $\zeta_p = vech(\hat{U} \hat{U}')$ and $\xi_p(\theta) = vech(Q_2 Q_1 \Upsilon(\theta) Q_1 Q_2)$, we can estimate $\theta$ by non-linear least squares from the following model:

$$\zeta_p = \xi_p(\theta) + \upsilon_p, \qquad p = \frac{NT(NT+1)}{2}$$

$$\hat{\theta}_{NLS} = \arg\min_{\theta} \left\{ [\zeta_p - \xi_p(\theta)]' [\zeta_p - \xi_p(\theta)] \right\}$$

This estimator is consistent but inefficient, because we are not using the correct weighting matrix for the moments in $\Upsilon(\theta)$. (For simplicity I use the identity matrix as the weighting matrix instead of $VAR(\zeta_p)$; to use the correct matrix I would need to know the fourth moments of $u_{it}$.) To compute standard errors for $\hat{\theta}_{NLS}$, I used a Monte Carlo simulation generating replications of the following model:

$$U_t = \hat{\lambda}^{NLS} [I_T \otimes W_N] U_t + \hat{\rho}_1^{NLS} U_{t-1} + \hat{\rho}_2^{NLS} [I_T \otimes W_N] U_{t-1} + \epsilon_t$$

For each simulation I generate an $NT$ vector $\epsilon$ from a normal distribution $N(0, \hat{\sigma}^2_{NLS} Q_2 Q_1 Q_1 Q_2)$. The results of the OLS and NLS estimations of the spatial-time model for the OLS transformed residuals and the 2SLS transformed residuals are summarized in Table 4 and Table 5:

Table 4

Table 5

The results do not change drastically: all the estimates of the three parameters are still highly significant, but now the estimate of the spatial parameter is higher (around 0.4) and the estimate of the lagged spatial parameter is lower (around 0.1) than with the conditional QML estimator applied to the residuals without deviations.

5.3 Simulating AMS model with error dependence

Using these last estimated values of the parameters that characterize the dependence, I simulate one thousand replications of shocks with time and spatial dependence:

$$U_t = \hat{\lambda}^{NLS} [I_T \otimes W_N] U_t + \hat{\rho}_1^{NLS} U_{t-1} + \hat{\rho}_2^{NLS} [I_T \otimes W_N] U_{t-1} + \epsilon_t$$

For each simulation the $NT$ vector $\epsilon$ was drawn from a normal distribution $N(0, \hat{\sigma}^2_{NLS} Q_2 Q_1 Q_1 Q_2)$. Then, using the real regressors from the AMS model (effective minimum wage and its square), I simulate the dependent variable of each replication:

$$y^J = \beta_1 Q_2 Q_1 \left[ \log wage^{Min} - \log wage(50) \right] + \beta_2 Q_2 Q_1 \left[ \log wage^{Min} - \log wage(50) \right]^2 + U^J, \qquad J = 1, \ldots, 1000$$

The values of $\beta_1$ and $\beta_2$ come from the estimation of the AMS model. Finally, I compute for each replication the OLS estimators from a regression of $y^J$ on $Q_2 Q_1 [\log wage^{Min} - \log wage(50)]$ and $Q_2 Q_1 [\log wage^{Min} - \log wage(50)]^2$, together with the different variance estimators of the OLS estimator under each of the approaches (non-parametric and parametric). Then I evaluate, using individual tests, how many times we reject the true null hypotheses $\hat{\beta}_1^{OLS} = \beta_1$ and $\hat{\beta}_2^{OLS} = \beta_2$ at the 5% level for each of the estimated variances (non-parametric and parametric).

6 Results

6.1 Simulation Exercise

The results of the simulation are summarized in Table 8. We can observe that the type one error is around 70% for both estimators if we do not control for dependence in a setting in which it exists. Moreover, Table 8 shows that the type one error using the cluster variance is far from the nominal size, even when multiway clusters, which control for all forms of dependence, are used. This is because these estimators do not behave correctly in small samples of $N$ and $T$. For example, the type one error of the state clusters, which control for time dependence, is around 22.5% and 24.4% for each beta, respectively. These higher probabilities of rejecting the true null hypothesis can be explained by two reasons: (i) this variance estimator does not take into account the current and the lagged spatial dependence; (ii) the number of state clusters is not large enough.
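The mechanism behind these rejection rates can be reproduced in a deliberately simplified setting. The sketch below is not the paper's design: it has a single regressor, no fixed effects, no spatial dependence, and only serial correlation within each state, with the AR(1) coefficient 0.8 and the 300 replications chosen purely for illustration. It compares how often a true null is rejected at the nominal 5% level when the standard error ignores dependence and when it is clustered by state.

```python
import numpy as np

def ar1_panel(N, T, rho, rng):
    """N independent stationary AR(1) series of length T, returned as an N x T array."""
    out = np.empty((N, T))
    out[:, 0] = rng.standard_normal(N) / np.sqrt(1.0 - rho**2)
    for t in range(1, T):
        out[:, t] = rho * out[:, t - 1] + rng.standard_normal(N)
    return out

def rejection_rates(N=50, T=30, rho=0.8, reps=300, seed=0, crit=1.96):
    """Share of replications rejecting the true null (slope = 0) with each standard error."""
    rng = np.random.default_rng(seed)
    naive = cluster = 0
    for _ in range(reps):
        x = ar1_panel(N, T, rho, rng)        # serially correlated regressor
        y = ar1_panel(N, T, rho, rng)        # true slope is zero; errors are serially correlated
        sxx = np.sum(x * x)
        b = np.sum(x * y) / sxx
        e = y - b * x
        se_naive = np.sqrt(np.sum(e * e) / (N * T - 1) / sxx)   # assumes i.i.d. errors
        scores = np.sum(x * e, axis=1)                          # one summed score per state
        se_cluster = np.sqrt(np.sum(scores**2)) / sxx           # clustered by state
        naive += int(abs(b / se_naive) > crit)
        cluster += int(abs(b / se_cluster) > crit)
    return naive / reps, cluster / reps

naive_rate, cluster_rate = rejection_rates()
print(f"rejection rate, i.i.d. standard errors:   {naive_rate:.3f}")
print(f"rejection rate, state-clustered errors:   {cluster_rate:.3f}")
```

Because the only dependence here is within state, state clustering is the appropriate correction in this toy design; the paper's point is that with spatial and lagged spatial dependence as well, one-way clustering no longer suffices.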

Table 8: Probability of rejecting the null hypothesis when it is true

In the case of the time clusters, which control for spatial dependence, the type one error for the betas is significantly higher (more than double) than for the state clusters. This could occur because this variance estimator takes into account neither the dependence across time nor the dependence across states at different moments in time. We must also note that the number of time clusters is significantly smaller than the number of state clusters (31 vs 50), so the behavior of this variance estimator is poor. However, it is remarkable that the two-way cluster and the multiway clustering approach, which control for all kinds of dependence, still have a high probability of rejecting the true null hypothesis (around 20%), even though they behave better than the one-way cluster variances. These results reinforce the importance of having large samples in both $N$ and $T$ when we want to use the clustering approach to control for general forms of dependence. Therefore, the non-parametric approach does not behave well in small samples, as already discussed in the methodology. Finally, the parametric variance estimator is the only one which almost attains the nominal size of the test for both betas.

The main conclusions that emerge from the simulation exercise are the following: (i) Ignoring error dependence in a setting where it exists leads to higher rates of rejection of the true hypothesis (accepting spurious regressors when there is no causal relationship). (ii) Using the non-parametric clustering approach to deal with error dependence in a setting where the number of clusters is not

so large leads to a bad approximation of the variance covariance matrix, as reflected in a type one error that is significantly higher than the nominal size of the test (because these estimators need $N$ and $T$ to go to infinity). (iii) In small samples, using the parametric approach is a better way to control for the dependence, provided the researcher is confident about the parametric model.

6.2 Robust Standard Errors for AMS model

In this subsection, I re-estimate the model of AMS in order to compute standard errors using the methodologies previously discussed in this article (one-way cluster, two-way cluster, multiway cluster and parametric approach). We can then analyze how the results change once robust standard errors are used in a context where there is error dependence in more than one dimension. In the following subsections I discuss separately the variance estimates for the marginal effects arising from: (i) the OLS estimate of US state inequality on the state effective minimum and its square, plus fixed effects and time effects as controls; (ii) the 2SLS estimation after instrumenting both the state effective minimum wage and the square of the state effective minimum wage. The instruments are, respectively, the statutory minimum and the square of the predicted value from a regression of the state effective minimum wage on the statutory minimum plus fixed and time effects.
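For reference, one common way to compute one-way and two-way cluster variances for OLS is sketched below. It is an assumption of this example, not something stated in the paper, that the two-way estimator takes the form "state-clustered plus time-clustered minus the intersection", where in a balanced panel with one observation per state and year the intersection term reduces to the heteroskedasticity-robust (White) term. No small-sample corrections are applied, and the data in the usage lines are simulated.

```python
import numpy as np

def cluster_meat(X, e, groups):
    """Sum over clusters g of (X_g' e_g)(X_g' e_g)'."""
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        s = X[groups == g].T @ e[groups == g]
        meat += np.outer(s, s)
    return meat

def ols_cluster_variances(y, X, state, time):
    """OLS coefficients with state, time and two-way clustered variance matrices."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    Xe = X * e[:, None]
    white = Xe.T @ Xe                       # each observation is its own cluster
    by_state = cluster_meat(X, e, state)
    by_time = cluster_meat(X, e, time)

    def sandwich(meat):
        return XtX_inv @ meat @ XtX_inv

    return beta, {
        "state": sandwich(by_state),
        "time": sandwich(by_time),
        "two_way": sandwich(by_state + by_time - white),
    }

# Toy balanced panel (hypothetical data): 20 states, 10 years
rng = np.random.default_rng(0)
N, T = 20, 10
state = np.repeat(np.arange(N), T)
time = np.tile(np.arange(T), N)
x = rng.standard_normal(N * T)
y = 1.0 + 2.0 * x + rng.standard_normal(N * T)
X = np.column_stack([np.ones(N * T), x])

beta, V = ols_cluster_variances(y, X, state, time)
print("coefficients:", np.round(beta, 3))
for name, v in V.items():
    print(name, "variances:", np.diag(v))
```

The two-way matrix is a difference of positive semi-definite matrices and is not guaranteed to be positive semi-definite in a given sample, which is one practical symptom of the small-cluster problems discussed above.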

Table 9: Different variance estimators of the OLS estimator of the parameters of the AMS model

As can be seen in Table 9, the standard errors of the OLS estimates for the minimum wage and its square increase by approximately 200% when we switch from no clustering to state clusters or time clusters. When we cluster in only one dimension, either by state or by time, the OLS estimator of the minimum wage variable remains significant at the 1% level. However, the significance of the OLS estimator of the square of the minimum wage falls to the 5% level relative to the non-clustered case. When we use variance estimators that take into account dependence in more than one dimension, the OLS estimator of the square of the effective minimum wage loses significance at all levels, while the OLS estimator of the minimum wage remains significant, but now at the 5% level for the multiway cluster and at the 10% level using the parametric variance. Given that we are interested in the total marginal effect of the effective minimum rather than in each of its components separately ($\hat{\beta}_1$, $\hat{\beta}_2$), and because the standard errors could be higher due to multicollinearity between the effective minimum wage and its square, I present in Table 10 the estimated average marginal effect of the minimum wage on US state inequality, which is given by

$$\hat{\beta}_1 + 2 \hat{\beta}_2 \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \left[ \log wage^{Min}_{it} - \log wage_{it}(50) \right]$$

(Autor et al. (2010) only present the results for the estimated marginal effect.)

Table 10: Different variance estimators of the OLS estimator of the marginal effect of the AMS model

The standard error of the estimated marginal effect increases by 200% when we use one-way clustering (either by time or by space) and by 300% when we use a multiway clustering approach, relative to the case without clustering. Despite this, the estimated marginal effect remains significant at the 5% level. Finally, when we use the parametric variance to control for the dependence, the marginal effect of the minimum wage on US state inequality loses significance. As AMS stress, the OLS estimator of the model could be affected by a division bias problem, because the variable $\log wage_{it}(50)$ appears on both sides of the regression. Therefore, the 2SLS estimation is more appropriate for studying the effect of the minimum wage on US state inequality. Table 11 presents the results for the 2SLS estimation.
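Given any estimate of the covariance matrix of $(\hat{\beta}_1, \hat{\beta}_2)$, a standard error for the average marginal effect above can be obtained with the delta method. The sketch below treats the sample average of the effective minimum wage as fixed, which is an assumption of this illustration; the text does not say how the reported standard errors of the marginal effect were computed, and all numbers in the usage lines are hypothetical.

```python
import numpy as np

def average_marginal_effect(beta, V, x):
    """Average marginal effect b1 + 2*b2*mean(x) and its delta-method standard error.

    beta: (b1, b2); V: 2x2 covariance matrix of (b1, b2);
    x: effective minimum wage over all state-year observations.
    """
    xbar = float(np.mean(x))
    grad = np.array([1.0, 2.0 * xbar])       # derivative of the effect w.r.t. (b1, b2)
    effect = beta[0] + 2.0 * beta[1] * xbar
    se = float(np.sqrt(grad @ V @ grad))
    return effect, se

# Hypothetical inputs, chosen only to show the arithmetic
beta_hat = np.array([0.3, 0.1])
V_hat = np.array([[0.01, 0.0],
                  [0.0, 0.04]])
x_eff = np.array([-0.4, -0.5, -0.6])

effect, se = average_marginal_effect(beta_hat, V_hat, x_eff)
print(f"average marginal effect = {effect:.3f}, standard error = {se:.3f}")
```

The same function can be applied with each variance estimator in turn (non-clustered, one-way, two-way, parametric), since only the matrix `V` changes.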


More information

Some Recent Developments in Spatial Panel Data Models

Some Recent Developments in Spatial Panel Data Models Some Recent Developments in Spatial Panel Data Models Lung-fei Lee Department of Economics Ohio State University l ee@econ.ohio-state.edu Jihai Yu Department of Economics University of Kentucky jihai.yu@uky.edu

More information

The exact bias of S 2 in linear panel regressions with spatial autocorrelation SFB 823. Discussion Paper. Christoph Hanck, Walter Krämer

The exact bias of S 2 in linear panel regressions with spatial autocorrelation SFB 823. Discussion Paper. Christoph Hanck, Walter Krämer SFB 83 The exact bias of S in linear panel regressions with spatial autocorrelation Discussion Paper Christoph Hanck, Walter Krämer Nr. 8/00 The exact bias of S in linear panel regressions with spatial

More information

Final Exam. Economics 835: Econometrics. Fall 2010

Final Exam. Economics 835: Econometrics. Fall 2010 Final Exam Economics 835: Econometrics Fall 2010 Please answer the question I ask - no more and no less - and remember that the correct answer is often short and simple. 1 Some short questions a) For each

More information

A Course on Advanced Econometrics

A Course on Advanced Econometrics A Course on Advanced Econometrics Yongmiao Hong The Ernest S. Liu Professor of Economics & International Studies Cornell University Course Introduction: Modern economies are full of uncertainties and risk.

More information

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure A Robust Approach to Estimating Production Functions: Replication of the ACF procedure Kyoo il Kim Michigan State University Yao Luo University of Toronto Yingjun Su IESR, Jinan University August 2018

More information

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63 1 / 63 Panel Data Models Chapter 5 Financial Econometrics Michael Hauser WS17/18 2 / 63 Content Data structures: Times series, cross sectional, panel data, pooled data Static linear panel data models:

More information

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 4 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 23 Recommended Reading For the today Serial correlation and heteroskedasticity in

More information

Wild Bootstrap Inference for Wildly Dierent Cluster Sizes

Wild Bootstrap Inference for Wildly Dierent Cluster Sizes Wild Bootstrap Inference for Wildly Dierent Cluster Sizes Matthew D. Webb October 9, 2013 Introduction This paper is joint with: Contributions: James G. MacKinnon Department of Economics Queen's University

More information

ECON 4160, Lecture 11 and 12

ECON 4160, Lecture 11 and 12 ECON 4160, 2016. Lecture 11 and 12 Co-integration Ragnar Nymoen Department of Economics 9 November 2017 1 / 43 Introduction I So far we have considered: Stationary VAR ( no unit roots ) Standard inference

More information

An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic

An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic Chapter 6 ESTIMATION OF THE LONG-RUN COVARIANCE MATRIX An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic standard errors for the OLS and linear IV estimators presented

More information

Short T Panels - Review

Short T Panels - Review Short T Panels - Review We have looked at methods for estimating parameters on time-varying explanatory variables consistently in panels with many cross-section observation units but a small number of

More information

dqd: A command for treatment effect estimation under alternative assumptions

dqd: A command for treatment effect estimation under alternative assumptions UC3M Working Papers Economics 14-07 April 2014 ISSN 2340-5031 Departamento de Economía Universidad Carlos III de Madrid Calle Madrid, 126 28903 Getafe (Spain) Fax (34) 916249875 dqd: A command for treatment

More information

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2016 Instructor: Victor Aguirregabiria

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2016 Instructor: Victor Aguirregabiria ECOOMETRICS II (ECO 24S) University of Toronto. Department of Economics. Winter 26 Instructor: Victor Aguirregabiria FIAL EAM. Thursday, April 4, 26. From 9:am-2:pm (3 hours) ISTRUCTIOS: - This is a closed-book

More information

HETEROSKEDASTICITY, TEMPORAL AND SPATIAL CORRELATION MATTER

HETEROSKEDASTICITY, TEMPORAL AND SPATIAL CORRELATION MATTER ACTA UNIVERSITATIS AGRICULTURAE ET SILVICULTURAE MENDELIANAE BRUNENSIS Volume LXI 239 Number 7, 2013 http://dx.doi.org/10.11118/actaun201361072151 HETEROSKEDASTICITY, TEMPORAL AND SPATIAL CORRELATION MATTER

More information

Multiple Equation GMM with Common Coefficients: Panel Data

Multiple Equation GMM with Common Coefficients: Panel Data Multiple Equation GMM with Common Coefficients: Panel Data Eric Zivot Winter 2013 Multi-equation GMM with common coefficients Example (panel wage equation) 69 = + 69 + + 69 + 1 80 = + 80 + + 80 + 2 Note:

More information

A SPATIAL CLIFF-ORD-TYPE MODEL WITH HETEROSKEDASTIC INNOVATIONS: SMALL AND LARGE SAMPLE RESULTS

A SPATIAL CLIFF-ORD-TYPE MODEL WITH HETEROSKEDASTIC INNOVATIONS: SMALL AND LARGE SAMPLE RESULTS JOURNAL OF REGIONAL SCIENCE, VOL. 50, NO. 2, 2010, pp. 592 614 A SPATIAL CLIFF-ORD-TYPE MODEL WITH HETEROSKEDASTIC INNOVATIONS: SMALL AND LARGE SAMPLE RESULTS Irani Arraiz Inter-American Development Bank,

More information

Notes on Heterogeneity, Aggregation, and Market Wage Functions: An Empirical Model of Self-Selection in the Labor Market

Notes on Heterogeneity, Aggregation, and Market Wage Functions: An Empirical Model of Self-Selection in the Labor Market Notes on Heterogeneity, Aggregation, and Market Wage Functions: An Empirical Model of Self-Selection in the Labor Market Heckman and Sedlacek, JPE 1985, 93(6), 1077-1125 James Heckman University of Chicago

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

Chapter 2. Dynamic panel data models

Chapter 2. Dynamic panel data models Chapter 2. Dynamic panel data models School of Economics and Management - University of Geneva Christophe Hurlin, Université of Orléans University of Orléans April 2018 C. Hurlin (University of Orléans)

More information

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED by W. Robert Reed Department of Economics and Finance University of Canterbury, New Zealand Email: bob.reed@canterbury.ac.nz

More information

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates Matthew Harding and Carlos Lamarche January 12, 2011 Abstract We propose a method for estimating

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Interpreting Regression Results

Interpreting Regression Results Interpreting Regression Results Carlo Favero Favero () Interpreting Regression Results 1 / 42 Interpreting Regression Results Interpreting regression results is not a simple exercise. We propose to split

More information

Econ 582 Fixed Effects Estimation of Panel Data

Econ 582 Fixed Effects Estimation of Panel Data Econ 582 Fixed Effects Estimation of Panel Data Eric Zivot May 28, 2012 Panel Data Framework = x 0 β + = 1 (individuals); =1 (time periods) y 1 = X β ( ) ( 1) + ε Main question: Is x uncorrelated with?

More information

Lecture 6: Hypothesis Testing

Lecture 6: Hypothesis Testing Lecture 6: Hypothesis Testing Mauricio Sarrias Universidad Católica del Norte November 6, 2017 1 Moran s I Statistic Mandatory Reading Moran s I based on Cliff and Ord (1972) Kelijan and Prucha (2001)

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 1 Jakub Mućk Econometrics of Panel Data Meeting # 1 1 / 31 Outline 1 Course outline 2 Panel data Advantages of Panel Data Limitations of Panel Data 3 Pooled

More information

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43 Panel Data March 2, 212 () Applied Economoetrics: Topic March 2, 212 1 / 43 Overview Many economic applications involve panel data. Panel data has both cross-sectional and time series aspects. Regression

More information

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid Applied Economics Panel Data Department of Economics Universidad Carlos III de Madrid See also Wooldridge (chapter 13), and Stock and Watson (chapter 10) 1 / 38 Panel Data vs Repeated Cross-sections In

More information

Appendix A: The time series behavior of employment growth

Appendix A: The time series behavior of employment growth Unpublished appendices from The Relationship between Firm Size and Firm Growth in the U.S. Manufacturing Sector Bronwyn H. Hall Journal of Industrial Economics 35 (June 987): 583-606. Appendix A: The time

More information

On the Power of Tests for Regime Switching

On the Power of Tests for Regime Switching On the Power of Tests for Regime Switching joint work with Drew Carter and Ben Hansen Douglas G. Steigerwald UC Santa Barbara May 2015 D. Steigerwald (UCSB) Regime Switching May 2015 1 / 42 Motivating

More information

DSGE Methods. Estimation of DSGE models: GMM and Indirect Inference. Willi Mutschler, M.Sc.

DSGE Methods. Estimation of DSGE models: GMM and Indirect Inference. Willi Mutschler, M.Sc. DSGE Methods Estimation of DSGE models: GMM and Indirect Inference Willi Mutschler, M.Sc. Institute of Econometrics and Economic Statistics University of Münster willi.mutschler@wiwi.uni-muenster.de Summer

More information

ECON 616: Lecture Two: Deterministic Trends, Nonstationary Processes

ECON 616: Lecture Two: Deterministic Trends, Nonstationary Processes ECON 616: Lecture Two: Deterministic Trends, Nonstationary Processes ED HERBST September 11, 2017 Background Hamilton, chapters 15-16 Trends vs Cycles A commond decomposition of macroeconomic time series

More information

VAR Models and Applications

VAR Models and Applications VAR Models and Applications Laurent Ferrara 1 1 University of Paris West M2 EIPMC Oct. 2016 Overview of the presentation 1. Vector Auto-Regressions Definition Estimation Testing 2. Impulse responses functions

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner Econometrics II Nonstandard Standard Error Issues: A Guide for the Practitioner Måns Söderbom 10 May 2011 Department of Economics, University of Gothenburg. Email: mans.soderbom@economics.gu.se. Web: www.economics.gu.se/soderbom,

More information

Lecture 1: OLS derivations and inference

Lecture 1: OLS derivations and inference Lecture 1: OLS derivations and inference Econometric Methods Warsaw School of Economics (1) OLS 1 / 43 Outline 1 Introduction Course information Econometrics: a reminder Preliminary data exploration 2

More information

ECON 4160, Spring term Lecture 12

ECON 4160, Spring term Lecture 12 ECON 4160, Spring term 2013. Lecture 12 Non-stationarity and co-integration 2/2 Ragnar Nymoen Department of Economics 13 Nov 2013 1 / 53 Introduction I So far we have considered: Stationary VAR, with deterministic

More information

Efficiency of repeated-cross-section estimators in fixed-effects models

Efficiency of repeated-cross-section estimators in fixed-effects models Efficiency of repeated-cross-section estimators in fixed-effects models Montezuma Dumangane and Nicoletta Rosati CEMAPRE and ISEG-UTL January 2009 Abstract PRELIMINARY AND INCOMPLETE Exploiting across

More information

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley Panel Data Models James L. Powell Department of Economics University of California, Berkeley Overview Like Zellner s seemingly unrelated regression models, the dependent and explanatory variables for panel

More information

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University

More information

Vector Auto-Regressive Models

Vector Auto-Regressive Models Vector Auto-Regressive Models Laurent Ferrara 1 1 University of Paris Nanterre M2 Oct. 2018 Overview of the presentation 1. Vector Auto-Regressions Definition Estimation Testing 2. Impulse responses functions

More information

Economics 582 Random Effects Estimation

Economics 582 Random Effects Estimation Economics 582 Random Effects Estimation Eric Zivot May 29, 2013 Random Effects Model Hence, the model can be re-written as = x 0 β + + [x ] = 0 (no endogeneity) [ x ] = = + x 0 β + + [x ] = 0 [ x ] = 0

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

Non-linear panel data modeling

Non-linear panel data modeling Non-linear panel data modeling Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini May 2010 Laura Magazzini (@univr.it) Non-linear panel data modeling May 2010 1

More information

Lecture 7: Dynamic panel models 2

Lecture 7: Dynamic panel models 2 Lecture 7: Dynamic panel models 2 Ragnar Nymoen Department of Economics, UiO 25 February 2010 Main issues and references The Arellano and Bond method for GMM estimation of dynamic panel data models A stepwise

More information

Testing Overidentifying Restrictions with Many Instruments and Heteroskedasticity

Testing Overidentifying Restrictions with Many Instruments and Heteroskedasticity Testing Overidentifying Restrictions with Many Instruments and Heteroskedasticity John C. Chao, Department of Economics, University of Maryland, chao@econ.umd.edu. Jerry A. Hausman, Department of Economics,

More information

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance CESIS Electronic Working Paper Series Paper No. 223 A Bootstrap Test for Causality with Endogenous Lag Length Choice - theory and application in finance R. Scott Hacker and Abdulnasser Hatemi-J April 200

More information

Economics 241B Estimation with Instruments

Economics 241B Estimation with Instruments Economics 241B Estimation with Instruments Measurement Error Measurement error is de ned as the error resulting from the measurement of a variable. At some level, every variable is measured with error.

More information

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage

More information

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Sílvia Gonçalves and Benoit Perron Département de sciences économiques,

More information

Multivariate Time Series: VAR(p) Processes and Models

Multivariate Time Series: VAR(p) Processes and Models Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with

More information

Comments on: Panel Data Analysis Advantages and Challenges. Manuel Arellano CEMFI, Madrid November 2006

Comments on: Panel Data Analysis Advantages and Challenges. Manuel Arellano CEMFI, Madrid November 2006 Comments on: Panel Data Analysis Advantages and Challenges Manuel Arellano CEMFI, Madrid November 2006 This paper provides an impressive, yet compact and easily accessible review of the econometric literature

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data Panel data Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data - possible to control for some unobserved heterogeneity - possible

More information

GMM based inference for panel data models

GMM based inference for panel data models GMM based inference for panel data models Maurice J.G. Bun and Frank Kleibergen y this version: 24 February 2010 JEL-code: C13; C23 Keywords: dynamic panel data model, Generalized Method of Moments, weak

More information

Missing dependent variables in panel data models

Missing dependent variables in panel data models Missing dependent variables in panel data models Jason Abrevaya Abstract This paper considers estimation of a fixed-effects model in which the dependent variable may be missing. For cross-sectional units

More information

Cointegration Lecture I: Introduction

Cointegration Lecture I: Introduction 1 Cointegration Lecture I: Introduction Julia Giese Nuffield College julia.giese@economics.ox.ac.uk Hilary Term 2008 2 Outline Introduction Estimation of unrestricted VAR Non-stationarity Deterministic

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

Generalized Method of Moments (GMM) Estimation

Generalized Method of Moments (GMM) Estimation Econometrics 2 Fall 2004 Generalized Method of Moments (GMM) Estimation Heino Bohn Nielsen of29 Outline of the Lecture () Introduction. (2) Moment conditions and methods of moments (MM) estimation. Ordinary

More information

GMM and SMM. 1. Hansen, L Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p

GMM and SMM. 1. Hansen, L Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p GMM and SMM Some useful references: 1. Hansen, L. 1982. Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p. 1029-54. 2. Lee, B.S. and B. Ingram. 1991 Simulation estimation

More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

ECONOMETRICS. Bruce E. Hansen. c2000, 2001, 2002, 2003, University of Wisconsin

ECONOMETRICS. Bruce E. Hansen. c2000, 2001, 2002, 2003, University of Wisconsin ECONOMETRICS Bruce E. Hansen c2000, 200, 2002, 2003, 2004 University of Wisconsin www.ssc.wisc.edu/~bhansen Revised: January 2004 Comments Welcome This manuscript may be printed and reproduced for individual

More information

Lecture 4: Heteroskedasticity

Lecture 4: Heteroskedasticity Lecture 4: Heteroskedasticity Econometric Methods Warsaw School of Economics (4) Heteroskedasticity 1 / 24 Outline 1 What is heteroskedasticity? 2 Testing for heteroskedasticity White Goldfeld-Quandt Breusch-Pagan

More information

DSGE-Models. Limited Information Estimation General Method of Moments and Indirect Inference

DSGE-Models. Limited Information Estimation General Method of Moments and Indirect Inference DSGE-Models General Method of Moments and Indirect Inference Dr. Andrea Beccarini Willi Mutschler, M.Sc. Institute of Econometrics and Economic Statistics University of Münster willi.mutschler@uni-muenster.de

More information

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley Time Series Models and Inference James L. Powell Department of Economics University of California, Berkeley Overview In contrast to the classical linear regression model, in which the components of the

More information

Volatility. Gerald P. Dwyer. February Clemson University

Volatility. Gerald P. Dwyer. February Clemson University Volatility Gerald P. Dwyer Clemson University February 2016 Outline 1 Volatility Characteristics of Time Series Heteroskedasticity Simpler Estimation Strategies Exponentially Weighted Moving Average Use

More information

GARCH Models Estimation and Inference. Eduardo Rossi University of Pavia

GARCH Models Estimation and Inference. Eduardo Rossi University of Pavia GARCH Models Estimation and Inference Eduardo Rossi University of Pavia Likelihood function The procedure most often used in estimating θ 0 in ARCH models involves the maximization of a likelihood function

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics T H I R D E D I T I O N Global Edition James H. Stock Harvard University Mark W. Watson Princeton University Boston Columbus Indianapolis New York San Francisco Upper Saddle

More information

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 7: Cluster Sampling Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of roups and

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 2 Jakub Mućk Econometrics of Panel Data Meeting # 2 1 / 26 Outline 1 Fixed effects model The Least Squares Dummy Variable Estimator The Fixed Effect (Within

More information

Econometrics I. Ricardo Mora

Econometrics I. Ricardo Mora Econometrics I Department of Economics Universidad Carlos III de Madrid Master in Industrial Economics and Markets Outline Motivation 1 Motivation 2 3 4 Motivation The Analogy Principle The () is a framework

More information

Instrumental Variables and the Problem of Endogeneity

Instrumental Variables and the Problem of Endogeneity Instrumental Variables and the Problem of Endogeneity September 15, 2015 1 / 38 Exogeneity: Important Assumption of OLS In a standard OLS framework, y = xβ + ɛ (1) and for unbiasedness we need E[x ɛ] =

More information

Increasing the Power of Specification Tests. November 18, 2018

Increasing the Power of Specification Tests. November 18, 2018 Increasing the Power of Specification Tests T W J A. H U A MIT November 18, 2018 A. This paper shows how to increase the power of Hausman s (1978) specification test as well as the difference test in a

More information

Switching Regime Estimation

Switching Regime Estimation Switching Regime Estimation Series de Tiempo BIrkbeck March 2013 Martin Sola (FE) Markov Switching models 01/13 1 / 52 The economy (the time series) often behaves very different in periods such as booms

More information

A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II Jeff Wooldridge IRP Lectures, UW Madison, August 2008 5. Estimating Production Functions Using Proxy Variables 6. Pseudo Panels

More information