Empirical Methods in Applied Microeconomics


Jörn-Steffen Pischke
LSE, October

1 Nonstandard Standard Error Issues

The discussion so far has concentrated on identification of the effect of interest. Obviously, this should always be the main concern: there is little consolation in having an accurate standard error on a meaningless estimate! Hopefully, the previous chapters will help you to design research projects and empirical strategies which lead to valid estimates. But there are a few important inference issues which arise with the type of cross-sectional and panel data we typically use in applied econometrics. It is therefore time to try to tackle those. This chapter uses somewhat more matrix algebra than the previous ones but will hopefully be equally accessible.

1.1 The Bias of Robust Standard Errors

The natural way to compute asymptotic standard errors and t-statistics for regression is to use the robust covariance matrix

$$\left(\sum [X_i X_i']/N\right)^{-1} \left(\sum [X_i X_i' \hat\varepsilon_i^2]/N\right) \left(\sum [X_i X_i']/N\right)^{-1}.$$

Of course, asymptotic covariance matrices, as the name suggests, are only valid in large samples. We have seen already that "large samples" is always a relative concept, and things may go awry in the samples we use in our actual research. The robust covariance matrix is no exception. Suppose the actual covariance matrix of the population regression residuals is given by $E[\varepsilon\varepsilon'|X] = \Omega = \mathrm{diag}(\sigma_i^2)$. For the moment the covariance matrix is diagonal, meaning that residuals are independent across observations. We will take up the case of dependent residuals in the following section. The covariance matrix of the OLS estimator is then

$$V = (X'X)^{-1} X'\Omega X (X'X)^{-1}. \qquad (1)$$

With fixed $X$s this is the actual covariance matrix applicable to our small sample estimator, not just the asymptotic covariance matrix. The problem is that it involves the unknown $\sigma_i^2$s, which we replace by the sample counterparts $\hat\varepsilon_i^2$ in our covariance estimator. Notice that

$$\hat\varepsilon = y - X\hat\beta = y - X(X'X)^{-1}X'y = \left[I - X(X'X)^{-1}X'\right](X\beta + \varepsilon) = M\varepsilon$$

where $M = I - X(X'X)^{-1}X'$ is the residual maker matrix and $\varepsilon$ is the residual of the population regression. Denoting the $i$-th column of the matrix $M$ by $m_i$, we have $\hat\varepsilon_i = m_i'\varepsilon$. It follows that

$$E\left[\hat\varepsilon_i^2\right] = E\left[m_i'\varepsilon\varepsilon' m_i\right] = m_i'\Omega m_i.$$

Notice that $m_i$ is the $i$-th column of the identity matrix (call it $e_i$) minus the $i$-th column of the projection matrix $H = X(X'X)^{-1}X'$ (which is also called the hat-matrix, since it makes predicted values). Denote the $i$-th column of the hat-matrix by $h_i$, so that $h_i' = x_i'(X'X)^{-1}X'$. Hence $m_i = e_i - h_i$, and therefore

$$E\left[\hat\varepsilon_i^2\right] = (e_i - h_i)'\Omega(e_i - h_i) = \sigma_i^2 - 2\sigma_i^2 h_{ii} + h_i'\Omega h_i \qquad (2)$$

where $h_{ii}$ is the $i$-th diagonal element of the hat-matrix. Because this matrix is symmetric and idempotent (meaning $HH' = H$), it follows that $h_{ii} = h_i'h_i$, so that we obtain (see Chesher and Jewitt, 1987)

$$E\left[\hat V\right] - V = (X'X)^{-1} X'\,\mathrm{diag}\left[h_i'(\Omega - 2\sigma_i^2 I)h_i\right] X\,(X'X)^{-1}. \qquad (3)$$

While $\hat V$ is biased, it is easy to see that it is a consistent estimator of $V$. Consider the case of fixed $X$s again, and focus on the middle bit of the matrix, $X'\hat\Omega X$. Notice that $\hat\Omega$ is not consistent for $\Omega$, since there are more and more elements to estimate as the sample gets large. Nevertheless, $\hat\varepsilon_i$ is consistent for $\varepsilon_i$, since $\hat\varepsilon_i = y_i - x_i'\hat\beta$ and $\hat\beta$ is consistent for $\beta$ (another way to think of this: if we have the entire population instead of a sample, we get the population residual from the population regression). But

$$X'\hat\Omega X/N = \frac{1}{N}\sum \hat\varepsilon_i^2\, x_i x_i'$$

and since the $\hat\varepsilon_i^2$ converge to the $\varepsilon_i^2$, which have mean $\sigma_i^2$, we get $\mathrm{plim}\; X'\hat\Omega X/N = \mathrm{plim}\; X'\Omega X/N$.
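The residual-maker algebra above is easy to check numerically. The following sketch (Python with numpy, purely for illustration; the simulated design is ours) builds the hat matrix $H$ and residual maker $M$ for a small regression and verifies that $\hat\varepsilon = My$ and that the leverages $h_{ii}$ sum to the number of regressors:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 30, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=N)

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix: makes predicted values
M = np.eye(N) - H                      # residual maker matrix
eps_hat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]  # OLS residuals
h = np.diag(H)                         # leverages h_ii

# the OLS residuals are M y, and trace(H) = rank(H) = k since H is idempotent
assert np.allclose(eps_hat, M @ y)
assert np.isclose(np.trace(H), k)
```

Both identities hold exactly (up to floating point error) in any sample, which is why the bias formulas below depend only on the leverages and on $\Omega$.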

So why is $\hat V$ biased? The reason is that $E[\hat\varepsilon_i^2]$ is a biased estimate of $\sigma_i^2$, as we have seen in (2). Consider the case where the residual is actually homoskedastic, so that $\sigma_i^2 = \sigma^2$. In this case (2) gives

$$E\left[\hat\varepsilon_i^2\right] = \sigma^2 - 2\sigma^2 h_{ii} + \sigma^2 h_{ii} = \sigma^2(1 - h_{ii}).$$

The variance of the residual in small samples is too small, and this is related to the quantity $h_{ii}$. So we need to start by considering some properties of the diagonal elements of the hat-matrix, $h_{ii}$. They are called leverage, because they measure how much pull a particular value of $x_i$ exerts on the regression line. Note that

$$\hat y_i = h_i'y = h_{ii}\, y_i + \sum_{j\neq i} h_{ij}\, y_j$$

so if $h_{ii}$ is particularly large, the $i$-th observation will have a large influence (or "leverage") on the predicted value. In a bivariate regression

$$h_{ii} = \frac{1}{N} + \frac{(x_i - \bar x)^2}{\sum_j (x_j - \bar x)^2}$$

so the leverage is related to how far the $x$-value of a data point is from the center of the data, compared to the general dispersion in the sample. High leverage points are outliers in the $x$-dimension. Figure 1 illustrates how high leverage points lead to small residuals, because such points can pull the regression line a lot without changing the residuals on the other (lower leverage) data points much.

How much leverage is a lot? Notice that $\sum h_{ii} = \mathrm{trace}(H) = \mathrm{rank}(H) = k$, the number of regressors, since $H$ is an idempotent matrix. Hence

$$\frac{1}{N}\sum_i h_{ii} = \frac{k}{N}.$$

Moreover, $h_{ii} < 1$, and as $h_{ii} \to 1$ the variance of the $i$-th residual would shrink to zero, i.e. the regression line would pass exactly through that point.

Armed with what we now know about $h_{ii}$, we can return to the bias formula (3) for the robust covariance estimator. This formula highlights two things. First, the bias depends on the form of $\Omega$, i.e. the actual variances of the population residuals, which is in general unknown. If we knew $\Omega$, we could compute the correct standard errors using (1) directly and there would be no need to resort to the robust covariance matrix. If we do not know $\Omega$, there is no way of knowing the exact extent of the bias of the robust covariance matrix. The second ingredient in the bias are the vectors $h_i$ from the projection or hat matrix.

[Figure 1: High leverage points lead to small residuals]

So the second thing we learn from (3) is that the bias will be worse if there are large $x$-outliers in our data, and in particular when the leverage of an observation is related to the variance of the residual.

What can be done to improve the performance of the robust covariance matrix? There are a number of suggestions in the literature. Denoting the robust covariance matrix estimator by

$$\left(\sum [X_i X_i']/N\right)^{-1} \left(\sum [X_i X_i' \hat\sigma_i^2]/N\right) \left(\sum [X_i X_i']/N\right)^{-1}$$

the alternative forms use alternative values for $\hat\sigma_i^2$:

$$\mathrm{HC}_0:\ \hat\sigma_i^2 = \hat\varepsilon_i^2$$
$$\mathrm{HC}_1:\ \hat\sigma_i^2 = \frac{N}{N-k}\,\hat\varepsilon_i^2$$
$$\mathrm{HC}_2:\ \hat\sigma_i^2 = \frac{\hat\varepsilon_i^2}{1 - h_{ii}}$$
$$\mathrm{HC}_3:\ \hat\sigma_i^2 = \frac{\hat\varepsilon_i^2}{(1 - h_{ii})^2}.$$

$\mathrm{HC}_0$ yields the covariance estimator suggested by White (1980). $\mathrm{HC}_1$ is a

simple degrees of freedom correction, which helps in small samples. $\mathrm{HC}_2$ uses the leverage to give an unbiased estimate of the variance of the $i$-th residual. $\mathrm{HC}_3$ is an approximation to a jackknife estimator suggested by MacKinnon and White (1985). In many cases the calculated standard errors from $\mathrm{HC}_j$ will be larger the larger is $j$, but there is no guarantee that the ordering takes that particular form with actual data. These alternative estimators are often implemented in modern regression packages.¹ Even when they are not, they are easy to compute using a trick suggested by Messer and White (1984). This amounts to dividing $y_i$ and $X_i$ by $\hat\sigma_i$ and then running an IV regression with these transformed variables, instrumenting $X_i/\hat\sigma_i$ by $X_i\hat\sigma_i$, for the appropriate choice of $\hat\sigma_i^2$.

In order to gain some insight into these various versions of the robust covariance estimator, consider a very simple regression design:

$$y_i = \alpha + \beta d_i + \varepsilon_i \qquad (4)$$

where $d_i$ is a dummy variable. $\beta$ in this regression estimates the difference in the means of the two subsamples defined by the dummy variable. Denoting these subsamples by the subscripts 0 and 1, we have

$$\hat\beta = \bar y_1 - \bar y_0.$$

Furthermore, let $p = E(d_i)$. We will treat the dummy as a fixed covariate, so that $p = N_1/N$ and $1-p = N_0/N$. We discuss this example because it is an important one in statistics, and we know a lot about the small sample properties of the difference in means. When $y_i$ is distributed normally with equal but unknown variance in the two subsamples, the t-statistic for the difference in means has a t-distribution: this is the classic two sample t-test. However, we are concerned with the possibility that there is heteroskedasticity, meaning that the variances in the two subsamples are different. If nothing is known about these two variances, the testing problem in small samples becomes intractable: the exact small sample distribution for this problem is not known. This is known as the Behrens-Fisher problem (see e.g. DeGroot, 1986). The different robust covariance estimators $\mathrm{HC}_0$-$\mathrm{HC}_3$ are different responses to figuring out the standard error for this testing problem.

¹ For example, the Stata package computes $\mathrm{HC}_1$, $\mathrm{HC}_2$, and $\mathrm{HC}_3$. Another suggestion to improve the small sample performance of the covariance estimator is bootstrapping. Horowitz (1997) advocates a form of the bootstrap called the wild bootstrap in this context.
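A compact way to compute all four variants is a single sandwich formula with different weights on the squared residuals. The following sketch (Python/numpy; the function name is ours, not from any package) implements $\mathrm{HC}_0$-$\mathrm{HC}_3$ as defined above:

```python
import numpy as np

def hc_covariances(X, y):
    """Sandwich covariance estimates HC0-HC3 for OLS (illustrative sketch)."""
    N, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    e = y - X @ (XtX_inv @ X.T @ y)                # OLS residuals
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)    # leverages h_ii
    weights = {
        'HC0': e**2,
        'HC1': N / (N - k) * e**2,
        'HC2': e**2 / (1 - h),
        'HC3': e**2 / (1 - h)**2,
    }
    # (X'X)^{-1} (sum_i w_i x_i x_i') (X'X)^{-1} for each weighting
    return {name: XtX_inv @ ((X.T * w) @ X) @ XtX_inv
            for name, w in weights.items()}
```

In the dummy-variable design (4), one can check that the $\mathrm{HC}_2$ entry for the slope reproduces exactly the unbiased two-sample formula $S_0^2/[N_0(N_0-1)] + S_1^2/[N_1(N_1-1)]$ derived below.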

Define $S_j^2 = \sum_{d_i = j} (y_i - \bar y_j)^2$ for $j = 0, 1$. The diagonal elements of the hat-matrix in this particular case are

$$h_{ii} = \begin{cases} 1/N_0 & \text{if } d_i = 0 \\ 1/N_1 & \text{if } d_i = 1 \end{cases}$$

and it is straightforward to show that the five covariance estimators of $\hat\beta$ are

$$\mathrm{OLS}:\ \frac{S_0^2 + S_1^2}{N-2}\cdot\frac{N}{N_0 N_1} = \frac{S_0^2 + S_1^2}{N-2}\cdot\frac{1}{Np(1-p)}$$
$$\mathrm{HC}_0:\ \frac{S_0^2}{N_0^2} + \frac{S_1^2}{N_1^2}$$
$$\mathrm{HC}_1:\ \frac{N}{N-2}\left[\frac{S_0^2}{N_0^2} + \frac{S_1^2}{N_1^2}\right]$$
$$\mathrm{HC}_2:\ \frac{S_0^2}{N_0(N_0-1)} + \frac{S_1^2}{N_1(N_1-1)}$$
$$\mathrm{HC}_3:\ \frac{S_0^2}{(N_0-1)^2} + \frac{S_1^2}{(N_1-1)^2}.$$

The standard OLS estimator pools the observations from both subsamples to derive the variance estimate: this is the efficient thing to do when the two variances are actually the same. The White (1980) estimator $\mathrm{HC}_0$ adds the estimates of the two sampling variances of the means, using the consistent (but biased) maximum likelihood estimate of the variance. The $\mathrm{HC}_2$ estimator is the unbiased estimator for the sampling variance in this case, since it makes the correct degrees of freedom correction. $\mathrm{HC}_1$ makes the degrees of freedom correction outside the sum, which will help but generally not be quite correct. Since we know $\mathrm{HC}_2$ to be the unbiased estimate of the sampling variance, we also see immediately that $\mathrm{HC}_3$ will be too big. Even though we know the exact unbiased estimator for the sampling variance in this case, we still don't know the small sample distribution of the test statistic

$$\frac{\bar y_1 - \bar y_0}{\sqrt{\dfrac{S_0^2}{N_0(N_0-1)} + \dfrac{S_1^2}{N_1(N_1-1)}}}\,;$$

this is the Behrens-Fisher problem. Note that $p = 0.5$ implies that the regression design is perfectly balanced. In this case the OLS estimator will be equal to $\mathrm{HC}_1$, and all five estimators will generally differ little.

To provide some further insights, we present some results from a small Monte Carlo experiment for the model (4). We choose $N = 30$, since

this will highlight the small sample issues, and $p = 0.9$, which implies $h_{ii} = 10/N = 1/3$ if $d_i = 0$, in order to have a relatively unbalanced design. We draw

$$\varepsilon_i \sim \begin{cases} N(0, \sigma^2) & \text{if } d_i = 0 \\ N(0, 1) & \text{if } d_i = 1 \end{cases}$$

and we show results for two cases. The first has relatively little heteroskedasticity, and we set $\sigma = 0.85$, while the second has lots of heteroskedasticity, with $\sigma = 0.5$.

Table 2 displays the results. The columns "mean" and "standard deviation" display means and standard deviations of the various estimators across 5,000 replications of the sampling experiment. The standard deviation of $\hat\beta$ is the sampling variability we are trying to measure. Even with little heteroskedasticity, the OLS standard errors are too small by about 15%. However, $\mathrm{HC}_0$ and $\mathrm{HC}_1$ are even smaller because of the small sample bias. $\mathrm{HC}_2$ is slightly bigger than the OLS standard errors on average. Notice that this estimator of the sampling variance is unbiased, while the mean of the $\mathrm{HC}_2$ standard errors across sampling experiments (0.54) is still below the standard deviation of $\hat\beta$ (0.60). This comes from the fact that the standard error is the square root of the sampling variance, the sampling variance is itself estimated and hence has sampling variability, and the square root is a concave function. The $\mathrm{HC}_3$ standard error is slightly too big, as we expected.

The last two columns in the table show empirical rejection rates for the hypothesis that $\beta$ equals its true value, using a nominal size of 5% for the test. Since we don't know the exact small sample distribution, we compare the test statistics to the normal distribution (which is the asymptotic distribution) and to a t-distribution (which is not the correct small sample distribution in this case for any of the estimators, as we have seen). Rejection rates are far too high for all tests. Interestingly, with little heteroskedasticity, OLS standard errors have lower rejection rates than the robust standard errors, even though the standard errors themselves are smaller than $\mathrm{HC}_2$ and $\mathrm{HC}_3$ on average. But the standard errors themselves are estimated and have sampling variability. The OLS standard errors are much more precisely estimated than the robust standard errors, as can be seen from column (2).³ This means the robust standard errors will sometimes be too small by accident, and this happens often enough in this case to make the OLS

³ The large sampling variance of the robust estimators has also been noted by Chesher and Austin (1991). Kauermann and Carroll (2001) propose an adjustment to the confidence interval to correct for this.

standard errors preferred. The lesson we can take away from this is that robust standard errors are no panacea. They can be smaller than OLS standard errors for two reasons: the small sample bias we have discussed, and the higher sampling variance of these standard errors. Hence, if we observe robust standard errors being smaller than OLS standard errors, we should take this as a warning flag: if heteroskedasticity were present and our standard error estimate were about right, this wouldn't happen. With lots of heteroskedasticity, as in the lower panel of the table, things are different: OLS standard errors are now dominated by the robust standard errors throughout, although the empirical rejection rates are still way too high to give us much confidence in our confidence intervals for any of the estimators. Using the t-distribution rather than the normal helps only marginally.

There doesn't seem to be any clear way out of this conundrum. Standard error estimates might be biased in finite samples: OLS standard errors because of heteroskedasticity, and robust standard errors because of the influence of high leverage points. Hence, if the regression design is possibly unbalanced, the only prescription for the applied researcher is to check the data for high leverage points. If the regression design is relatively balanced, then robust standard errors should produce fairly accurate confidence intervals even in small samples.

One hopeful observation on robust standard errors is that we have rarely seen them differ from OLS standard errors in empirical practice by more than something like 5%. In any applied project there are always myriad specification choices to be made, from selection of the sample, to the exact treatment of the variables, regression design, etc. These certainly produce non-sampling variation in our estimates of a similar magnitude (in the sense that our estimates would differ if we repeated the project with slightly different choices). Hence, although we strive to get our standard errors as right as possible, if they end up being biased by something on the order of 5%, this would probably not keep us up at night. But this is only true in the case of independent observations. Things can be much worse when observations are dependent.

1.2 Clustering and Serial Correlation in Panels

1.2.1 Clustering and the Moulton Factor

The more serious problems have to do with correlation of the residuals across the units of observation. Start by considering the simple model

$$y_{ig} = \alpha + \beta x_g + \varepsilon_{ig} \qquad (5)$$

where the outcome is observed at the individual level but the regressor of interest, $x_g$, varies only at a higher level of aggregation, a group $g$, and there are $G$ groups. For example, $y_{ig}$ could be the test score of a student and $x_g$ class size, where $i$ denotes the student and $g$ the class room. If $x_g$ is randomly assigned, as in the Tennessee STAR experiment (Krueger, 1999), then the OLS estimator is unbiased and consistent for the population regression coefficient. Recall that the students in kindergarten to grade 3 were randomly assigned to small or regular classes in the STAR experiment. What we are worried about in an analysis of the STAR data is that the error term has a group structure:

$$\varepsilon_{ig} = v_g + \eta_{ig}. \qquad (6)$$

The class room level component could result from the fact that a class may have had a particularly good teacher, or a class took the test when there were a lot of external disruptions, so that all students performed more poorly than in other classes. This problem of correlation in the errors is, of course, well known in econometrics. Kloek (1981) and in particular Moulton (1986), however, pointed out how important it can be for applied research in the grouped regressor case. Following their derivations, it is straightforward to analyze this case. The algebra needs some extra notation, and is therefore exiled to an appendix. Let

$$\rho = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_\eta^2}.$$

Given the structure (6), $\rho$ is called the intra-class correlation (even in cases where the groups are not class rooms!). When the groups are of equal size $n$, we have

$$\frac{var(\hat\beta)}{var_c(\hat\beta)} = 1 + (n-1)\rho \qquad (7)$$

where $var(\hat\beta)$ is the true variance of the OLS estimator and $var_c(\hat\beta)$ is the conventional OLS variance. Notice that the OLS standard error formula will be worse if $n$ is large and if $\rho$ is large. To see the intuition, consider the case where $\rho \to 1$. In this case, all the errors within a group are the same. This is just like taking a data set and making $n$ identical copies. The covariance matrix of the replicated data set is going to be $1/n$ times the original covariance matrix, although no information has been added.

In order to see how this problem is related to the group structure in the regressor $x$, consider the generalization of (7) where the regressor is $x_{ig}$,

which varies at the individual level but is correlated within groups, and the group sizes $n_g$ vary by group. In this case

$$\frac{var(\hat\beta)}{var_c(\hat\beta)} = 1 + \left[\frac{var(n_g)}{\bar n} + \bar n - 1\right]\rho_x\,\rho \qquad (8)$$

where

$$\rho_x = \frac{\sum_g \sum_{i\neq k}(x_{ig} - \bar x)(x_{kg} - \bar x)}{var(x_{ig})\sum_g n_g(n_g - 1)}$$

is the intra-class correlation of $x_{ig}$; it is actually unrestricted and does not impose a structure like (6). What the formula says is that the bias in the OLS formula is much worse when $\rho_x$ is large, but vanishes when $\rho_x = 0$: if the $x_{ig}$s are uncorrelated within groups, the error structure does not matter for the estimation of the standard errors.

In order to see that this problem can be quite important, return to the example of estimating the effect of class size on student achievement with the Tennessee STAR data. For illustration, we will just run (5) by OLS, although a fair bit of the variation in class size comes from non-random factors. A simple regression of the percentile score for kindergartners on their class size yields an estimate with a small robust standard error. Now consider the formula (8). Even though $\rho_x = 1$, classes are of unequal size. Plugging the relevant values ($\bar n = 19.4$ and $\rho = 0.311$) into the formula we get

$$\frac{var(\hat\beta)}{var_c(\hat\beta)} = 7.01.$$

This implies that our standard error estimate is too small by a factor of $2.65 = \sqrt{7.01}$, and the corrected standard error is larger by the same factor.⁴

The same problem arises in IV estimation. Consider the regression equation

$$y_{ig} = \alpha + \beta x_{ig} + \varepsilon_{ig}$$

where the regressor can now vary at the individual level. Let $Z$ be a matrix of instruments which only vary at the group level. It is easy to show that the Moulton formula for the IV case is the same as (8) for the grouping of the instrument (Shore-Sheppard, 1996). Hence it is equally important to address this problem at the level of an instrumental variable as it is for a regressor in the OLS case.
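For quick diagnostics, formula (8) can be wrapped in a small function. This sketch (Python/numpy; the function name and the population-variance convention for $var(n_g)$ are our choices) returns the ratio of the true to the conventional OLS variance:

```python
import numpy as np

def moulton_factor(n_g, rho_x, rho):
    """Ratio of true to conventional OLS variance, as in (8).

    n_g: array of group sizes; rho_x: intra-class correlation of the
    regressor; rho: intra-class correlation of the errors.
    """
    n_g = np.asarray(n_g, dtype=float)
    n_bar = n_g.mean()
    return 1.0 + (n_g.var() / n_bar + n_bar - 1.0) * rho_x * rho
```

With equal group sizes and $\rho_x = 1$, this reduces to the simpler factor $1 + (n-1)\rho$ in (7); with $\rho_x = 0$ it equals 1, reflecting that an uncorrelated regressor makes the error structure irrelevant for the standard errors.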
Another setting where this problem might pop up is the regression discontinuity design, if the confounder, $x_i$, is measured at a group level and not the individual level (see Card and Lee, 2007).

⁴ The IV coefficient estimate, where class size is instrumented with two dummies for the assignment to regular and regular-with-aide groups, is almost identical.
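Before turning to solutions, a quick Monte Carlo makes the problem concrete. The sketch below (Python/numpy; all parameter values are ours, for illustration only) simulates model (5) with the error structure (6) and compares the conventional OLS standard error to the actual sampling variability of $\hat\beta$:

```python
import numpy as np

rng = np.random.default_rng(0)
G, n, rho = 50, 20, 0.5                        # groups, group size, intra-class corr.
sd_v, sd_e = np.sqrt(rho), np.sqrt(1 - rho)    # total error variance normalized to 1

def one_sample():
    x = np.repeat(rng.normal(size=G), n)         # regressor varies only by group
    v = np.repeat(sd_v * rng.normal(size=G), n)  # group error component v_g
    y = 1.0 + 0.5 * x + v + sd_e * rng.normal(size=G * n)
    X = np.column_stack([np.ones(G * n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (G * n - 2)
    return b[1], np.sqrt(s2 * XtX_inv[1, 1])     # slope and conventional OLS SE

draws = np.array([one_sample() for _ in range(500)])
ratio = draws[:, 0].std() / draws[:, 1].mean()   # true SD relative to reported SE
```

With these parameters the true sampling standard deviation comes out roughly three times the conventional standard error, in line with the Moulton factor $\sqrt{1 + (n-1)\rho} = \sqrt{10.5} \approx 3.2$ from (7).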

There are various solutions to this problem:

1. Parametric correction: Obtain an estimate of $\rho$ and calculate the standard errors using the correct formula given by (8). The intra-class correlations $\rho$ and $\rho_x$ can typically be estimated easily in statistical software.⁵

2. Clustered standard errors: A non-parametric correction for the standard errors is given by the following extension of the robust covariance matrix (Liang and Zeger, 1986):

$$var(\hat\beta) = (X'X)^{-1}\left(\sum_g X_g'\hat\Psi_g X_g\right)(X'X)^{-1} \qquad (9)$$

$$\hat\Psi_g = q\,\hat\varepsilon_g\hat\varepsilon_g' = q\begin{bmatrix} \hat\varepsilon_{1g}^2 & \hat\varepsilon_{1g}\hat\varepsilon_{2g} & \cdots & \hat\varepsilon_{1g}\hat\varepsilon_{n_g g} \\ \hat\varepsilon_{1g}\hat\varepsilon_{2g} & \hat\varepsilon_{2g}^2 & & \vdots \\ \vdots & & \ddots & \hat\varepsilon_{(n_g-1)g}\hat\varepsilon_{n_g g} \\ \hat\varepsilon_{1g}\hat\varepsilon_{n_g g} & \cdots & \hat\varepsilon_{(n_g-1)g}\hat\varepsilon_{n_g g} & \hat\varepsilon_{n_g g}^2 \end{bmatrix}$$

where $X_g$ is the matrix of regressors for group $g$, and $q$ is a degrees of freedom adjustment factor like $G/(G-1)$, similar to the one in $\mathrm{HC}_1$ for the simple heteroskedasticity robust covariance matrix above. This calculation of the covariance matrix allows for arbitrary correlation of the errors within the clusters $g$, not just the structure in (6). Clustered standard errors will be consistent as the number of groups gets large.

3. Aggregation to the group level: Calculate $\bar y_g$ first and then run a weighted least squares regression

$$\bar y_g = \alpha + \beta x_g + \bar\varepsilon_g$$

with the number of observations in the group as weights (or the inverse of the sampling variance of $\bar y_g$). For the correct choice of weights this is equivalent to doing OLS on the micro data. The error term at this aggregated level is $\bar\varepsilon_g = v_g + \bar\eta_g$, and the error component $v_g$ is therefore reflected in the usual second step standard errors, so that inference can be based directly on the second step covariance matrix.⁶

⁵ For example, using the loneway command in Stata.
⁶ See Wooldridge (2003) and Donald and Lang (2007). While the aggregate regression is simply the between regression in the context of a random effects model, long known to econometricians, the first discussion of the analogy of the micro and group level regressions and its relationship to inference is probably in Kloek (1981).

If there are other micro level regressors in the model, as in

$$y_{ig} = \alpha + \beta x_g + \gamma' W_{ig} + \varepsilon_{ig},$$

we can do the aggregation by running the regression

$$y_{ig} = \alpha_g^0 + \gamma' W_{ig} + \varepsilon_{ig}^0$$

which includes a full set of group dummies. The $\hat\alpha_g^0$ coefficients on the group dummies are our group means, purged of the effect of the individual level variables $W_{ig}$. Obviously, aggregation does not work when $x_{ig}$ varies within group: averaging the $x_{ig}$s to group means is IV, and hence involves changing the estimator.

4. Block bootstrap: Bootstrapping means drawing random samples from the empirical distribution of the data. Since the best representation of the empirical distribution of the data is the data itself, this means in practice, for a sample of size $N$, drawing another sample of size $N$ with replacement from the original data set. This can be done many times, and an estimate is computed in each of the bootstrap samples. The standard error of the estimate is the standard deviation of the estimates across all the bootstrap samples. In block bootstrapping, the bootstrap draws are whole blocks of data as defined by the groups $g$. Hence, any correlation of the errors within a block will be kept intact by the block bootstrap sampling, and should therefore be reflected in the standard error estimate. There are many different ways to do bootstrap inference; for more on this see Cameron, Gelbach, and Miller (2006).

5. Estimate a random effects GLS or ML model of equation (5). This relies on the linearity of the CEF, and since we prefer the simple OLS approximation to the conditional expectation function, we do not recommend this approach.

Table 8.2.1 returns to the class size example from the STAR experiment, which we have discussed in this section.
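The clustered covariance matrix in (9) is also easy to compute directly. Here is a minimal sketch (Python/numpy; the function name is ours, and in practice one would use the canned routines in a regression package):

```python
import numpy as np

def cluster_cov(X, y, groups):
    """Cluster-robust covariance matrix as in (9), with a G/(G-1) adjustment."""
    X = np.asarray(X, dtype=float)
    N, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    e = y - X @ (XtX_inv @ X.T @ y)       # OLS residuals
    labels = np.unique(groups)
    G = len(labels)
    meat = np.zeros((k, k))
    for g in labels:
        idx = groups == g
        s = X[idx].T @ e[idx]             # within-cluster score sum X_g' e_g
        meat += np.outer(s, s)            # equals X_g' e_g e_g' X_g
    q = G / (G - 1)                       # simple degrees-of-freedom factor
    return q * XtX_inv @ meat @ XtX_inv
```

With every observation in its own cluster, this collapses (up to the $q$ factor) to the $\mathrm{HC}_0$ estimator, which makes clear that clustering is the natural generalization of the robust covariance matrix.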
The table presents six different estimates of the standard errors: conventional robust standard errors (using $\mathrm{HC}_1$); two versions of the parametrically corrected standard errors using the Moulton formula (8), the first using the formula for the intra-class correlation given by Moulton and the second using an alternative ANOVA estimator of the intra-class correlation;⁷ clustered standard errors; block bootstrapped

⁷ This is computed using the loneway command in Stata.

standard errors; and estimates aggregated to the group level.

Columns (1) and (2) present the results for the class size regressor, while columns (3) and (4) show the estimates for an individual level covariate we have included in the regression: sex. Class rooms are almost balanced by sex, and hence there is (almost) no intra-class correlation in this regressor. As a result, the standard error estimates for this regressor are not affected by any of the corrections. As we have seen before, the adjustments to the standard error on the class size regressor are large, but the different adjustments deliver standard errors that are almost identical to each other. There are 318 class rooms in the data set, which is a large number, and all these methods should deliver similar results with a large number of clusters. We tend to use clustered standard errors in practice, because they are conveniently available in many regression packages and hence easy to compute.⁸ The aggregation approach also has much to commend itself, if only because it often allows one to plot the data easily in the second stage. With a small number of groups there is a new set of concerns to worry about, and we turn to this in section 1.2.3 below.

1.2.2 Serial Correlation in Panels and Difference-in-Differences Models

Now suppose that there are only two groups, i.e. the regressor of interest is a dummy variable:

$$y_{ig} = \alpha + \beta d_g + v_g + \eta_{ig}. \qquad (10)$$

The Moulton problem does not arise in this case, because OLS fits the regression line perfectly through the two points defined by the dummy variable. To see this, notice that

$$E(y_{ig}|d_g) = \alpha + \beta d_g + E(v_g|d_g)$$

so that

$$\hat\beta = E(y_{ig}|d_g = 1) - E(y_{ig}|d_g = 0) = \beta + E(v_g|d_g = 1) - E(v_g|d_g = 0).$$

Since $E(v_g|d_g) = 0$, this means that the estimate of $\beta$ will be unbiased, but it will not be consistent, as pointed out by Donald and Lang (2007). In

⁸ In fact, the name "clustered standard errors," which applied researchers have adopted, derives from the name of the Stata option.

every new sample there will be a new draw of $v_g$. So the regression line will be somewhat off, and the estimate will not exactly equal the population $\beta$. However, on average there will be no bias: sometimes $\beta$ will be overestimated, sometimes underestimated. Now suppose we let $N \to \infty$, while $G$, the number of groups, remains constant at 2. The bias that exists in any particular sample will not go to zero, because $v_g$ is just as important in the big sample as in the small sample. Only the sampling variation due to $\eta_{ig}$ will vanish, not the sampling variation due to $v_g$. In a sense, the Moulton problem discussed above arises precisely from the fact that the regression line will not neatly fit through all the points defined by the $v_g$ when there are three or more groups.

This problem also arises in the standard 2x2 difference-in-differences model. Recall from our earlier discussion of differences-in-differences that we modeled the outcome as an additive function of a state effect, a time effect, and the treatment effect on the interaction: $E[y_{ist}|s,t] = \gamma_s + \lambda_t + \beta d_{st}$. Now consider the case where there is a state-time specific component to the error term:

$$y_{ist} = \gamma_s + \lambda_t + \beta d_{st} + v_{st} + \eta_{ist}. \qquad (11)$$

Because the model is saturated, this is no different from the model (10) for the purpose of inference. As before, the error component $v_{st}$ does not vanish even when $N \to \infty$, i.e. when the group sizes are large. Moreover, there is really no way to get consistent standard errors which acknowledge this problem, because $d_{st}$ and $v_{st}$ are completely collinear, so no separate estimate of $\beta$ and $v_{st}$ is possible. This means that 2x2 differences-in-differences are not really very informative if $v_{st}$ shocks are important.

An example of this problem is the basic analysis of the employment effects of the New Jersey minimum wage in the original Card and Krueger (1994) New Jersey-Pennsylvania comparison. Card and Krueger compared employment at New Jersey and Pennsylvania fast food restaurants before and after New Jersey introduced a state minimum wage. With two states and two periods this is the standard 2x2 DD design. The solution to this problem is to have either multiple time periods on two states, as in the Card and Krueger (2000) reanalysis of the New Jersey-Pennsylvania experiment with a longer time series of payroll data, or multiple contrasts for two time periods, as in Card (1992) using 51 states. It is straightforward to get correct standard errors if $v_{st}$ is iid by using one of the methods discussed in the previous section.

In many applications of the difference-in-differences model there will be both multiple treatment groups ($s$) and multiple time periods ($t$). Bertrand,

Duflo, and Mullainathan (2004) and Kézdi (2004) point out a further problem in this case. Many economic variables of interest tend to be correlated over time. This means that $v_{st}$ is most likely serially correlated. Consider the Card and Krueger example again, and imagine using the data from Card and Krueger (2000), which span the period from October 1991 to September 1997. This yields 72 monthly observations for each state. But these 72 observations are not independent: employment variations tend to be highly correlated over time. For example, we saw in the DD notes before that employment in Pennsylvania was consistently lower than in New Jersey for most of 1994 and 1995. Hence, the solutions which treat $v_{st}$ as iid are not sufficient. Bertrand et al. (2004) investigate a variety of remedies, like clustering at the state level, block bootstrap methods at the state level, ignoring the time series information by aggregating the data into two periods, or parametric modeling of the serial correlation.

An interesting and important result is that clustering standard errors at the state level solves the serial correlation problem. In the previous section we would have treated the state*month cell as the cluster, because the variation in the key regressor is at the state*month level. Instead, treat the entire state as the cluster. This might seem odd at first glance, since we have already controlled for state effects: the state dummy $\gamma_s$ in (11) already removes the time mean of $v_{st}$, which is $\bar v_s$. Nevertheless, this method solves the serial correlation problem, because $v_{st} - \bar v_s$ will still be correlated for adjacent periods. Clustering at the state*month level does not address this, because residuals across clusters are treated as independent. But clustering at the state level allows for it, since this covariance estimator allows a completely non-parametric residual correlation within clusters. Clustered standard errors serve a very different role here than in the standard Moulton case (5), but they work.

The conclusion is that correlated errors are likely to be a problem in many panel type applications, and adjusting the standard errors for this correlation is important. Donald and Lang (2004), Bertrand et al. (2004), and Kézdi (2004) highlight the issue that we may want to treat standard errors as clustered at a high level of aggregation. As a result we may end up with relatively few clusters.

1.2.3 Few Clusters

The problem of few clusters is the analogue to the small sample problem of robust standard errors discussed in section 1.1. Here as there, small sample distributions for the different estimators are not available, but we know (from

Monte Carlo evidence) that all the adjustments can be biased substantially when there are only few clusters. Donald and Lang (2007) and Cameron, Gelbach, and Miller (2006) discuss inference when the resulting number of groups is small; see also Hansen (2007b). This area is very much research in progress, and firm recommendations are therefore difficult. The main approaches are:

1. Bias corrections of clustered standard errors. Clustered standard errors are biased in small samples because

$$E\left[\hat\varepsilon_g\hat\varepsilon_g'\right] \neq E\left[\varepsilon_g\varepsilon_g'\right] = \Psi_g$$

just as in section 1.1. One solution to the bias problem is to use an adjustment, just as in section 1.1, to correct for the small sample bias. As before, the bias depends on the form of $\Psi_g$. Bell and McCaffrey (2002) suggest adjusting the residuals by

$$\tilde\varepsilon_g = A_g\hat\varepsilon_g, \qquad \hat\Psi_g = q\,\tilde\varepsilon_g\tilde\varepsilon_g'$$

where $A_g$ solves

$$A_g'A_g = (I - H_g)^{-1}$$

and

$$H_g = X_g(X'X)^{-1}X_g'$$

is the hat-matrix at the group level. This is the analog of $\mathrm{HC}_2$ for the clustered case. However, the matrix $A_g$ is not unique; there are many such decompositions. Bell and McCaffrey (2002) suggest using the symmetric square root of $(I - H_g)^{-1}$, i.e. $A_g = P\Lambda^{1/2}P'$, where $P$ is the matrix of eigenvectors of $(I - H_g)^{-1}$, $\Lambda$ is the diagonal matrix of the corresponding eigenvalues, and $\Lambda^{1/2}$ is the diagonal matrix of the square roots of the eigenvalues. One problem with the Bell and McCaffrey adjustment is that $(I - H_g)$ may not be of full rank, and hence the inverse may not exist for all designs. This happens, for example, when one of the regressors is a dummy variable which only takes on the value zero or only the value one within each group. In addition, the dimension of $H_g$ is the number of observations per group. Since this matrix needs to be inverted, this only tends to work if the group sizes are reasonably small.

2. Various authors, including Bell and McCaffrey (2002) and Donald and Lang (2007), suggest to base inference on a t-distribution with $G - k$ degrees of freedom, where $G$ is the number of groups and $k$ is the number of regressors, rather than on the standard normal distribution. We have seen in Section 1.1 that this is not generally the correct small sample distribution even if the errors $v_g$ are normally distributed. Nevertheless, for small $G$ this makes a substantial difference. Cameron, Gelbach, and Miller (2006) find that this works well in conjunction with the Bell and McCaffrey (2002) bias correction as described in 1 for the Moulton problem.

3. Donald and Lang (2007) suggest that aggregating to the group level works well even with a small number of groups, in conjunction with using a t-distribution with $G - k$ degrees of freedom. Straight aggregation does not work to solve the serial correlation problem in panels discussed in Bertrand et al. (2004) and Kézdi (2004).

4. Cameron, Gelbach, and Miller (2006) report that various forms of the bootstrap work well with small numbers of groups, and typically outperform clustered standard errors without the bias correction. They point out, however, that the bootstrap does not always lead to improved small sample statistics. In order to get such an improvement they suggest to bootstrap Wald statistics directly, rather than obtain the test statistic based on bootstrapped standard errors. They also recommend a method called the wild bootstrap. Rather than resampling entire groups $(y_g, X_g)$ of data, this involves computing a new $y_g^*$ based on the residual $\hat{\varepsilon}_g = y_g - X_g\hat{\beta}$, where $y_g^* = X_g\hat{\beta} + \varepsilon_g^*$. This implies that the $X_g$'s are being kept fixed and only a new residual $\varepsilon_g^*$ is chosen in each bootstrap replication. In the wild bootstrap, $\varepsilon_g^* = \hat{\varepsilon}_g$ with probability 0.5 and $\varepsilon_g^* = -\hat{\varepsilon}_g$ with probability 0.5.

5. Hansen (2007a) proposes parametric methods to solve the serial correlation problem discussed above, i.e. model the error process as an AR process, estimate the AR parameters, and fix the covariance matrix. Hansen points out that the AR parameters are biased in panels of short duration, and demonstrates the importance of using a bias-adjusted estimator for these coefficients. His methods seem to yield much improved inference compared to Bertrand et al.'s (2004) investigation of parametric models without bias adjustment. Although Hansen (2007a) does not explicitly demonstrate the performance of his estimator with a small number of groups, one would generally expect more parametric methods to be less sensitive to sample size than nonparametric ones, like clustering, as long as the parametric assumptions are roughly right.

Various authors have demonstrated that ignoring the problem of a small number of clusters can lead to very misleading inference. Nevertheless, there seems to be no single fix which solves this problem satisfactorily. We have seen above, in the case of the simple robust covariance matrix in section 1.1, that fixing the bias problem tends to introduce variance into the covariance estimator. Hence, trying to fix the bias may sometimes lead to smaller standard errors than ignoring the problem. Whether the few clusters problem leads to a lot of bias also seems to depend on the situation. For example, Monte Carlo results in Hansen (2007b) suggest that the bias is a lot worse for the standard Moulton problem than for the serial correlation problem. This suggests that it may be feasible to stick with regular clustered standard errors to solve serial correlation in panels, even when the number of clusters is as small as 10. For solving the Moulton problem in section 1.2.1, it seems more important to worry about clustered standard errors with a small number of clusters. However, the Donald and Lang (2007) aggregation seems to work well in this case as long as the regressor of interest is fixed within groups. This would also be our preferred strategy when both problems occur in combination, i.e. when the estimation is based on micro data, but treatment only varies at the state (or some other aggregate) level over time. In this case, aggregate the observations first to the state-year level, and then cluster standard errors in the aggregate panel at the state level.

The upshot from all this is that it may be important to pay attention to small sample issues in applied microeconometric work. Working with large micro data sets, we used to sneer at the macro economists with their small time series samples.
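The wild cluster bootstrap of point 4 can be sketched in a few lines. This is our own minimal illustration, not code from the notes: residual signs are flipped at the group level with probability 0.5 (keeping the $X_g$'s fixed), and the t-statistic itself is bootstrapped rather than just the standard error.

```python
import numpy as np

def clustered_se(X, resid, clusters):
    """Plain clustered (sandwich) standard errors for all coefficients."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        s = X[clusters == g].T @ resid[clusters == g]
        meat += np.outer(s, s)
    return np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

def wild_cluster_bootstrap_t(y, X, clusters, coef=1, reps=499, seed=0):
    """Bootstrap distribution of the t-statistic, resampling residual signs by cluster."""
    rng = np.random.default_rng(seed)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    groups = np.unique(clusters)
    t_stars = np.empty(reps)
    for r in range(reps):
        flips = rng.choice([-1.0, 1.0], size=groups.size)    # Rademacher weights
        e_star = resid * flips[np.searchsorted(groups, clusters)]
        y_star = X @ beta + e_star                           # X_g's held fixed
        b_star = np.linalg.lstsq(X, y_star, rcond=None)[0]
        r_star = y_star - X @ b_star
        se = clustered_se(X, r_star, clusters)[coef]
        t_stars[r] = (b_star[coef] - beta[coef]) / se
    return t_stars
```

In use, one compares the sample t-statistic to the quantiles of the returned array instead of to normal critical values. Cameron, Gelbach, and Miller (2006) consider further refinements, such as imposing the null hypothesis when forming the residuals.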
But he who laughs last laughs best: it turns out that it is the macro economists who had it right all along, and we micro economists are now often confined to the same small sample sizes as they are. The key is to think about where your variation lives. Unfortunately, all too often it lives at a fairly aggregate level. Which methods work best in particular applications when the original data result from large micro data samples is still an open question, and this remains an area of active research.

References

Bell and McCaffrey (2002) "Bias Reduction in Standard Errors for Linear Regression with Multi-stage Samples," Survey Methodology, 2002.

Card, David, and David Lee (2007) "Regression Discontinuity Inference with Specification Error," Journal of Econometrics, 2007.

Chesher, Andrew and Ian Jewitt (1987) "The Bias of the Heteroskedasticity Consistent Covariance Estimator," Econometrica, 55.

Chesher, Andrew and Gerald Austin (1991) "The Finite-Sample Distributions of Heteroskedasticity Robust Wald Statistics," Journal of Econometrics, 47.

DeGroot, Morris (1986) Probability and Statistics, 2nd edition, Reading: Addison-Wesley.

Hansen, Christian (2007a) "Generalized Least Squares Inference in Multilevel Models with Serial Correlation and Fixed Effects," Journal of Econometrics.

Hansen, Christian (2007b) "Asymptotic Properties of a Robust Variance Matrix Estimator for Panel Data when T is Large," Journal of Econometrics.

Kauermann, Göran and Raymond J. Carroll (2001) "A Note on the Efficiency of Sandwich Covariance Estimation," JASA, 96.

Moulton, Brent (1986) "Random Group Effects and the Precision of Regression Estimates," Journal of Econometrics, 32.

Shore-Sheppard, L. (1996) "The Precision of Instrumental Variables Estimates with Grouped Data," Industrial Relations Section Working Paper #374, Princeton University.

Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan (2004) "How Much Should We Trust Differences-in-Differences Estimates?" Quarterly Journal of Economics, 119, February 2004.

Liang, K. and Scott L. Zeger (1986) "Longitudinal Data Analysis Using Generalized Linear Models," Biometrika, 73, 13-22.

Cameron, Colin, Jonah Gelbach, and Douglas L. Miller (2006) "Bootstrap-Based Improvements for Inference with Clustered Errors," mimeographed.

Kloek, T. (1981) "OLS Estimation in a Model Where a Microvariable is Explained by Aggregates and Contemporaneous Disturbances are Equicorrelated," Econometrica, 49, No. 1, January 1981.

Davidson and MacKinnon (1993) Estimation and Inference in Econometrics, New York and Oxford: Oxford University Press.

Donald, Stephen G. and Kevin Lang (2007) "Inference with Difference-in-Differences and Other Panel Data," Review of Economics and Statistics, May 2007, 89, No. 2.

Horowitz, Joel L. (1997) "Bootstrap Methods in Econometrics: Theory and Numerical Performance," in: Kreps and Wallis (eds.) Advances in Economics and Econometrics: Theory and Applications, Seventh World Congress, vol. III, Cambridge: Cambridge University Press.

MacKinnon and White (1985) "Some Heteroskedasticity Consistent Covariance Matrix Estimators with Improved Finite Sample Properties," Journal of Econometrics, 29.

Messer and White (1984) "A Note on Computing the Heteroskedasticity Consistent Covariance Matrix Using Instrumental Variables Techniques," Oxford Bulletin of Economics and Statistics, 46.

Kézdi (2004) "Robust Standard Error Estimation in Fixed-Effects Panel Models," Hungarian Statistical Review, Special English Volume #9, 2004.

Wooldridge (2003) "Cluster-Sample Methods in Applied Econometrics," American Economic Review, May 2003, 93, Iss. 2.

Appendix

In order to derive (7), write
$$
y_g = \begin{pmatrix} y_{1g} \\ y_{2g} \\ \vdots \\ y_{n_g g} \end{pmatrix}, \qquad
\varepsilon_g = \begin{pmatrix} \varepsilon_{1g} \\ \varepsilon_{2g} \\ \vdots \\ \varepsilon_{n_g g} \end{pmatrix}
$$
and
$$
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_G \end{pmatrix}, \qquad
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_G \end{pmatrix}, \qquad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_G \end{pmatrix}
$$
where $\iota_g$ is a column vector of $n_g$ ones and $G$ is the number of groups. Notice that $E(\varepsilon\varepsilon') = \Omega$ is block diagonal with blocks $\Omega_1, \ldots, \Omega_G$, where
$$
\Omega_g = \sigma_\varepsilon^2\left[(1-\rho)I + \rho\,\iota_g\iota_g'\right], \qquad \rho = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_\eta^2}.
$$
Now
$$
X'X = \sum_g X_g'X_g = \sum_g n_g x_g x_g', \qquad
X'\Omega X = \sum_g x_g\,\iota_g'\Omega_g\iota_g\,x_g'.
$$
But
$$
\iota_g'\Omega_g\iota_g = \sigma_\varepsilon^2\left[n_g(1-\rho) + n_g^2\rho\right] = \sigma_\varepsilon^2 n_g\left[1 + (n_g - 1)\rho\right].
$$
Denote $\tau_g = 1 + (n_g - 1)\rho$, so we get
$$
x_g\,\iota_g'\Omega_g\iota_g\,x_g' = \sigma_\varepsilon^2 n_g\tau_g\, x_g x_g', \qquad
X'\Omega X = \sigma_\varepsilon^2\sum_g n_g\tau_g\, x_g x_g'.
$$
With this at hand, we can compute the covariance matrix of the OLS estimator, which is
$$
\mathrm{var}(\hat\beta_{OLS}) = (X'X)^{-1}\,X'\Omega X\,(X'X)^{-1}
= \sigma_\varepsilon^2\left(\sum_g n_g x_g x_g'\right)^{-1}\left(\sum_g n_g\tau_g x_g x_g'\right)\left(\sum_g n_g x_g x_g'\right)^{-1}.
$$
We want to compare this with the standard OLS covariance estimator
$$
\mathrm{var}^*(\hat\beta_{OLS}) = \sigma_\varepsilon^2\left(\sum_g n_g x_g x_g'\right)^{-1}.
$$
If the group sizes are equal, $n_g = n$ and $\tau_g = \tau = 1 + (n - 1)\rho$, so that
$$
\mathrm{var}(\hat\beta_{OLS}) = \sigma_\varepsilon^2\,\tau\left(\sum_g n x_g x_g'\right)^{-1}\left(\sum_g n x_g x_g'\right)\left(\sum_g n x_g x_g'\right)^{-1}
= \tau\,\mathrm{var}^*(\hat\beta_{OLS}),
$$
which implies (7).
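The equal-group-size result can be checked numerically. The sketch below (our own illustration, with arbitrary parameter values) builds the equicorrelated $\Omega$ directly and verifies that the true OLS covariance is exactly $1 + (n-1)\rho$ times the conventional one:

```python
import numpy as np

# Verify var(b_OLS) = [1 + (n-1)*rho] * var*(b_OLS) for equal group sizes
G, n, rho, sigma2 = 6, 4, 0.3, 1.0
rng = np.random.default_rng(0)
xbar = rng.normal(size=G)                  # group-level regressor values
X = np.repeat(xbar, n).reshape(-1, 1)      # regressor fixed within groups

# Block-diagonal Omega with equicorrelated blocks
block = sigma2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))
Omega = np.kron(np.eye(G), block)

XtX_inv = np.linalg.inv(X.T @ X)
V_true = XtX_inv @ X.T @ Omega @ X @ XtX_inv    # correct OLS covariance
V_conv = sigma2 * XtX_inv                       # conventional estimator
moulton = 1 + (n - 1) * rho
assert np.allclose(V_true, moulton * V_conv)
```

With unequal group sizes the same code, modified to build groups of different lengths, reproduces the general sandwich formula with group-specific $\tau_g$.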

When is it really justifiable to ignore explanatory variable endogeneity in a regression model? Discussion Paper: 2015/05 When is it really justifiable to ignore explanatory variable endogeneity in a regression model? Jan F. Kiviet www.ase.uva.nl/uva-econometrics Amsterdam School of Economics Roetersstraat

More information

Difference-in-Differences Estimation

Difference-in-Differences Estimation Difference-in-Differences Estimation Jeff Wooldridge Michigan State University Programme Evaluation for Policy Analysis Institute for Fiscal Studies June 2012 1. The Basic Methodology 2. How Should We

More information

GMM and SMM. 1. Hansen, L Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p

GMM and SMM. 1. Hansen, L Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p GMM and SMM Some useful references: 1. Hansen, L. 1982. Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p. 1029-54. 2. Lee, B.S. and B. Ingram. 1991 Simulation estimation

More information

Generalized Method of Moments: I. Chapter 9, R. Davidson and J.G. MacKinnon, Econometric Theory and Methods, 2004, Oxford.

Generalized Method of Moments: I. Chapter 9, R. Davidson and J.G. MacKinnon, Econometric Theory and Methods, 2004, Oxford. Generalized Method of Moments: I References Chapter 9, R. Davidson and J.G. MacKinnon, Econometric heory and Methods, 2004, Oxford. Chapter 5, B. E. Hansen, Econometrics, 2006. http://www.ssc.wisc.edu/~bhansen/notes/notes.htm

More information

Chapter 1. GMM: Basic Concepts

Chapter 1. GMM: Basic Concepts Chapter 1. GMM: Basic Concepts Contents 1 Motivating Examples 1 1.1 Instrumental variable estimator....................... 1 1.2 Estimating parameters in monetary policy rules.............. 2 1.3 Estimating

More information

Ordinary Least Squares Regression

Ordinary Least Squares Regression Ordinary Least Squares Regression Goals for this unit More on notation and terminology OLS scalar versus matrix derivation Some Preliminaries In this class we will be learning to analyze Cross Section

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Notes on Panel Data and Fixed Effects models

Notes on Panel Data and Fixed Effects models Notes on Panel Data and Fixed Effects models Michele Pellizzari IGIER-Bocconi, IZA and frdb These notes are based on a combination of the treatment of panel data in three books: (i) Arellano M 2003 Panel

More information

Economics 620, Lecture 7: Still More, But Last, on the K-Varable Linear Model

Economics 620, Lecture 7: Still More, But Last, on the K-Varable Linear Model Economics 620, Lecture 7: Still More, But Last, on the K-Varable Linear Model Nicholas M. Kiefer Cornell University Professor N. M. Kiefer (Cornell University) Lecture 7: the K-Varable Linear Model IV

More information

Nonparametric Identi cation and Estimation of Truncated Regression Models with Heteroskedasticity

Nonparametric Identi cation and Estimation of Truncated Regression Models with Heteroskedasticity Nonparametric Identi cation and Estimation of Truncated Regression Models with Heteroskedasticity Songnian Chen a, Xun Lu a, Xianbo Zhou b and Yahong Zhou c a Department of Economics, Hong Kong University

More information

ECONOMETRICS II (ECO 2401) Victor Aguirregabiria. Spring 2018 TOPIC 4: INTRODUCTION TO THE EVALUATION OF TREATMENT EFFECTS

ECONOMETRICS II (ECO 2401) Victor Aguirregabiria. Spring 2018 TOPIC 4: INTRODUCTION TO THE EVALUATION OF TREATMENT EFFECTS ECONOMETRICS II (ECO 2401) Victor Aguirregabiria Spring 2018 TOPIC 4: INTRODUCTION TO THE EVALUATION OF TREATMENT EFFECTS 1. Introduction and Notation 2. Randomized treatment 3. Conditional independence

More information

Chapter 2. Dynamic panel data models

Chapter 2. Dynamic panel data models Chapter 2. Dynamic panel data models School of Economics and Management - University of Geneva Christophe Hurlin, Université of Orléans University of Orléans April 2018 C. Hurlin (University of Orléans)

More information

1 Regression with Time Series Variables

1 Regression with Time Series Variables 1 Regression with Time Series Variables With time series regression, Y might not only depend on X, but also lags of Y and lags of X Autoregressive Distributed lag (or ADL(p; q)) model has these features:

More information

TECHNICAL WORKING PAPER SERIES ROBUST INFERENCE WITH MULTI-WAY CLUSTERING. A. Colin Cameron Jonah B. Gelbach Douglas L. Miller

TECHNICAL WORKING PAPER SERIES ROBUST INFERENCE WITH MULTI-WAY CLUSTERING. A. Colin Cameron Jonah B. Gelbach Douglas L. Miller TECHNICAL WORKING PAPER SERIES ROBUST INFERENCE WITH MULTI-WAY CLUSTERING A. Colin Cameron Jonah B. Gelbach Douglas L. Miller Technical Working Paper 327 http://www.nber.org/papers/t0327 NATIONAL BUREAU

More information

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley Time Series Models and Inference James L. Powell Department of Economics University of California, Berkeley Overview In contrast to the classical linear regression model, in which the components of the

More information

ECONOMET RICS P RELIM EXAM August 24, 2010 Department of Economics, Michigan State University

ECONOMET RICS P RELIM EXAM August 24, 2010 Department of Economics, Michigan State University ECONOMET RICS P RELIM EXAM August 24, 2010 Department of Economics, Michigan State University Instructions: Answer all four (4) questions. Be sure to show your work or provide su cient justi cation for

More information

Econometrics Midterm Examination Answers

Econometrics Midterm Examination Answers Econometrics Midterm Examination Answers March 4, 204. Question (35 points) Answer the following short questions. (i) De ne what is an unbiased estimator. Show that X is an unbiased estimator for E(X i

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 1 Jakub Mućk Econometrics of Panel Data Meeting # 1 1 / 31 Outline 1 Course outline 2 Panel data Advantages of Panel Data Limitations of Panel Data 3 Pooled

More information

Chapter 6: Endogeneity and Instrumental Variables (IV) estimator

Chapter 6: Endogeneity and Instrumental Variables (IV) estimator Chapter 6: Endogeneity and Instrumental Variables (IV) estimator Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans December 15, 2013 Christophe Hurlin (University of Orléans)

More information

Day 3B Nonparametrics and Bootstrap

Day 3B Nonparametrics and Bootstrap Day 3B Nonparametrics and Bootstrap c A. Colin Cameron Univ. of Calif.- Davis Frontiers in Econometrics Bavarian Graduate Program in Economics. Based on A. Colin Cameron and Pravin K. Trivedi (2009,2010),

More information

Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors

Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors Michael Pfaffermayr August 23, 2018 Abstract In gravity models with exporter and importer dummies the robust standard errors of

More information

Applied Econometrics. Lecture 3: Introduction to Linear Panel Data Models

Applied Econometrics. Lecture 3: Introduction to Linear Panel Data Models Applied Econometrics Lecture 3: Introduction to Linear Panel Data Models Måns Söderbom 4 September 2009 Department of Economics, Universy of Gothenburg. Email: mans.soderbom@economics.gu.se. Web: www.economics.gu.se/soderbom,

More information

Estimation of Dynamic Nonlinear Random E ects Models with Unbalanced Panels.

Estimation of Dynamic Nonlinear Random E ects Models with Unbalanced Panels. Estimation of Dynamic Nonlinear Random E ects Models with Unbalanced Panels. Pedro Albarran y Raquel Carrasco z Jesus M. Carro x June 2014 Preliminary and Incomplete Abstract This paper presents and evaluates

More information

x i = 1 yi 2 = 55 with N = 30. Use the above sample information to answer all the following questions. Show explicitly all formulas and calculations.

x i = 1 yi 2 = 55 with N = 30. Use the above sample information to answer all the following questions. Show explicitly all formulas and calculations. Exercises for the course of Econometrics Introduction 1. () A researcher is using data for a sample of 30 observations to investigate the relationship between some dependent variable y i and independent

More information

An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic

An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic Chapter 6 ESTIMATION OF THE LONG-RUN COVARIANCE MATRIX An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic standard errors for the OLS and linear IV estimators presented

More information

ECNS 561 Multiple Regression Analysis

ECNS 561 Multiple Regression Analysis ECNS 561 Multiple Regression Analysis Model with Two Independent Variables Consider the following model Crime i = β 0 + β 1 Educ i + β 2 [what else would we like to control for?] + ε i Here, we are taking

More information

Bootstrap Testing in Econometrics

Bootstrap Testing in Econometrics Presented May 29, 1999 at the CEA Annual Meeting Bootstrap Testing in Econometrics James G MacKinnon Queen s University at Kingston Introduction: Economists routinely compute test statistics of which the

More information

Strati cation in Multivariate Modeling

Strati cation in Multivariate Modeling Strati cation in Multivariate Modeling Tihomir Asparouhov Muthen & Muthen Mplus Web Notes: No. 9 Version 2, December 16, 2004 1 The author is thankful to Bengt Muthen for his guidance, to Linda Muthen

More information

Econometrics - 30C00200

Econometrics - 30C00200 Econometrics - 30C00200 Lecture 11: Heteroskedasticity Antti Saastamoinen VATT Institute for Economic Research Fall 2015 30C00200 Lecture 11: Heteroskedasticity 12.10.2015 Aalto University School of Business

More information