Problem Set 2

Paul Schrimpf
Due: Friday, March 14, 2008

Problem 1: IV Basics

Consider the model
    $y = X\beta + \varepsilon,$
where $X$ is $T \times K$. You are concerned that $E[\varepsilon|X] \neq 0$. Luckily, you have $M > K$ variables, $Z$, which you think might serve as instruments. (The solution to this question was written by Konrad last year.)

(a) Question: What properties are required for $Z$ to be a valid instrument? What is the exclusion restriction?

Answer: $Z$ must satisfy two conditions: (1) it must have sufficient covariation with $X$ to satisfy the rank condition,
    $\operatorname{rank}\left(\operatorname{plim} \tfrac{1}{T} Z'X\right) = \operatorname{rank}(Q_{ZX}) = K,$
and (2) it must be uncorrelated with the disturbance $\varepsilon$, i.e. (and this is the exclusion restriction)
    $E[Z'\varepsilon] = 0.$

(b) Question: What is the IV estimator using $Z$ as an instrument? What is its asymptotic distribution? Assume $E[\varepsilon\varepsilon'|Z] = \Sigma$.

Answer: To estimate $\beta$ we construct instruments $W = Z A_T$, where $A_T$ is a full-rank $M \times K$ matrix that converges in probability to a limit $A$ (allowing $A_T$ to be stochastic nests the 2SLS case in the next part of the problem). Then we can form the usual IV estimator using $W$ as instruments,
    $\hat\beta_{A_T} = (W'X)^{-1} W'y = (A_T' Z'X)^{-1} A_T' Z'y = \beta + (A_T' Z'X)^{-1} A_T' Z'\varepsilon.$
From the exclusion restriction, $\operatorname{plim} \hat\beta_A = \beta$ for any $A$, and by the usual arguments,
    $\operatorname{plim} \tfrac{1}{T} A_T' Z'X = A' Q_{ZX},$

e.g. from the Chebyshev LLN. From an appropriate CLT (e.g. Lindeberg-Levy if we only have heteroskedasticity but individuals are i.i.d. draws from one stable distribution, or a CLT for mixing processes if we have (not too much) autocorrelation) and the Cramér-Wold device,
    $\tfrac{1}{\sqrt{T}} V_T^{-1/2} Z'\varepsilon \Rightarrow N(0, I_M),$
where, by the law of iterated expectations,
    $V_T = \tfrac{1}{T} E[Z'\varepsilon\varepsilon' Z] = \tfrac{1}{T} E[Z' E[\varepsilon\varepsilon'|Z] Z] = \tfrac{1}{T} E[Z'\Sigma Z].$
Since $A_T \to A$, we can apply the continuous mapping theorem to obtain the asymptotic variance of the IV estimator,
    $\operatorname{plim} T \operatorname{Var}(\hat\beta_{A_T}) = \operatorname{plim} \left(\frac{A_T'Z'X}{T}\right)^{-1} \frac{A_T'Z'\Sigma Z A_T}{T} \left(\frac{X'Z A_T}{T}\right)^{-1} = (A'Q_{ZX})^{-1} A' Q_{Z\Sigma Z} A (Q_{ZX}'A)^{-1},$
where $Q_{Z\Sigma Z} := \operatorname{plim} \tfrac{1}{T} E[Z'\Sigma Z]$.

(c) Question: Suppose we form an instrument $\hat W = Z\hat A$, where $\hat A = (Z'Z)^{-1}Z'X$, so $\hat W$ is $T \times K$. Show that $\hat\beta_{2SLS} = (\hat W'X)^{-1}\hat W'y$ is consistent. Calculate the asymptotic variance of $\hat\beta_{2SLS}$.

Answer: 2SLS is a special case of the IV estimator discussed above, and therefore it is consistent whenever the rank condition is satisfied. Plugging $\hat A = (Z'Z)^{-1}Z'X$ into the formula from (b), the variance becomes
    $T\operatorname{Var}(\hat\beta_{2SLS}) = \left(\frac{X'Z(Z'Z)^{-1}Z'X}{T}\right)^{-1} \frac{X'Z(Z'Z)^{-1}Z'\Sigma Z(Z'Z)^{-1}Z'X}{T} \left(\frac{X'Z(Z'Z)^{-1}Z'X}{T}\right)^{-1}.$

(d) Question: Suppose that $\hat\Omega$ is a consistent estimator of the limit of $Z'\Sigma Z/T$. Consider the following minimum chi-squared estimator:
    $\hat\beta_\chi = \arg\min_\beta\, (y - X\beta)' Z \hat\Omega^{-1} Z'(y - X\beta).$
Show that $\hat\beta_\chi = (\tilde A' Z'X)^{-1}(\tilde A' Z'y)$ (so $\hat\beta_\chi$ is an IV estimator) for some $\tilde A$.

Answer: The first-order condition for this minimization problem is
    $X'Z\hat\Omega^{-1}Z'(y - X\beta) = 0 \quad\Rightarrow\quad \hat\beta_\chi = (X'Z\hat\Omega^{-1}Z'X)^{-1} X'Z\hat\Omega^{-1}Z'y,$
which fits into the framework in (b) by choosing $\tilde A = \hat\Omega^{-1}Z'X$, and which corresponds to 2SLS after a GLS transformation of the data.
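For concreteness, here is a minimal numpy sketch (mine, not part of the original solutions) of the estimators in (b)-(d): the IV estimator for a generic weight matrix $A$, the 2SLS choice $\hat A = (Z'Z)^{-1}Z'X$, and the minimum chi-squared estimator. The toy data-generating process and all variable names are illustrative, and $\hat\Omega$ is a heteroskedasticity-robust plug-in; with autocorrelation one would use a HAC estimator instead.

```python
import numpy as np

def iv_estimator(y, X, Z, A):
    """beta_hat_A = (A'Z'X)^{-1} A'Z'y for instruments W = Z A."""
    W = Z @ A
    return np.linalg.solve(W.T @ X, W.T @ y)

def iv_sandwich_variance(y, X, Z, A, beta_hat):
    """Estimate of Var(beta_hat_A) = (A'Qzx)^{-1} A'Q_{ZSZ} A (Qzx'A)^{-1} / T,
    with Z'Sigma Z / T replaced by a heteroskedasticity-robust estimate."""
    T = X.shape[0]
    e = y - X @ beta_hat
    Qzx = Z.T @ X / T
    Qzsz = (Z * e[:, None]).T @ (Z * e[:, None]) / T   # (1/T) sum_t z_t z_t' e_t^2
    AQ_inv = np.linalg.inv(A.T @ Qzx)
    return AQ_inv @ A.T @ Qzsz @ A @ AQ_inv.T / T

# toy example: one endogenous regressor, two instruments, true beta = 1
rng = np.random.default_rng(0)
T = 500
Z = rng.normal(size=(T, 2))
u = rng.normal(size=T)
x = Z @ np.array([1.0, 0.5]) + u + rng.normal(size=T)
y = 1.0 * x + u + rng.normal(size=T)
X = x[:, None]

A_2sls = np.linalg.solve(Z.T @ Z, Z.T @ X)             # 2SLS: A_hat = (Z'Z)^{-1} Z'X
b_2sls = iv_estimator(y, X, Z, A_2sls)

# minimum chi-squared / optimal IV: A_tilde = Omega_hat^{-1} Z'X
e0 = y - X @ b_2sls                                    # preliminary residuals
Omega_hat = (Z * e0[:, None]).T @ (Z * e0[:, None]) / T
A_opt = np.linalg.solve(Omega_hat, Z.T @ X / T)
b_opt = iv_estimator(y, X, Z, A_opt)

print("2SLS:", b_2sls, " optimal IV:", b_opt)
print("estimated variance of optimal IV:", iv_sandwich_variance(y, X, Z, A_opt, b_opt))
```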

(e) Question: Show that this $\tilde A$ is optimal in minimizing the asymptotic variance of IV. What does this imply about the asymptotic efficiency of $\hat\beta_\chi$ relative to 2SLS? (Hint: you can think of this as a linear model of the form $Z'y = Z'X\beta + Z'u$ and apply the Gauss-Markov theorem asymptotically.)

Answer: After the GLS transformation of the data, premultiplying all variables (including the constant) by $\hat\Omega^{-1/2}$, the linear model
    $Z'y = Z'X\beta_0 + Z'\varepsilon$
satisfies the Gauss-Markov assumptions, and therefore, in the limit, 2SLS on the transformed data is BLUE (only asymptotically; there is still bias in finite samples!).

Problem 2: Measurement Error

This question comes from last year's midterm. Consider the simple linear model
    $y_i = \beta_1 + \beta_2 x_i^* + \epsilon_i, \quad i = 1, \dots, n,$
where $x_i^*$ is unobserved but a measurement $x_i$ of $x_i^*$ is observed, with $x_i = x_i^* + v_i$, where $E(x_i^* v_i) = 0$, $E(v_i \epsilon_i) = 0$, and $E(x_i^* \epsilon_i) = 0$.

(a) Question: What is the probability limit of the least squares estimator of $\beta$? How big would $\operatorname{Var}(v_i)$ have to be relative to $\operatorname{Var}(x_i^*)$ for the plim of the least squares estimator to be half the size of the true parameter?

Answer: Let $\tilde X_i = x_i - \bar x$. The limit of $\hat\beta_2$ is
    $\operatorname{plim} \hat\beta_2 = \beta_2 + \left(\operatorname{plim}\tfrac{1}{n}\tilde X'\tilde X\right)^{-1}\left(\operatorname{plim}\tfrac{1}{n}\tilde X'(\epsilon - \beta_2 v)\right) = \beta_2\left(1 - \frac{\sigma_v^2}{\sigma_{x^*}^2 + \sigma_v^2}\right),$
where $\sigma_{x^*}^2 = \operatorname{Var}(x_i^*)$ and $\sigma_v^2 = \operatorname{Var}(v_i)$. The plim is therefore half the true parameter when $\sigma_v^2 = \sigma_{x^*}^2$.

Generally, we don't care very much about the constant in a regression, so it would probably be fine to answer this question without saying anything about $\hat\beta_1$, but for completeness:
    $\hat\beta_1 = (1_n' M_x 1_n)^{-1}(1_n' M_x y),$
where $1_n$ is an $n \times 1$ vector of ones and $M_x$ is the orthogonal projection off of $x$. Expanding gives
    $\hat\beta_1 = \beta_1 + \frac{1_n' M_x(\beta_2 x^* + \epsilon)}{n - 1_n' x(x'x)^{-1}x'1_n} \xrightarrow{p} \beta_1 + \frac{\beta_2\mu_x - \mu_x\beta_2\frac{\mu_x^2 + \sigma_x^2}{\sigma_x^2 + \mu_x^2}}{1 - \frac{\mu_x^2}{\mu_x^2 + \sigma_x^2}} = \beta_1,$
so $\hat\beta_1$ is consistent. The reason is that even though $x$ is measured with error, the constant is uncorrelated with $x$, so its coefficient remains consistent.

(b) Question: Suppose you have an instrumental variable $z_i$ for $x_i$. What properties does $z_i$ need to have for consistent estimation in part (a)? Could your instrument be measured with error and still give a consistent estimator?

Answer: As in Problem 1, $z$ must be correlated with $x_i$ but uncorrelated with the composite error, $\epsilon_i - \beta_2 v_i$. This is possible even if $z$ is measured with error, as long as the measurement error in $z$ is uncorrelated with $\epsilon$ and with the measurement error in $x$.

(c) Question: You are worried that your instrumental variable may have a little bit of correlation with $v$. However, you are quite confident that $\operatorname{Corr}(x_i, v_i) > \operatorname{Corr}(z_i, v_i)$. Can you conclude that the IV estimator will be less inconsistent than the OLS estimator?

Answer: From part (a), we know that
    $\operatorname{plim} \hat\beta_2^{OLS} = \beta_2\left(1 - \frac{\sigma_v^2}{\sigma_{x^*}^2 + \sigma_v^2}\right) = \beta_2(1 - \rho_{xv}),$
where $\rho_{xv}$ is the correlation between $x$ and $v$. Now look at the plim of IV. Remove the mean from $z$ and $x$. Since $z$ and $x$ are scalars, the choice of $A$ in our IV estimator does not matter, so just take $A = 1$. Then
    $\hat\beta_{IV} = (Z'X)^{-1}(Z'y) = \beta_2 + (Z'X)^{-1}\left(Z'(\epsilon - \beta_2 v)\right) \xrightarrow{p} \beta_2\left(1 - \frac{\sigma_{zv}}{\sigma_{zx^*} + \sigma_{zv}}\right) = \beta_2\left(1 - \frac{\rho_{zv}}{\rho_{zx} + \rho_{zv}}\right).$
So the bias of IV depends on both the correlation of $z$ and $v$ and the correlation of $z$ and $x$. Regardless of $\rho_{zv}$, if $\rho_{zx}$ is small, then $\hat\beta_{IV}$ can be more biased than OLS.

(d) Question: Suppose you have nonclassical measurement error ($E[x_i^* v_i] \neq 0$) but you know that $\operatorname{Cov}(x_i, \epsilon_i) = 0$. What will be the plim of the least squares estimator of $\beta$? Show that you will have attenuation bias if $\operatorname{Cov}(x_i, x_i^*) > 0$ and $\operatorname{Var}(x_i) > \operatorname{Var}(x_i^*)$.

Answer: The plim of $\hat\beta_2$ is
    $\operatorname{plim}\hat\beta_2 = \left(\operatorname{plim}\tfrac{1}{n}\tilde X'\tilde X\right)^{-1}\left(\operatorname{plim}\tfrac{1}{n}\tilde X'\tilde X^*\beta_2\right) = \beta_2\,\frac{\operatorname{cov}(x, x^*)}{\operatorname{Var}(x)}.$
In general, the Cauchy-Schwarz inequality tells us that $\operatorname{cov}(x, x^*)^2 \leq \operatorname{Var}(x)\operatorname{Var}(x^*)$. Then, since $\operatorname{Var}(x^*) \leq \operatorname{Var}(x)$, we have $\operatorname{cov}(x, x^*)/\operatorname{Var}(x) < 1$, so there is attenuation bias.
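A small simulation (my illustration, not part of the original solutions) that checks the attenuation formula from part (a) and the IV fix from part (b). The specific parameter values are arbitrary; the instrument is correlated with $x^*$ and itself carries independent measurement error.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta1, beta2 = 100_000, 0.5, 2.0
sigma_xstar, sigma_v = 1.0, 1.0           # equal variances => plim of OLS slope is beta2 / 2

x_star = rng.normal(0, sigma_xstar, n)
v = rng.normal(0, sigma_v, n)             # classical measurement error
eps = rng.normal(0, 1, n)
x = x_star + v                            # observed, mismeasured regressor
y = beta1 + beta2 * x_star + eps

# an instrument correlated with x*, itself measured with independent error
z = x_star + rng.normal(0, 1, n)

def slope_ols(y, x):
    return np.cov(x, y)[0, 1] / np.var(x)

def slope_iv(y, x, z):
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print("OLS slope :", slope_ols(y, x))      # approximately beta2 * (1 - 1/2) = 1
print("theory    :", beta2 * (1 - sigma_v**2 / (sigma_xstar**2 + sigma_v**2)))
print("IV slope  :", slope_iv(y, x, z))    # approximately beta2 = 2
```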

Problem 3: Misspecified First Stage

Suppose you're interested in the effect of an educational policy on child outcomes. After partialing out covariates, you have a model that looks like
    $\text{outcome}_i = \beta_1\, \text{policylevel}_i + \epsilon_i,$
where outcome is something like a standardized test score, and policylevel is something like the student-teacher ratio. For this exercise, think of the policy as a continuous choice variable. You're worried that the policy level is endogenous: a school that has a lower student-teacher ratio is also likely to be higher quality in other, unobserved dimensions. You think that prior test scores are a valid instrument for the policy level, and that prior test scores are related to the policy level through
    $\text{policylevel}_i = \alpha_1\, \text{priortest}_i + v_i.$
Sadly, you're wrong about the functional form of this equation. The truth is that
    $\text{policylevel}_i = \alpha_0\, |\text{priortest}_i - p| + v_i;$
that is, the policy level is actually a function of how far prior test scores are from some cutoff level. This question explores how misspecifying the first stage affects IV. Throughout, you may assume that $E[\epsilon\,|\,\text{priortest}] = 0$ and that $\epsilon_i$ is i.i.d.

The very last part of this question is especially difficult. You probably have not yet seen all the tools required to answer it, so don't worry if you can't. However, if you can answer the last question and prove your answer, you can skip all the other parts of this question.

(a) Question: Suppose you had correctly specified the first stage and know $p$. You estimate $\beta_1$ using $Z = |\text{priortest}_i - p|$ as an instrument,
    $\hat\beta_{IV} = (A'Z'X)^{-1}(A'Z'y),$
where $X = \text{policylevel}$ and $y = \text{outcome}$. Is this estimator consistent? What is its asymptotic variance? (You can just state your answer; no proof necessary.)

Answer: Yes, this estimator is consistent because we know that $E[\epsilon Z] = 0$. As in Problem 1, its asymptotic variance is
    $V(\hat\beta_{IV}) = (A'Q_{ZX})^{-1} A' Q_{Z\Sigma Z} A (Q_{ZX}'A)^{-1}.$
In this case, assuming that we have partialed out a constant so all variables are mean zero,
    $Q_{ZX} = \operatorname{plim}\tfrac{1}{N}Z'X = \alpha_0 \operatorname{Var}(|\text{priortest}_i - p|), \qquad Q_{Z\Sigma Z} = \operatorname{plim}\tfrac{1}{N}Z'\epsilon\epsilon'Z = \sigma^2 \operatorname{Var}(|\text{priortest}_i - p|),$
so
    $V(\hat\beta_{IV}) = \frac{\sigma^2}{\alpha_0^2 \operatorname{Var}(|\text{priortest}_i - p|)}.$

(b) Question: Now, suppose you choose the wrong first stage and use $Z = \text{priortest}_i$ as an instrument to estimate
    $\tilde\beta_{IV} = (\tilde A'Z'X)^{-1}(\tilde A'Z'y).$
Is this estimator consistent? What is its asymptotic variance?

Answer: Yes, this estimator is consistent, provided that $\operatorname{cov}(X, Z) \neq 0$. As above, its asymptotic variance is given by the formula in Problem 1. As in part (a), substituting for $Q_{ZX}$ and $Q_{Z\Sigma Z}$ gives
    $V(\tilde\beta_{IV}) = \frac{\sigma^2}{\alpha_1^2 \operatorname{Var}(\text{priortest}_i)},$
where $\alpha_1$ is now the coefficient of the linear projection of the policy level on prior test scores.

(c) Finite Sample Properties

(i) Question: How would you expect $\tilde\beta_{IV}$ to perform relative to $\hat\beta_{IV}$ in small samples? Is either estimator unbiased? Would you prefer one over the other? Give some intuition for your answer.

Answer: Neither IV estimator is unbiased. The reason is that
    $E[\hat\beta_{IV} - \beta] = E\left[(A'Z'X)^{-1}(A'Z'\epsilon)\right].$
Since $X$ is correlated with $\epsilon$, this expectation is generally non-zero. After spring break, we'll look at the finite sample bias of IV in greater detail. If we get the functional form wrong, the first stage may be substantially weaker (the denominator of the bias smaller), and as you will see, this typically increases the finite-sample bias.

(ii) Question: Let's simulate the model to compare the two estimators. Let
    $x_i = \alpha|z_i - \bar z| + u_i + v_i, \qquad y_i = \beta x_i + u_i + e_i,$
where $\alpha = \beta = 1$. Let $z$, $u$, $v$, and $e$ be i.i.d. $N(0,1)$ variables. Set $\bar z = 0.5$. Simulate the model many times and compute $\hat\beta_{OLS}$, $\hat\beta_{2SLS} = (X'P_{\hat Z}X)^{-1}(X'P_{\hat Z}y)$ using the correct instrument $\hat Z_i = |z_i - \bar z|$, and $\tilde\beta_{2SLS} = (X'P_Z X)^{-1}(X'P_Z y)$ using $Z_i = z_i$. Report the MSE, median, and probability of rejecting $H_0: \beta = 1$ for each of these estimators. Examine how the results depend on the sample size.

Answer: From Konrad last year, here are my simulation results:

[Table: median and interquartile range of the OLS and IV coefficients, and OLS and IV rejection rates, by sample size; numeric entries not reproduced.]

Here you can again see that the bias of OLS is unaffected by sample size, whereas the substantial median bias in the IV coefficient for $N = 20$ goes away as the sample size increases. Looking at the interquartile range, the IV estimator seems to have a standard error about 5 times the size of that of OLS, which is much more than in problem 5 even though most other magnitudes were comparable; this is in fact a consequence of the bad approximation of the functional form of the first stage.
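For reference, a minimal sketch (mine, not the posted code) of the Monte Carlo just described, comparing OLS, IV with the correct instrument $|z_i - \bar z|$, and IV with the misspecified instrument $z_i$; the number of replications and sample sizes are illustrative, and only medians and interquartile ranges are reported here.

```python
import numpy as np

rng = np.random.default_rng(2)

def one_draw(n, alpha=1.0, beta=1.0, zbar=0.5):
    z = rng.normal(size=n)
    u, v, e = rng.normal(size=(3, n))
    x = alpha * np.abs(z - zbar) + u + v
    y = beta * x + u + e
    return y, x, z

def iv_slope(y, x, w):
    """Just-identified IV slope with (de-meaned) instrument w."""
    w = w - w.mean()
    return (w @ y) / (w @ x)

def simulate(n, reps=2000, zbar=0.5):
    ols, iv_good, iv_bad = np.empty(reps), np.empty(reps), np.empty(reps)
    for r in range(reps):
        y, x, z = one_draw(n, zbar=zbar)
        ols[r] = iv_slope(y, x, x)                     # OLS = IV with x as its own instrument
        iv_good[r] = iv_slope(y, x, np.abs(z - zbar))  # correct first-stage functional form
        iv_bad[r] = iv_slope(y, x, z)                  # misspecified instrument
    for name, b in [("OLS", ols), ("IV |z-zbar|", iv_good), ("IV z", iv_bad)]:
        q = np.percentile(b, [25, 50, 75])
        print(f"{name:12s} n={n:5d}  median={q[1]:6.3f}  IQR={q[2] - q[0]:6.3f}")

for n in (20, 200, 2000):
    simulate(n)
```

Calling `simulate(n, zbar=0.0)` instead reproduces the no-first-stage case discussed next, since $\operatorname{cov}(z, |z|) = 0$ when $z$ is symmetric around zero.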

In fact, if $\bar z$ were zero, this strategy would break down completely:

[Table for $\bar z = 0$: median and interquartile range of the OLS and IV coefficients, and OLS and IV rejection rates, by sample size; numeric entries not reproduced.]

You can see that IV has about the same bias as OLS, and furthermore, the interquartile range doesn't decrease at all even though from the first to the third column we increased the sample size by a factor of 100! This is a consequence of no longer having a first stage, and again this should give you a taste of what the second half of the course is going to be about.

(d) Asymptotically Optimal Instruments

(i) Question: What choices of $A$ and $\tilde A$ minimize the asymptotic variances of $\hat\beta_{IV}$ and $\tilde\beta_{IV}$? Which estimator has a smaller asymptotic variance with this $A$ and $\tilde A$?

Answer: The easiest answer is that $A$ does not matter, since in both cases $Z$ and $X$ are scalars. If this were not the case, $A$ would matter, but we already know from Problem 1 that the optimal $A$ is $A = \Omega^{-1}Z'X$. Here, since $\epsilon$ is i.i.d., $\Omega = Z'Z\sigma^2$, so the optimal $A$ is just $(Z'Z)^{-1}Z'X$, which gives us 2SLS. Below, we'll show that $Z = E[x|z]$ yields the minimum asymptotic variance among instruments of the form $Z = g(z)$. Applying this result, it must be that $\hat\beta_{IV}$ has the smaller variance.

(ii) Question: Consider a more general setup where, instead of the first stage given above, you have $E[X|Z] = f(Z)$ for some function $f$. Among IV estimators of the form
    $\hat\beta_{IV}(g(z), A) = (A'g(z)'X)^{-1}(A'g(z)'y),$
what choice of $g(\cdot)$ and $A$ is asymptotically efficient?

Answer: For a given $g(\cdot)$, we know that $A = (g(z)'g(z))^{-1}g(z)'X$ is optimal for the same reason as in part (d)(i). Alternatively, we know that $A$ doesn't matter when $x$ is a scalar and $g(z)$ is scalar valued. Sticking to the scalar case and assuming all variables have been de-meaned, the asymptotic variance for $\hat\beta$ using $g(z)$ as an instrument is
    $V(\beta_g) = \sigma^2 \frac{\operatorname{Var}(g(z))}{\operatorname{cov}(x, g(z))^2}.$
We can rewrite $\operatorname{cov}(x, g(z))$ as
    $\operatorname{cov}(x, g(z)) = E[xg(z)] = E[E[x|z]g(z)] = E[f(z)g(z)] = \operatorname{cov}(f(z), g(z)).$
Now comes the tricky part. Observe that the set of functions that map $z$ to $\mathbb{R}$, equipped with the inner product $\langle f, g\rangle = \operatorname{cov}(f(z), g(z))$, forms an inner product space. The Cauchy-Schwarz inequality tells us that
    $\langle f, g\rangle^2 \leq \langle f, f\rangle\langle g, g\rangle.$
Therefore, $\operatorname{cov}(f(z), g(z))^2 \leq \operatorname{Var}(f(z))\operatorname{Var}(g(z))$, and in particular,
    $V(\beta_g) = \sigma^2\frac{\operatorname{Var}(g(z))}{\operatorname{cov}(f(z), g(z))^2} \;\geq\; V(\beta_f) = \sigma^2\frac{1}{\operatorname{Var}(f(z))}.$
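As a numerical check on this Cauchy-Schwarz bound (my illustration, not part of the original solutions), the sketch below estimates $\sigma^2\operatorname{Var}(g(z))/\operatorname{cov}(x, g(z))^2$ for $g(z) = z$ and for $g(z) = f(z) = \alpha|z - \bar z|$ under the same design as the simulation above.

```python
import numpy as np

rng = np.random.default_rng(3)
n, alpha, zbar = 1_000_000, 1.0, 0.5

z = rng.normal(size=n)
u, v, e = rng.normal(size=(3, n))
x = alpha * np.abs(z - zbar) + u + v        # so f(z) = E[x|z] = alpha * |z - zbar|
eps = u + e                                 # structural error in y = beta*x + u + e
f = alpha * np.abs(z - zbar)

def avar(g):
    """sigma^2 Var(g(z)) / cov(x, g(z))^2, the asymptotic variance up to a 1/n factor."""
    return np.var(eps) * np.var(g) / np.cov(x, g)[0, 1] ** 2

print("avar with g(z) = z    :", avar(z))
print("avar with g(z) = f(z) :", avar(f))   # smaller: f(z) is the optimal instrument
```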

(iii) Question: Suppose you don't know $f(z) = E[X|Z]$ exactly, but know it up to a finite dimensional parameter, $\theta$. Can you construct a feasible estimator that has the same asymptotic variance as the efficient estimator when $f(z)$ is known? Suppose you don't know anything about $f(z)$, except that it's a well-behaved function. Is there a feasible estimator with the same asymptotic variance as the efficient estimator when $f(z)$ is known?

Answer: The answer is yes; you can construct a feasible efficient estimator in both cases. Newey (1990) showed this result in a slightly more general setting when $f(z)$ is completely unknown. To begin with, suppose we have some estimate $\hat f(z)$ of $f(z)$, and look at the asymptotic distribution of $\hat\beta_{\hat f}$. For simplicity, we stick with the case where $x$, $z$, and $f(z)$ are all scalars. We have
    $\hat\beta_{\hat f} = (\hat f'X)^{-1}\hat f'y = \beta + (\hat f'X)^{-1}\hat f'\epsilon,$
where $\hat f$ is short for $\hat f(z)$. Notice that we can always write $x = E[x|z] + v = f(z) + v$ where $E[v|z] = 0$. Thus, we have
    $\hat\beta_{\hat f} - \beta = \frac{\hat f'\epsilon}{\hat f'(f + v)} = \frac{f'\epsilon + (\hat f - f)'\epsilon}{f'(f + v) + (\hat f - f)'(f + v)},$
    $\sqrt{n}(\hat\beta_{\hat f} - \beta) = \frac{\frac{1}{\sqrt{n}}f'\epsilon + \frac{1}{\sqrt{n}}(\hat f - f)'\epsilon}{\frac{1}{n}f'(f + v) + \frac{1}{n}(\hat f - f)'(f + v)}.$
The first terms in the numerator and denominator are exactly what we would get if $f$ were known. Therefore, $\hat\beta_{\hat f}$ has the same distribution as $\hat\beta_f$ if and only if
    $\operatorname{plim}\tfrac{1}{\sqrt{n}}(\hat f - f)'\epsilon = 0 \qquad (1)$
and
    $\operatorname{plim}\tfrac{1}{n}(\hat f - f)'(f + v) = 0. \qquad (2)$
Equation (2) will be satisfied if $\hat f$ converges to $f$ in probability uniformly in $z$; that is, if for any $\epsilon > 0$, there is an $N$ such that
    $P(|\hat f(z) - f(z)| > \epsilon) < \epsilon \quad \forall z \in \mathcal{Z},$
where $\mathcal{Z}$ is the domain of $z$. When $f$ is known up to a finite dimensional parameter, then we know that we can estimate $\theta$ consistently by nonlinear least squares. In this case,
    $\hat f(z) - f(z) = f(z, \hat\theta) - f(z, \theta) = \frac{\partial f(z, \theta)}{\partial\theta}(\hat\theta - \theta),$
so $\hat f \xrightarrow{p} f$ uniformly if $\frac{\partial f}{\partial\theta}$ is bounded. More generally, in this case,
    $\operatorname{plim}\tfrac{1}{n}(\hat f - f)'(f + v) = \operatorname{plim}(\hat\theta - \theta)\,\frac{1}{n}\sum_{i=1}^n \frac{\partial f(z_i, \theta)}{\partial\theta}(f(z_i) + v_i) = 0 \cdot E\left[\frac{\partial f(z_i, \theta)}{\partial\theta}(f(z_i) + v_i)\right],$
so we just need $E\left[\frac{\partial f(z_i, \theta)}{\partial\theta}(f(z_i) + v_i)\right]$ to be finite. A similar argument shows that if $\frac{1}{\sqrt{n}}\frac{\partial f}{\partial\theta}'\epsilon$ converges in distribution, then equation (1) is satisfied. Therefore, in the parametric case (when $f$ is known up to a finite number of parameters), a sufficient condition (in addition to the usual conditions needed for IV) for $\hat\beta_{\hat f}$ to be asymptotically efficient is that $E\left[\left(\frac{\partial f}{\partial\theta}\right)^2\right]$ is finite. When $f$ is completely unknown, we must estimate it nonparametrically. The details of showing that (1) and (2) are satisfied depend on the particular nonparametric estimator we use.
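A sketch (mine) of the feasible version in the parametric case for this problem's first stage: estimate $\theta = (\alpha, p)$ in $f(z, \theta) = \alpha|z - p|$ by nonlinear least squares (here a simple grid over $p$ with $\alpha$ concentrated out), then use $\hat f(z)$ as the instrument and compare with the infeasible estimator that uses the true $f(z)$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, alpha0, beta0, p0 = 5000, 1.0, 1.0, 0.5

z = rng.normal(size=n)
u, v, e = rng.normal(size=(3, n))
x = alpha0 * np.abs(z - p0) + u + v
y = beta0 * x + u + e

def iv_slope(y, x, w):
    w = w - w.mean()
    return (w @ y) / (w @ x)

def fit_first_stage(x, z, grid=np.linspace(-2, 2, 401)):
    """NLS for x_i = alpha*|z_i - p| + error: concentrate alpha out, grid-search over p."""
    best = (np.inf, None, None)
    for p in grid:
        w = np.abs(z - p)
        a = (w @ x) / (w @ w)                 # OLS of x on |z - p| (no intercept)
        ssr = np.sum((x - a * w) ** 2)
        if ssr < best[0]:
            best = (ssr, a, p)
    return best[1], best[2]

a_hat, p_hat = fit_first_stage(x, z)
f_hat = a_hat * np.abs(z - p_hat)             # estimated optimal instrument
f_true = alpha0 * np.abs(z - p0)              # infeasible optimal instrument

print("theta_hat             :", a_hat, p_hat)
print("beta (feasible f_hat) :", iv_slope(y, x, f_hat))
print("beta (infeasible f)   :", iv_slope(y, x, f_true))
```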

Problem 4: Inference with Difference in Differences

In this question, you'll reproduce some of the results from Bertrand, Duflo, and Mullainathan (2004). You'll then explore how the bias and size corrected test of Hausman and Kuersteiner (2007) performs in this setup. On the course website, I posted data from the CPS merged outgoing rotation groups. I merged the CPS-MORG data for all the years and kept a few of the variables. If you're curious, I posted the do-file I used. This is the same data as used by Bertrand, Duflo, and Mullainathan (2004). The variables in the data are:

- state: state census code
- year
- age
- female: 1 = female, 0 = male
- yrsedu: years of education
- earnwke: weekly earnings
- employed: whether employed
- white: 1 = white, 0 = non-white
- black: 1 = black, 0 = non-black

The data set is rather large. If it is annoyingly slow to work with, you might want to limit the sample in some way. Bertrand, Duflo, and Mullainathan just use prime age white women. They also aggregate the data to state-year means for most of their simulations (the ones where the first column of the table is labeled "CPS agg"). You will probably also want to aggregate the data to save time. You will need to aggregate the data to implement Hausman and Kuersteiner's estimator.

(a) Reproducing Bertrand, Duflo, and Mullainathan

Question: Perform simulations similar to those of Bertrand, Duflo, and Mullainathan (2004), and reproduce their key results. More specifically, randomly simulate policy changes and use difference in differences to estimate the effect of these changes. For each simulation, first sample 50 states with replacement from the data. Then, simulate the policy changes by randomly picking a year between 1985 and 1995 and randomly choosing half of the states to have been affected. Then, estimate
    $y_{ist} = \alpha_s + \lambda_t + \gamma T_{st} + x_{ist}'\beta + \epsilon_{ist}.$
Estimate $\gamma$ using OLS with various types of standard errors. Specifically, consider the usual homoskedastic, i.i.d. OLS standard errors, standard errors clustered at state and year, and standard errors clustered at state. (If you want, you can also try some of the other things Bertrand, Duflo, and Mullainathan did.) Report the fraction of simulations where you reject $H_0: \gamma = 0$. Do this under both $\gamma = 0$ and a 2% effect. Explore how the results vary with $N$ (the number of states) and $T$ (the number of years).
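As a concrete reference for this procedure, here is a minimal sketch (mine, not the posted code) of a single placebo draw. It uses simulated state-year data with AR(1) errors standing in for the CPS aggregates, and compares conventional i.i.d. and state-clustered standard errors for $\hat\gamma$; the years, AR(1) parameter, and variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
N, T, rho = 50, 21, 0.8                      # states, years, AR(1) coefficient in the errors

# simulated state-year outcomes with serially correlated errors (stand-in for CPS aggregates)
eps = np.zeros((N, T))
for t in range(1, T):
    eps[:, t] = rho * eps[:, t - 1] + rng.normal(scale=0.2, size=N)
df = pd.DataFrame({"state": np.repeat(np.arange(N), T),
                   "year": np.tile(np.arange(T), N),
                   "y": eps.ravel()})

# placebo law: half the states treated from a randomly chosen middle year onward
treated = rng.choice(N, N // 2, replace=False)
start = rng.integers(6, 17)
df["Tst"] = (df["state"].isin(treated) & (df["year"] >= start)).astype(float)

# state and year fixed effects plus the policy dummy
X = pd.get_dummies(df[["state", "year"]].astype(str), drop_first=True).astype(float)
X["Tst"] = df["Tst"]
X = sm.add_constant(X)

fit_iid = sm.OLS(df["y"], X).fit()
fit_cl = sm.OLS(df["y"], X).fit(cov_type="cluster", cov_kwds={"groups": df["state"]})
print("gamma_hat:", fit_iid.params["Tst"])
print("iid se:", fit_iid.bse["Tst"], " state-clustered se:", fit_cl.bse["Tst"])
```

Wrapping this in a loop over many placebo draws and recording how often $|\hat\gamma/\text{se}| > 1.96$ gives rejection rates of the kind reported in Table 1.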

Answer: See the code I posted. Table 1 shows the results of the simulations. The results are similar to Bertrand, Duflo, and Mullainathan: OLS and OLS state-year clustered over-reject for large $N$ and $T$. OLS state clustered works well for $N = 50$ and $20$, but less well for smaller $N$.

(b) FGLS

Question: Perform the same simulations, but estimate $\gamma$ using FGLS. Compare the results to OLS. As in Hausman and Kuersteiner (or section IV.D of Bertrand, Duflo, and Mullainathan), assume that the autocorrelation process is the same in all states and that there is no heteroskedasticity. Again, explore how the results depend on $N$ and $T$.

Answer: The presence of fixed effects makes doing FGLS somewhat tricky. In particular, the natural estimate of the variance matrix,
    $\hat\Sigma = \sum_{s=1}^S \hat e_s \hat e_s',$
will be rank $T - 1$ instead of rank $T$. You can proceed as in Kiefer (1980). Specifically:

1. Aggregate the data to state-year averages. As in Bertrand, Duflo, and Mullainathan, first regress the outcome on any demographics you want to control for; then take the mean of the residuals within each state and year. Call this $y_{s,t}$. Using OLS, estimate
    $y_{s,t} = \alpha_s + \lambda_t + \gamma T_{s,t} + e_{s,t}.$

2. Form $\hat\Sigma = \sum_{s=1}^S \hat e_s\hat e_s'$, where $\hat e_s$ is the $T \times 1$ vector of residuals from state $s$. $\hat\Sigma$ is singular, so we need to take a generalized inverse instead of its inverse. Following Kiefer (1980), let $S$ be the $(T-1)\times(T-1)$ matrix consisting of the first $T-1$ rows and columns of $\hat\Sigma$, and let $\bar S$ be $S^{-1}$ padded out with a final row and column of zeros so that it is $T \times T$.

3. Estimate $\hat\beta_{FGLS} = \left(X'(I_S \otimes M_{1t}\bar S M_{1t})X\right)^{-1}\left(X'(I_S \otimes M_{1t}\bar S M_{1t})y\right)$, where $M_{1t} = I_T - \tfrac{1}{T}1_T 1_T'$ is the matrix that projects out the fixed effects, $X$ consists of time fixed effects and the policy indicator, $y = [y_{1,1}, y_{1,2}, \dots, y_{1,T}, \dots, y_{S,T}]'$, and $X$ is arranged similarly.

See the code for more details. The results are in Table 1. FGLS does not perform very well: it only has nearly correct size for $N = 50$ and $T = 10$. However, in these situations FGLS is dominated in terms of power by OLS state clustered.
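A rough numpy sketch (mine, not the posted code) of this Kiefer-style FGLS step, for a balanced panel of state-year means that has already been aggregated and purged of demographics; the scaling of $\hat\Sigma$ and the handling of the fixed effects by double demeaning are my own simplifications.

```python
import numpy as np

def kiefer_fgls(y, D):
    """Kiefer (1980)-style FGLS for a balanced panel with state and year fixed effects.
    y, D : (S, T) arrays of state-year outcome means and the policy dummy.
    Returns (gamma_hat, se_hat) for the policy coefficient. A rough sketch only."""
    S, T = y.shape
    M = np.eye(T) - np.ones((T, T)) / T           # within-state (over time) demeaning
    yw, Dw = y @ M, D @ M
    yw, Dw = yw - yw.mean(axis=0), Dw - Dw.mean(axis=0)   # remove year effects across states

    # step 1: pooled OLS of the doubly-demeaned outcome on the policy dummy, plus residuals
    g_ols = (Dw.ravel() @ yw.ravel()) / (Dw.ravel() @ Dw.ravel())
    E = yw - g_ols * Dw                           # (S, T) residual vectors by state

    # step 2: Sigma_hat is singular after demeaning; invert only its leading (T-1) block
    Sigma = E.T @ E / S                           # no degrees-of-freedom correction here
    W = np.zeros((T, T))
    W[:T - 1, :T - 1] = np.linalg.inv(Sigma[:T - 1, :T - 1])   # generalized inverse

    # step 3: FGLS using the weight matrix W state by state
    num = sum(Dw[s] @ W @ yw[s] for s in range(S))
    den = sum(Dw[s] @ W @ Dw[s] for s in range(S))
    return num / den, np.sqrt(1.0 / den)

# usage: gamma_hat, se_hat = kiefer_fgls(y_means, policy_dummy), both S-by-T arrays
```

Wrapping this in the placebo loop from part (a) gives the FGLS rows of Table 1; the corrections in parts (c) and (d) modify $\hat\Sigma$ and the critical value, respectively.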

(c) Bias-corrected FGLS

Since $T$ is small, and the regression includes state fixed effects, estimates of the variance matrix will be biased. Hausman and Kuersteiner show how to correct this bias (page 7). Implement this correction and repeat the simulations in part (b). How much does this bias correction help? How large does $T$ need to be for the bias not to matter?

Hint: there were two things that tripped me up when I was trying to do this. First, Hausman and Kuersteiner sort the variables as time 1 state 1, time 1 state 2, ..., time 1 state S, time 2 state 1, etc. This is the opposite of what I had been doing. You'll need to either reorder your variables or reverse the order of the Kronecker products in Hausman and Kuersteiner's formulas. The other thing that was difficult was figuring out what was meant by $V$. $V_s$ is supposed to be a vector of all right hand side variables at all times, except for state fixed effects. Since, after aggregating, the only variables we have are a policy indicator and time fixed effects, if you naively include all of these, $V$ will not be full rank. In our setup, $V_s$ should just be two variables: a constant and an indicator for whether the policy was ever implemented.

Once you figure out the two things mentioned in the hint, implementing the bias correction is just a matter of following Hausman and Kuersteiner's formulas. See the code for details. In the table, we see that bias-corrected FGLS almost always does better than plain FGLS.

(d) Size-corrected Tests

Question: Hausman and Kuersteiner (2007) also propose a size correction for testing $H_0: \gamma = \gamma_0$ when $N$ is small. (Notice that this setup falls under the special case in section 3 of the paper, so you can use the simpler formulas for the test given in the theorem.) Repeat the simulations in the previous sub-question using size-corrected tests. Discuss the results.

Answer: Following theorem 3.1 of Hausman and Kuersteiner, the size-corrected test replaces the usual $t = 1.96$ critical value with
    $t_{sc} = t\left(1 + \frac{(1 + t^2) + 2(T-2)}{2S}\right).$
Since $t_{sc} > t$, we know that size- and bias-corrected FGLS will always have smaller size and power than bias-corrected FGLS. However, this is a good thing, since bias-corrected FGLS tends to over-reject. From the table, we see that the size correction works fairly well, except for $N = 6$, $T = 5$. I am fairly impressed by how well the size correction works; I expected the results to be much worse.

One method of inference in difference-in-differences setups, which we have not talked about, is the synthetic control method of Abadie, Diamond, and Hainmueller (2007). Abadie presented this paper at the econometrics seminar recently, and it appears quite promising. If you ever estimate a difference in differences in your research, the Abadie paper is definitely worth reading.

[Table 1: Simulation Results. For each combination of N (number of states) and T (number of years), rejection rates under no effect and under a 2% effect for OLS, OLS state-year clustered, OLS state clustered, OLS aggregate, FGLS, bias-corrected FGLS, and bias- and size-corrected FGLS. Based on 100 simulations; numeric entries not reproduced.]

