Problem Set #6: OLS
Economics 835: Econometrics
Fall 202

1 A preliminary result

Suppose we have a random sample of size $n$ on the scalar random variables $(x, y)$ with finite means, variances, and covariance. Let:
\[
\widehat{\text{cov}}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
\]
Prove that $\text{plim} \, \widehat{\text{cov}}(x, y) = \text{cov}(x, y)$.

We will use this result repeatedly in this problem set and in the future, so once you have proved it, please feel free to take it as given for the remainder of this course. You can also take as given that if you define $\widehat{\text{var}}(x) = \widehat{\text{cov}}(x, x)$, then $\text{plim} \, \widehat{\text{var}}(x) = \text{var}(x)$.

2 OLS with a single explanatory variable

In many cases, the best way to understand various issues in regression analysis - measurement error, proxy variables, omitted variables bias, etc. - is to work through the issue in the special case of a single explanatory variable. That way, we can develop intuition without getting lost in the linear algebra. Once we have the basics down, we can then look at the multivariate case to see if anything changes. This problem goes through the main starting results.

Suppose our regression model has an intercept and a single explanatory variable, i.e.:
\[
y = \beta_0 + \beta_1 x + u
\]
where $(y, x, u)$ are scalar random variables. To keep things fairly general, we will assume this is a model of the best linear predictor, i.e., $E(u) = E(xu) = \text{cov}(x, u) = 0$. Our data consist of a random sample of size $n$ on $(x, y)$, arranged into the matrices:
\[
X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\qquad
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\]
Let:
\[
\hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix} = (X'X)^{-1} X'y \qquad (1)
\]
be the usual OLS regression coefficients.
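The preliminary result can be sanity-checked by simulation before you prove it. The sketch below (illustrative only, not part of the assignment; the population covariance of 0.8 and the sample size are arbitrary choices) computes the divide-by-$n$ sample covariance defined above and compares it to the truth for a large sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def cov_hat(x, y):
    # Sample covariance with a 1/n divisor, matching the definition in the text
    # (not the 1/(n-1) "unbiased" version).
    return np.mean((x - x.mean()) * (y - y.mean()))

true_cov = 0.8  # arbitrary illustrative value
draws = rng.multivariate_normal([0.0, 0.0],
                                [[1.0, true_cov], [true_cov, 1.0]],
                                size=100_000)
est = cov_hat(draws[:, 0], draws[:, 1])  # should be close to 0.8
```

Re-running with larger sample sizes shows the estimate tightening around the population value, which is exactly what the Law of Large Numbers argument formalizes.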
a) Show that:
\[
\beta_1 = \frac{\text{cov}(x, y)}{\text{var}(x)} \qquad \beta_0 = E(y) - \beta_1 E(x)
\]

b) Show that equation (1) implies that:
\[
\hat{\beta}_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\widehat{\text{cov}}(x, y)}{\widehat{\text{var}}(x)}
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]
The idea for this problem is that you get a little practice translating between different ways of writing the same model, so even if you know another way to get these results, please start with equation (1).

c) Without using linear algebra (i.e., just apply Slutsky's theorem and the Law of Large Numbers to the result from part (b) of this question), prove that:
\[
\text{plim} \, \hat{\beta}_1 = \beta_1 \qquad \text{plim} \, \hat{\beta}_0 = \beta_0
\]

3 Measurement error

Often variables are measured with error. Let $(y, x, u)$ be scalar random variables such that:
\[
y = \beta_0 + \beta_1 x + u
\]
where $\text{cov}(x, u) = 0$. Unfortunately, we do not have data on $y$ and $x$; instead we have data on $\tilde{y}$ and $\tilde{x}$, where:
\[
\tilde{y} = y + v \qquad \tilde{x} = x + w
\]
and $w$ and $v$ are scalar random variables representing measurement error. We assume "classical" measurement error:[1]
\[
\text{cov}(v, x) = \text{cov}(v, u) = \text{cov}(v, w) = 0
\]
\[
\text{cov}(w, x) = \text{cov}(w, u) = \text{cov}(w, v) = 0
\]
Let $\epsilon_x = \text{var}(w)/\text{var}(x)$ and let $\epsilon_y = \text{var}(v)/\text{var}(y)$.

a) Let $\hat{\beta}_1$ be the OLS regression coefficient from the regression of $\tilde{y}$ on $\tilde{x}$. Find $\text{plim} \, \hat{\beta}_1$ in terms of $(\beta_1, \epsilon_x, \epsilon_y)$.

b) What is the effect of (classical) measurement error in $x$ on the sign and magnitude of $\text{plim} \, \hat{\beta}_1$?

c) What is the effect of (classical) measurement error in $y$ on the sign and magnitude of $\text{plim} \, \hat{\beta}_1$?

[1] Strictly speaking, the classical model of measurement error also assumes independence and normality, but we won't need those for our results.
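Before working through the algebra in Section 3, it can help to see the phenomenon numerically. The sketch below (not the assigned derivation; all parameter values are made up) regresses a noisy $\tilde{y}$ on a noisy $\tilde{x}$ and shows the slope shrinking toward zero relative to the error-free regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta1 = 200_000, 2.0

x = rng.normal(size=n)
y = beta1 * x + rng.normal(size=n)
x_tilde = x + rng.normal(size=n)  # classical error in x, with var(w) = var(x) = 1
y_tilde = y + rng.normal(size=n)  # classical error in y

def ols_slope(x, y):
    # Slope of the OLS regression of y on x (with intercept).
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

slope_clean = ols_slope(x, y)              # close to the true slope of 2
slope_noisy = ols_slope(x_tilde, y_tilde)  # noticeably smaller in magnitude
```

Experimenting with the variance of the error in $x$ versus the error in $y$ separately is a good way to build intuition for parts (b) and (c).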
4 Omitted variables

Suppose you want to estimate the coefficient $\beta_1$ in the regression:
\[
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u
\]
where $\text{cov}(u, x_1) = \text{cov}(u, x_2) = 0$. Unfortunately, your data consist only of a random sample on $(y, x_1)$. So you estimate $\beta_1$ by the OLS regression of $y$ on $x_1$:
\[
\hat{\beta}_1 = \frac{\widehat{\text{cov}}(x_1, y)}{\widehat{\text{var}}(x_1)}
\]

a) Find $\text{plim} \, \hat{\beta}_1$ in terms of the model parameters $(\beta_0, \beta_1, \beta_2)$, $\text{var}(x_1)$, and $\text{cov}(x_1, x_2)$.

b) Your results above imply that in order for $\hat{\beta}_1$ to be a consistent estimator of $\beta_1$, we need the omitted variable to be either unrelated to the outcome ($\beta_2 = 0$) or unrelated to the explanatory variable of interest ($\text{cov}(x_1, x_2) = 0$). It is common in applied work to make educated guesses about the signs of $\beta_2$ and $\text{cov}(x_1, x_2)$, in order to at least know the sign of the bias[2] in $\hat{\beta}_1$. Suppose that $y$ is earnings at age 40, $x_1$ is years of schooling, and $x_2$ is ability as measured on an IQ test. Make a guess about the signs of $\beta_2$ and $\text{cov}(x_1, x_2)$ (any guesses are acceptable). Then use these guesses to make a prediction about whether our regression coefficient $\hat{\beta}_1$ will be biased upwards or downwards (here, your answer should be consistent with your guesses).

5 Choice of units: The simple version

In applied work one is often faced with choosing units for our variables. Should we express proportions as decimals or percentages? Miles or kilometers? Etc. The short answer is that it doesn't matter if we are comparing across linearly related scales; the OLS coefficients will scale accordingly, so one can choose units according to convenience.

Suppose we have a sample (random or otherwise; this question is about an algebraic property of OLS and not a statistical property) on the scalar random variables $(y, x)$. Let the regression coefficient for the OLS regression of $y$ on $x$ be:
\[
\hat{\beta}_1 = \frac{\widehat{\text{cov}}(x, y)}{\widehat{\text{var}}(x)}
\]
Now let's suppose we take a linear transformation of our data.
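A numerical version of the schooling/ability story in part (b) can make the sign prediction concrete. In this sketch every number is made up for illustration; the signs are chosen so that both $\beta_2$ and $\text{cov}(x_1, x_2)$ are positive.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

x2 = rng.normal(size=n)                         # "ability" (omitted)
x1 = 0.5 * x2 + rng.normal(size=n)              # "schooling"; cov(x1, x2) = 0.5 > 0
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)    # beta1 = beta2 = 1 (made-up values)

# Short regression of y on x1 alone, as in the problem.
slope_short = np.cov(x1, y, bias=True)[0, 1] / np.var(x1)  # exceeds beta1 = 1
```

With these (hypothetical) positive signs the short-regression slope lands above the true $\beta_1$, i.e. the bias is upward; flipping the sign of either $\beta_2$ or $\text{cov}(x_1, x_2)$ flips the direction.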
That is, let:
\[
\tilde{x}_i = a x_i + b \qquad \tilde{y}_i = c y_i + d
\]
where $(a, b, c, d)$ are a set of scalars (both $a$ and $c$ must be nonzero), and let the regression coefficient for the OLS regression of $\tilde{y}$ on $\tilde{x}$ be:
\[
\tilde{\beta}_1 = \frac{\widehat{\text{cov}}(\tilde{x}, \tilde{y})}{\widehat{\text{var}}(\tilde{x})}
\]

a) Find $\tilde{\beta}_1$ in terms of $(\hat{\beta}_1, a, b, c, d)$.

[2] Technically I should use the word "inconsistency" rather than "bias," since we're talking about $\text{plim} \, \hat{\beta}_1$ and not $E(\hat{\beta}_1)$. But applied researchers often use the term "omitted variables bias" to refer to both inconsistency and bias, and our results will also apply to $E(\hat{\beta}_1)$ if we make the linear CEF assumption.
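Whatever formula you derive for part (a), you can check it numerically: run both regressions and compare. The sketch below only sets up the comparison (the data and the constants $a, b, c, d$ are arbitrary); it does not give away the answer.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=50)
y = 2.0 + 0.5 * x + rng.normal(size=50)

def ols_slope(x, y):
    # Slope of the OLS regression of y on x (with intercept).
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

# e.g. miles -> kilometers for x, decimal -> percent for y (arbitrary shifts b, d)
a, b, c, d = 1.60934, 3.0, 100.0, -7.0

slope_orig = ols_slope(x, y)
slope_new = ols_slope(a * x + b, c * y + d)
# Compare slope_new to slope_orig and to your part (a) formula.
```

Because part (a) is an algebraic identity of OLS, the check should hold to floating-point precision for any sample, not just approximately in large samples.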
6 Choice of units: The complicated version

Let $y$ be an $n \times 1$ matrix of outcomes, and $X$ be an $n \times K$ matrix of explanatory variables. Let:
\[
\hat{\beta} = (X'X)^{-1} X'y
\]
be the vector of coefficients from the OLS regression of $y$ on $X$. We are interested in what will happen if we apply some linear transformation to our variables.

a) We start by seeing what happens if we take a multiplicative transformation. Let:
\[
\tilde{X} = XA \qquad \tilde{y} = cy
\]
where $A$ is a $K \times K$ matrix[3] with full rank (i.e., $A^{-1}$ exists) and $c$ is a nonzero scalar. Let:
\[
\tilde{\beta} = (\tilde{X}'\tilde{X})^{-1} \tilde{X}'\tilde{y}
\]
be the vector of coefficients from the OLS regression of $\tilde{y}$ on $\tilde{X}$. Show that $\tilde{\beta} = c A^{-1} \hat{\beta}$.

b) Suppose that the covariance matrix of $\hat{\beta}$ is $\Sigma$. What is the covariance matrix of $\tilde{\beta}$?

c) Using this result, what happens to our OLS coefficients if we multiply one of the explanatory variables by 10 and leave everything else unchanged?

d) Using this result, what happens to our OLS coefficients if we multiply the dependent variable by 10 and leave everything else unchanged?

e) Next we consider an additive transformation. For this we suppose we have an intercept, and we change the notation slightly. Let:
\[
X = \begin{bmatrix} 1 & x_1' \\ 1 & x_2' \\ \vdots & \vdots \\ 1 & x_n' \end{bmatrix}
\]
where $x_i$ is a $(K-1) \times 1$ matrix, and let:
\[
\hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix} = (X'X)^{-1} X'y
\]
where $y$ is an $n \times 1$ matrix of outcomes. Our transformed data are:
\[
\tilde{X} = X + \imath_n b' \qquad \tilde{y} = y + d \imath_n
\]
where $b$ is a $K \times 1$ matrix whose first element is zero, $d$ is a scalar, and $\imath_n$ is an $n$-vector of ones. Let:
\[
\tilde{\beta} = \begin{bmatrix} \tilde{\beta}_0 \\ \tilde{\beta}_1 \end{bmatrix} = (\tilde{X}'\tilde{X})^{-1} \tilde{X}'\tilde{y}
\]
be the vector of coefficients from the OLS regression of $\tilde{y}_i$ on $\tilde{x}_i$. Show that $\tilde{\beta}_1 = \hat{\beta}_1$. The Frisch-Waugh-Lovell theorem might be useful here.

f) Suppose that the covariance matrix of $\hat{\beta}$ is $\Sigma$. What is the covariance matrix of $\tilde{\beta}$?

g) What happens to our OLS coefficients other than the intercept when we add 5 to the dependent variable for all observations? When we add 5 to one of the explanatory variables?

[3] We are mostly interested in the case where $A$ is diagonal, i.e., we are multiplying each column in $X$ by some number. But notice that this setup includes a lot of other redefinitions of variables.
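The identity in part (a) can be verified numerically before you prove it. The sketch below uses arbitrary data and a diagonal $A$ (the case footnote 3 highlights), multiplying the first regressor by 10 and the dependent variable by 10.

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 200, 3
X = rng.normal(size=(n, K))
y = rng.normal(size=n)

# Original OLS coefficients: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

A = np.diag([10.0, 1.0, 1.0])  # rescale the first explanatory variable
c = 10.0                       # rescale the dependent variable
X_t, y_t = X @ A, c * y
beta_tilde = np.linalg.solve(X_t.T @ X_t, X_t.T @ y_t)
# beta_tilde should equal c * A^{-1} * beta_hat, as part (a) claims.
```

Since the result is algebraic, it holds exactly (up to floating-point error) for any sample and any invertible $A$, not just the diagonal one used here.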
7 An application

The following are the results of an OLS regression using U.S. state-level data. The dependent variable is the state divorce rate (in percent) and the explanatory variables are the state urbanization rate (in percent) and a set of indicator variables for the state's region (north central, south, and west; the northeast is the base category).

Variable                  Coefficient
% urban                   -.0003509    (.00203867)
North Central             .048356      (.085872)
South                     .454         (.08223)
West                      .340245      (.084284)
Intercept                 .430096      (.55765)
Number of observations    50
R^2                       0.3203

Standard errors are reported in parentheses, and are calculated under the assumption of homoskedasticity. The estimated covariance matrix for the coefficients is:

                % urban      North Central  South        West         Intercept
% urban         4.56e-06     .00002096      .00003254    -.0000768    -.0002890
North Central   .00002096    .00737400      .0043739     .0040646     -.005634
South           .00003254    .0043739       .00674424    .0040494     -.0064645
West            -.0000768    .0040646       .0040494     .0070384     -.0029238
Intercept       -.0002890    -.005634       -.0064645    -.0029238    .02426266

a) Suppose we are willing to assume that the divorce rate is normally distributed conditional on the explanatory variables. Perform a finite-sample (t) test at the 5% level of significance of the null hypothesis that the coefficient on % urban is equal to zero. That is, state the null and alternative hypotheses, the critical values for the test, the value of the test statistic, and the result (reject or do not reject) of the test.

b) Perform a finite-sample (F) test at the 5% level of significance of the joint null hypothesis that the coefficients on the region indicators (North Central, South, and West) are all zero. That is, state the null and alternative hypotheses (using the $R\beta - r = 0$ format, and defining what $R$ and $r$ are), the critical values for the test, the value of the test statistic, and the result (reject or do not reject) of the test.

c) Suppose we are not willing to assume normality.
Perform an asymptotic test at the 5% level of significance of the null hypothesis that the coefficient on % urban is equal to zero. That is, state the null and alternative hypotheses, the critical values for the test, the value of the test statistic, and the result (reject or do not reject) of the test.

d) Perform an asymptotic (Wald) test at the 5% level of significance of the joint null hypothesis that the coefficients on the region indicators (North Central, South, and West) are all zero. That is, state the null and alternative hypotheses (using the $g(\beta) = 0$ format, and defining what $g(\cdot)$ is), the critical values for the test, the value of the test statistic, and the result (reject or do not reject) of the test.

e) Suppose that the divorce rate and urbanization rate were measured as decimals instead of percents. What would be:
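The mechanics of these tests can be sketched generically. The numbers below are hypothetical placeholders, not the regression output from the table above; the point is only the sequence of computations (t-statistic, critical value, Wald statistic) that parts (a) through (d) ask for.

```python
import numpy as np
from scipy import stats

# Hypothetical coefficient vector and (diagonal, made-up) covariance matrix.
beta = np.array([-0.0003, 1.05, 1.45, 1.34])
V = np.diag([0.002, 0.09, 0.08, 0.08]) ** 2
n, K = 50, 5  # 50 states, 5 estimated coefficients (including the intercept)

# t-test of H0: beta[0] = 0 against a two-sided alternative.
t_stat = beta[0] / np.sqrt(V[0, 0])
t_crit = stats.t.ppf(0.975, df=n - K)   # finite-sample critical value, df = n - K

# Wald test of H0: beta[1] = beta[2] = beta[3] = 0, written as R beta = 0.
R = np.zeros((3, 4))
R[0, 1] = R[1, 2] = R[2, 3] = 1.0
W = (R @ beta) @ np.linalg.solve(R @ V @ R.T, R @ beta)
chi2_crit = stats.chi2.ppf(0.95, df=3)  # asymptotic chi-squared critical value
```

For the finite-sample F version in part (b), the statistic is compared to an $F$ critical value with (3, $n - K$) degrees of freedom rather than the chi-squared one.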
1. The coefficient on the urbanization rate, its standard error, and its t-statistic?

2. The coefficient on North Central, its standard error, and its t-statistic?