Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012


1 A preliminary result

Suppose we have a random sample of size $n$ on the scalar random variables $(x, y)$ with finite means, variances, and covariance. Let:

$$\widehat{\text{cov}}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

Prove that $\text{plim} \, \widehat{\text{cov}}(x, y) = \text{cov}(x, y)$.

We will use this result repeatedly in this problem set and in the future, so once you have proved it, please feel free to take it as given for the remainder of this course. You can also take as given that if you define $\widehat{\text{var}}(x) = \widehat{\text{cov}}(x, x)$, then $\text{plim} \, \widehat{\text{var}}(x) = \text{var}(x)$.

2 OLS with a single explanatory variable

In many cases, the best way to understand various issues in regression analysis - measurement error, proxy variables, omitted variables bias, etc. - is to work through the issue in the special case of a single explanatory variable. That way, we can develop intuition without getting lost in the linear algebra. Once we have the basics down, we can then look at the multivariate case to see if anything changes. This problem goes through the main starting results.

Suppose our regression model has an intercept and a single explanatory variable, i.e.:

$$y = \beta_0 + \beta_1 x + u$$

where $(y, x, u)$ are scalar random variables. To keep things fairly general, we will assume this is a model of the best linear predictor, i.e., $E(u) = E(xu) = \text{cov}(x, u) = 0$. Our data consist of a random sample of size $n$ on $(x, y)$, arranged into the matrices:

$$X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$

Let:

$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = (X'X)^{-1} X'y \tag{1}$$

be the usual OLS regression coefficients.
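Before working through the algebra, it may help to see the setup numerically. The following Python sketch (not part of the assignment; the parameter values and simulation design are invented for illustration) simulates the single-regressor model and checks that $(X'X)^{-1}X'y$ recovers the population coefficients in a large sample:

```python
# Numerical sanity check of the OLS setup: simulate y = beta0 + beta1*x + u
# and confirm that the matrix formula in equation (1) recovers coefficients
# close to (beta0, beta1). All parameter values are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta0, beta1 = 2.0, 0.5

x = rng.normal(1.0, 2.0, n)
u = rng.normal(0.0, 1.0, n)
y = beta0 + beta1 * x + u

X = np.column_stack([np.ones(n), x])          # n x 2: intercept column and x
betahat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y

print(betahat)  # close to [2.0, 0.5] in a sample this large
```

Using `np.linalg.solve` rather than forming the inverse explicitly is the standard numerically stable way to compute the OLS fit.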

a) Show that:

$$\beta_1 = \frac{\text{cov}(x, y)}{\text{var}(x)} \qquad \beta_0 = E(y) - \beta_1 E(x)$$

b) Show that equation (1) implies that:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\widehat{\text{cov}}(x, y)}{\widehat{\text{var}}(x)} \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

The idea for this problem is that you get a little practice translating between different ways of writing the same model, so even if you know another way to get these results, please start with equation (1).

c) Without using linear algebra (i.e., just apply Slutsky's theorem and the Law of Large Numbers to the result from part (b) of this question), prove that:

$$\text{plim} \, \hat{\beta}_1 = \beta_1 \qquad \text{plim} \, \hat{\beta}_0 = \beta_0$$

3 Measurement error

Often variables are measured with error. Let $(y, x, u)$ be scalar random variables such that:

$$y = \beta_0 + \beta_1 x + u \quad \text{where } \text{cov}(x, u) = 0$$

Unfortunately, we do not have data on $y$ and $x$; instead we have data on $\tilde{y}$ and $\tilde{x}$, where:

$$\tilde{y} = y + v \qquad \tilde{x} = x + w$$

where $w$ and $v$ are scalar random variables representing measurement error. We assume "classical" measurement error:¹

$$\text{cov}(v, x) = \text{cov}(v, u) = \text{cov}(v, w) = 0$$
$$\text{cov}(w, x) = \text{cov}(w, u) = \text{cov}(w, v) = 0$$

Let $\epsilon_x = \text{var}(w)/\text{var}(x)$ and let $\epsilon_y = \text{var}(v)/\text{var}(y)$.

a) Let $\hat{\beta}_1$ be the OLS regression coefficient from the regression of $\tilde{y}$ on $\tilde{x}$. Find $\text{plim} \, \hat{\beta}_1$ in terms of $(\beta_1, \epsilon_x, \epsilon_y)$.

b) What is the effect of (classical) measurement error in $x$ on the sign and magnitude of $\text{plim} \, \hat{\beta}_1$?

c) What is the effect of (classical) measurement error in $y$ on the sign and magnitude of $\text{plim} \, \hat{\beta}_1$?

¹ Strictly speaking, the classical model of measurement error also assumes independence and normality, but we won't need those for our results.
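Parts (a)-(c) can be previewed with a small simulation. This is a hedged sketch, not the requested derivation: all variances are set to one (so $\epsilon_x = 1$) and the numbers are invented for illustration. It shows noise in $x$ pulling the OLS slope toward zero, while noise in $y$ leaves the slope consistent:

```python
# Illustrative simulation: classical measurement error in x attenuates the
# OLS slope by the factor var(x) / (var(x) + var(w)) = 1 / (1 + eps_x),
# while error in y alone does not bias the slope. Made-up settings throughout.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta0, beta1 = 1.0, 2.0

x = rng.normal(0.0, 1.0, n)        # var(x) = 1
u = rng.normal(0.0, 1.0, n)
y = beta0 + beta1 * x + u

w = rng.normal(0.0, 1.0, n)        # measurement error in x, var(w) = 1
v = rng.normal(0.0, 1.0, n)        # measurement error in y
x_tilde, y_tilde = x + w, y + v

def slope(x, y):
    """OLS slope: sample covariance over sample variance."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print(slope(x_tilde, y_tilde))   # near beta1 / (1 + eps_x) = 2 / 2 = 1
print(slope(x, y_tilde))         # near beta1 = 2: error in y alone is harmless
```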

4 Omitted variables

Suppose you want to estimate the coefficient $\beta_1$ in the regression:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$$

where $\text{cov}(u, x_1) = \text{cov}(u, x_2) = 0$. Unfortunately, your data consist only of a random sample on $(y, x_1)$. So you estimate $\beta_1$ by the OLS regression of $y$ on $x_1$:

$$\hat{\beta}_1 = \frac{\widehat{\text{cov}}(x_1, y)}{\widehat{\text{var}}(x_1)}$$

a) Find $\text{plim} \, \hat{\beta}_1$ in terms of the model parameters $(\beta_0, \beta_1, \beta_2)$, $\text{var}(x_1)$, and $\text{cov}(x_1, x_2)$.

b) Your results above imply that in order for $\hat{\beta}_1$ to be a consistent estimator of $\beta_1$, we need the omitted variable to be either unrelated to the outcome ($\beta_2 = 0$) or unrelated to the explanatory variable of interest ($\text{cov}(x_1, x_2) = 0$). It is common in applied work to make educated guesses about the signs of $\beta_2$ and $\text{cov}(x_1, x_2)$ in order to at least know the sign of the bias² in $\hat{\beta}_1$. Suppose that $y$ is earnings at age 40, $x_1$ is years of schooling, and $x_2$ is ability as measured on an IQ test. Make a guess about the signs of $\beta_2$ and $\text{cov}(x_1, x_2)$ (any guesses are acceptable). Then use these guesses to make a prediction about whether our regression coefficient $\hat{\beta}_1$ will be biased upwards or downwards (here, your answer should be consistent with your guesses).

5 Choice of units: The simple version

In applied work one is often faced with choosing units for our variables. Should we express proportions as decimals or percentages? Miles or kilometers? Etc. The short answer is that it doesn't matter if we are comparing across linearly related scales; the OLS coefficients will scale accordingly, so one can choose units according to convenience.

Suppose we have a sample (random or otherwise; this question is about an algebraic property of OLS and not a statistical property) on the scalar random variables $(y, x)$. Let the regression coefficient for the OLS regression of $y$ on $x$ be:

$$\hat{\beta}_1 = \frac{\widehat{\text{cov}}(x, y)}{\widehat{\text{var}}(x)}$$

Now let's suppose we take a linear transformation of our data.
That is, let:

$$\tilde{x}_i = a x_i + b \qquad \tilde{y}_i = c y_i + d$$

where $(a, b, c, d)$ are a set of scalars (both $a$ and $c$ must be nonzero), and let the regression coefficient for the OLS regression of $\tilde{y}$ on $\tilde{x}$ be:

$$\tilde{\beta}_1 = \frac{\widehat{\text{cov}}(\tilde{x}, \tilde{y})}{\widehat{\text{var}}(\tilde{x})}$$

a) Find $\tilde{\beta}_1$ in terms of $(\hat{\beta}_1, a, b, c, d)$.

² Technically I should use the word "inconsistency" rather than "bias" since we're talking about $\text{plim} \, \hat{\beta}_1$ and not $E(\hat{\beta}_1)$. But applied researchers often use the term "omitted variables bias" to refer to both inconsistency and bias, and our results will also apply to $E(\hat{\beta}_1)$ if we make the linear CEF assumption.
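The algebraic claim behind this question is easy to verify numerically. A minimal sketch, with arbitrary made-up values for $(a, b, c, d)$ and for the data, showing that the slope rescales exactly while the shifts $b$ and $d$ affect only the intercept:

```python
# Check of the choice-of-units claim with hypothetical numbers: rescaling the
# data as x~ = a*x + b, y~ = c*y + d changes the OLS slope from betahat1 to
# (c/a)*betahat1 exactly, because cov(ax+b, cy+d) = ac*cov(x, y) and
# var(ax+b) = a^2*var(x).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(5.0, 3.0, 1000)
y = 1.0 + 0.7 * x + rng.normal(0.0, 1.0, 1000)

def slope(x, y):
    """OLS slope: sample covariance over sample variance."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

a, b, c, d = 2.0, 10.0, 4.0, -3.0
b1 = slope(x, y)
b1_tilde = slope(a * x + b, c * y + d)

print(np.isclose(b1_tilde, (c / a) * b1))  # True: the slope scales by c/a
```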

6 Choice of units: The complicated version

Let $y$ be an $n \times 1$ matrix of outcomes, and $X$ be an $n \times K$ matrix of explanatory variables. Let:

$$\hat{\beta} = (X'X)^{-1} X'y$$

be the vector of coefficients from the OLS regression of $y$ on $X$. We are interested in what will happen if we apply some linear transformation to our variables.

a) We start by seeing what happens if we take some multiplicative transformation. Let:

$$\tilde{X} = XA \qquad \tilde{y} = cy$$

where $A$ is a $K \times K$ matrix³ with full rank (i.e., $A^{-1}$ exists) and $c$ is a nonzero scalar. Let:

$$\tilde{\beta} = (\tilde{X}'\tilde{X})^{-1} \tilde{X}'\tilde{y}$$

be the vector of coefficients from the OLS regression of $\tilde{y}$ on $\tilde{X}$. Show that $\tilde{\beta} = c A^{-1} \hat{\beta}$.

b) Suppose that the covariance matrix of $\hat{\beta}$ is $\Sigma$. What is the covariance matrix of $\tilde{\beta}$?

c) Using this result, what happens to our OLS coefficients if we multiply one of the explanatory variables by 10 and leave everything else unchanged?

d) Using this result, what happens to our OLS coefficients if we multiply the dependent variable by 10 and leave everything else unchanged?

e) Next we consider an additive transformation. For this we suppose we have an intercept, and we change the notation slightly. Let:

$$X = \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix}$$

where $x_i$ is a $K \times 1$ matrix, and let:

$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = (X'X)^{-1} X'y$$

where $y$ is an $n \times 1$ matrix of outcomes. Our transformed data are:

$$\tilde{X} = X + \iota_n b' \qquad \tilde{y} = y + d \iota_n$$

where $b$ is a $K \times 1$ matrix whose first element is zero, $d$ is a scalar, and $\iota_n$ is an $n$-vector of ones. Let:

$$\tilde{\beta} = \begin{pmatrix} \tilde{\beta}_0 \\ \tilde{\beta}_1 \end{pmatrix} = (\tilde{X}'\tilde{X})^{-1} \tilde{X}'\tilde{y}$$

be the vector of coefficients from the OLS regression of $\tilde{y}_i$ on $\tilde{x}_i$. Show that $\tilde{\beta}_1 = \hat{\beta}_1$, i.e., that the coefficients other than the intercept are unchanged. The Frisch-Waugh-Lovell theorem might be useful here.

f) Suppose that the covariance matrix of $\hat{\beta}$ is $\Sigma$. What is the covariance matrix of $\tilde{\beta}$?

g) What happens to our OLS coefficients other than the intercept when we add 5 to the dependent variable for all observations? When we add 5 to one of the explanatory variables?

³ We are mostly interested in the case where $A$ is diagonal, i.e., we are multiplying each column in $X$ by some number. But notice that this setup includes a lot of other redefinitions of variables.
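The multiplicative result in part (a) can likewise be checked numerically. In the sketch below the matrix $A$, the scalar $c$, and the simulated data are all arbitrary illustrative choices, not anything specified by the problem:

```python
# Numerical illustration of the multiplicative transformation: if X~ = X A
# (A full rank) and y~ = c*y, the OLS coefficients satisfy
# betatilde = c * A^{-1} * betahat. Made-up data and transformation below.
import numpy as np

rng = np.random.default_rng(3)
n, K = 500, 3
X = rng.normal(size=(n, K))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients (X'X)^{-1} X'y via a linear solve."""
    return np.linalg.solve(X.T @ X, X.T @ y)

A = np.array([[2.0, 0.0, 0.0],   # a full-rank K x K transformation; a diagonal
              [1.0, 3.0, 0.0],   # A would correspond to rescaling each column
              [0.0, 0.0, 0.5]])
c = 10.0

betahat = ols(X, y)
betatilde = ols(X @ A, c * y)

print(np.allclose(betatilde, c * np.linalg.solve(A, betahat)))  # True
```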

7 An application

The following are the results of an OLS regression using U.S. state-level data. The dependent variable is the state divorce rate (in percent) and the explanatory variables are the state urbanization rate (in percent) and a set of indicator variables for the state's region (north central, south, and west; the northeast is the base category).

Variable                  Coefficient
% urban                   -.0003509   (.00203867)
North Central              .048356    (.085872)
South                      .454       (.08223)
West                       .340245    (.084284)
Intercept                  .430096    (.55765)
Number of observations     50
R²                         0.3203

Standard errors are reported in parentheses, and are calculated under the assumption of homoskedasticity. The estimated covariance matrix for the coefficients is:

                % urban      North Central   South        West         Intercept
% urban         4.56e-06     .00002096       .00003254    -.0000768    -.0002890
North Central   .00002096    .00737400       .0043739     .0040646     -.005634
South           .00003254    .0043739        .00674424    .0040494     -.0064645
West            -.0000768    .0040646        .0040494     .0070384     -.0029238
Intercept       -.0002890    -.005634        -.0064645    -.0029238    .02426266

a) Suppose we are willing to assume that divorce is normally distributed conditional on the explanatory variables. Perform a finite-sample (t) test at the 5% level of significance of the null hypothesis that the coefficient on % urban is equal to zero. That is, state the null and alternative hypotheses, the critical values for the test, the value of the test statistic, and the result (reject or do not reject) of the test.

b) Perform a finite-sample (F) test at the 5% level of significance of the joint null hypothesis that the coefficients on the region indicators (North Central, South, and West) are all zero. That is, state the null and alternative hypotheses (using the $R\beta - r = 0$ format, and defining what $R$ and $r$ are), the critical values for the test, the value of the test statistic, and the result (reject or do not reject) of the test.

c) Suppose we are not willing to assume normality.
Perform an asymptotic test at the 5% level of significance of the null hypothesis that the coefficient on % urban is equal to zero. That is, state the null and alternative hypotheses, the critical values for the test, the value of the test statistic, and the result (reject or do not reject) of the test.

d) Perform an asymptotic (Wald) test at the 5% level of significance of the joint null hypothesis that the coefficients on the region indicators (North Central, South, and West) are all zero. That is, state the null and alternative hypotheses (using the $g(\beta) = 0$ format, and defining what $g(\cdot)$ is), the critical values for the test, the value of the test statistic, and the result (reject or do not reject) of the test.

e) Suppose that the divorce rate and urbanization rate were measured in decimal instead of percent. What would be:

1. The coefficient on the urbanization rate, its standard error, and its t-statistic?
2. The coefficient on North Central, its standard error, and its t-statistic?
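For readers unsure of the mechanics behind parts (a)-(d), the sketch below shows how the critical values and test statistics are computed in general. It deliberately uses placeholder numbers rather than the table's values, so it is a template for the computations, not the graded answers:

```python
# Generic test mechanics only -- NOT the answers to question 7. The
# coefficient, standard error, and covariance block are placeholder numbers.
import numpy as np
from scipy import stats

n, K = 50, 5                       # observations and estimated coefficients
alpha = 0.05

# Two-sided t test of H0: beta_j = 0
b_j, se_j = 0.12, 0.05             # placeholder coefficient and standard error
t_stat = b_j / se_j
t_crit = stats.t.ppf(1 - alpha / 2, df=n - K)   # finite-sample critical value
z_crit = stats.norm.ppf(1 - alpha / 2)          # asymptotic critical value
print(abs(t_stat) > t_crit, abs(t_stat) > z_crit)   # True True for these numbers

# Joint test of H0: q = 3 coefficients are all zero, built from the coefficient
# subvector and the matching block of the estimated covariance matrix
q = 3
b_sub = np.array([0.10, 0.15, 0.13])            # placeholder coefficients
V_sub = 0.007 * np.eye(q)                       # placeholder covariance block
wald = b_sub @ np.linalg.solve(V_sub, b_sub)    # asymptotic chi-squared(q) stat
f_stat = wald / q                               # finite-sample F(q, n-K) version
chi2_crit = stats.chi2.ppf(1 - alpha, df=q)
f_crit = stats.f.ppf(1 - alpha, q, n - K)
print(wald > chi2_crit, f_stat > f_crit)
```

For the real question, replace the placeholders with the coefficient on % urban and its standard error (for the t tests) and with the region-coefficient subvector and the matching 3 x 3 block of the reported covariance matrix (for the joint tests).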