Econ 510 B. Brown Spring 2014 Final Exam Answers

Similar documents
Least Squares Estimation-Finite-Sample Properties

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Instrumental Variables, Simultaneous and Systems of Equations

Econometrics Summary Algebraic and Statistical Preliminaries

Iris Wang.

Answers to Problem Set #4

Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning

Introductory Econometrics

Christopher Dougherty London School of Economics and Political Science

Econometrics of Panel Data

Heteroskedasticity and Autocorrelation

Dealing With Endogeneity

Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity

1. You have data on years of work experience, EXPER, its square, EXPER2, years of education, EDUC, and the log of hourly wages, LWAGE

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Heteroskedasticity. We now consider the implications of relaxing the assumption that the conditional

Lecture 4: Heteroskedasticity

Econometrics Master in Business and Quantitative Methods

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

WISE International Masters

Multiple Regression Analysis: Heteroskedasticity

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

Econometrics. 9) Heteroscedasticity and autocorrelation

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Heteroskedasticity. Part VII. Heteroskedasticity

Linear Regression. Junhui Qian. October 27, 2014

Econometrics - 30C00200

Multiple Regression Analysis

Simultaneous Equation Models Learning Objectives Introduction Introduction (2) Introduction (3) Solving the Model structural equations

Econometrics Review questions for exam

Topic 7: Heteroskedasticity

The outline for Unit 3

FinQuiz Notes

Applied Quantitative Methods II

Introduction to Econometrics. Heteroskedasticity

Advanced Econometrics

the error term could vary over the observations, in ways that are related

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Empirical Economic Research, Part II

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

Outline. 11. Time Series Analysis. Basic Regression. Differences between Time Series and Cross Section

Ch 2: Simple Linear Regression

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10)

F9 F10: Autocorrelation

Chapter 3 Multiple Regression Complete Example

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b.

Econometrics Multiple Regression Analysis: Heteroskedasticity

Dynamic Panels. Chapter Introduction Autoregressive Model

Economics 308: Econometrics Professor Moody

Econ 583 Final Exam Fall 2008

Sample Problems. Note: If you find the following statements true, you should briefly prove them. If you find them false, you should correct them.

Diagnostics of Linear Regression

Homoskedasticity. Var (u X) = σ 2. (23)

Week 11 Heteroskedasticity and Autocorrelation

Lecture 6: Dynamic Models

Final Exam. 1. Definitions: Briefly Define each of the following terms as they relate to the material covered in class.

Econometrics Homework 1

Freeing up the Classical Assumptions. () Introductory Econometrics: Topic 5 1 / 94

Introductory Econometrics

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

Linear dynamic panel data models

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

Model Specification and Data Problems. Part VIII

Introductory Econometrics

Outline. Nature of the Problem. Nature of the Problem. Basic Econometrics in Transportation. Autocorrelation

Econometrics Part Three

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

ECON 497: Lecture Notes 10 Page 1 of 1

ECON 4230 Intermediate Econometric Theory Exam

Econ 836 Final Exam. 2 w N 2 u N 2. 2 v N

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

ECON 497: Lecture 4 Page 1 of 1

Heteroskedasticity. y i = β 0 + β 1 x 1i + β 2 x 2i β k x ki + e i. where E(e i. ) σ 2, non-constant variance.

Linear Regression with Time Series Data

Linear models and their mathematical foundations: Simple linear regression

Review of Econometrics

Introduction to Econometrics

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

388 Index Differencing test ,232 Distributed lags , 147 arithmetic lag.

Graduate Econometrics Lecture 4: Heteroskedasticity

Ma 3/103: Lecture 25 Linear Regression II: Hypothesis Testing and ANOVA

Chapter 8 Heteroskedasticity

2.1 Linear regression with matrices

Inference for Regression

Ordinary Least Squares Regression

Föreläsning /31

Econometrics of Panel Data

OSU Economics 444: Elementary Econometrics. Ch.10 Heteroskedasticity

Simple and Multiple Linear Regression

The Simple Linear Regression Model

x i = 1 yi 2 = 55 with N = 30. Use the above sample information to answer all the following questions. Show explicitly all formulas and calculations.

Ma 3/103: Lecture 24 Linear Regression I: Estimation

The general linear regression with k explanatory variables is just an extension of the simple regression as follows

ECON Introductory Econometrics. Lecture 16: Instrumental variables

Linear Model Under General Variance

ECON3327: Financial Econometrics, Spring 2016

Introduction to Estimation Methods for Time Series models. Lecture 1

Transcription:

Econ 510 B. Brown Spring 2014 Final Exam Answers Answer five of the following questions. You must answer question 7. The question are weighted equally. You have 2.5 hours. You may use a calculator. Brevity is recommended. 1. Answer true or false and state why. a. When the disturbances are serially correlated and the regressors include a lagged endogenous variable, the OLS coefficient estimates will be biased and inconsistent. True. Suppose y t = α + βy t 1 + u t and u t = ρu t 1 + ε t,theny t 1 and u t both depend directly on u t 1 and will hence be correlated. a. Under the null hypothesis, the Lagrange multiplier (LM), likelihood ratio (LR), and Wald (W) tests always agree. True. They all have a limiting chi-squared distribution with degrees of freedom equal to the number of restrictions. b. When collinearity is known to be present, then OLS-base inferences are meaningless. False. Multicollinearity decreases the power of the test. The size is still correct and the test is still consistent in that it will reject a false null with probability one in large samples. c. A proper instrument for instrumental variables estimation is one which is uncorrelated with the regressors but correlated with the disturbances. False. A proper instrument should be correlated with the regressors but not the disturbances. d. When the RHS variables move smoothly,the appropriate critical value for the Durbin- Watson test is closer to d U than d L. True. The true critical value depends on the x-values used. When the x s move smoothly then the true value is close to d u. e. The power of the test is determined by variability of the explanatory variables. True. The power of a test (size of shift) is determined by the difference between the null and alternative, the sample size, and the signal-noise ratio. The signal is just the variability of the explanatory variables. f. If the disturbances of the regression are non-normally distributed then the usual estimated coefficient ratios will have a t-distribution. False. With known variance σ 2 the ratios will no longer be standard normal and with estimates s 2 they will no longer be t-distributed. g. When the model has right-hand side variables correlated with the disturbances, we can correct any difficulties through the use of generalized least squares. False. We should use instrumental variables. GLS is for nonscalar covariance matrices. 2. Consider the following multivariate regression model (for i =1, 2,..., n): y i = x 0 iβ + u i where x i is a kx1 vector of observations on the k independent variables for observation i, y i is the value of the dependent variable and β is the kx1 vector of unknown coefficients. The disturbance term satisfies the classical stochastic assumptions. Suppose we seek to predict the value of the dependent variable given by y = x 0 β + u for some observation outside the sample. We treat x as given for prediction purposes. a. For ŷ based on sample information, obtain the mean-squared error and show that x 0 β is the minimum mean-squared error predictor of y. Since E[y ]=x 0 β, wecanwrite (y ŷ ) 2 =[(y x 0 β)+(x 0 β ŷ )] 2 and MSE = E[(y ŷ ) 2 ]=σ 2 +E[(x 0 β ŷ ) 2 ] because u =(y x 0 β) uncorrelated with (x 0 β ŷ ) basedonthesample. Clearly 1

E[(x 0 β ŷ ) 2 ] is minimal when by = x 0 β. b. Determine the mean and variance of the least-squares based predictor ŷ = x 0 ˆβ and the prediction error y ŷ. Rewrite E[(x 0 β ŷ ) 2 ]=E[(x 0 (β b β) 2 ]=E[x 0 (β b β)(β bβ) 0 x ]=x 0 E[(β b β)(β b β) 0 ]x = σ 2 x 0 (X 0 X) 1 x is the variance of the predictor. And E[(y ŷ ) 2 ]=σ 2 [1 + x 0 (X 0 X) 1 x ] is the variance of the prediction error. c. Suppose u t i.i.d. N(0,σ 2 ), then what is the distribution of (y ŷ )/{σ 2 [1 + x 0 (X 0 X) 1 x ]} 1/2? From (b), we see that this ratio must be N(0, 1). d. What is the distribution of this statistic if we replace σ 2 by s 2? How might this statistic be used to obtain a prediction interval for y? Take the standard normal ratio from (c) and divide it by the independent chi-squared (n k)(s 2 /σ 2 ) divided by its degrees of freedom to obtain (y ŷ )/{s 2 [1 + x 0 (X 0 X) 1 x ]} 1/2 which will have a t distribution with n k degrees of freedom. We can use this ration to construct ŷ ± t.025,n k {σ 2 [1 + x 0 (X 0 X) 1 x ]} 1/2 as a 95% interval for y. 3. Consider the following model of demand and supply for laptop computers: Q i = α 1 + α 2 P i + α 3 Y i + u d i (Demand) and Q i = β 1 + β 2 P i + u s i (Supply) where Q i is the quantity sold/purchased, P i is the per unit price, and Y i is the income level for the customer involved in sale i. We assume that both error terms have the usual ideal properties. We consider Y i to be an exogenous variable. a. Show that the disturbance term for both equations is correlated with P i. What problems will this introduce for the OLS estimation of each equation? At equilibrium quantity demanded equals quantity supplied so α 1 + α 2 P i + α 3 Y i + u d i = β 1 + β 2 P i + u s i,which can be solved for P i to obtain the reduced form P i =(β 1 α 1 )/(α 2 β 2 ) (α 3 /(α 2 β 2 ))Y i +(1//(α 2 β 2 ))(u s i u d i ). Clearly P i is directly determined by and correlated with both u s i and u d i. b. With an appropriate choice of instruments, discuss the properties of instrumental variables estimates of the supply equation. Of the three observable variables Y i is the only exogenous variable available for use as an instument. Therefore, we use z 0 i =(1,Y i ) as instruments for the supply equation regressors x 0 i =(1,P i ) to obtain the instrumental variables estimator β e =(Z 0 X) 1 Z 0 y,wherey i = Q i. This estimator is biased but consisten and asymptotically unbiased and normal. c. Discuss how you might test whether OLS or IV estimates should be used for the supply equation. The null hypothesis is that P i is uncorrelated with u s i, while the alternative is that it is correlated. We compare β b 2, which is consistent and efficient under the null but inconsistent under the alternative, with β e 2, which is consistent under both the null and alternative but inefficient under the null. We divide ( β e 2 β b 2 ) by ((s.e.( β e 2 )) 2 (s.e.( β b 2 )) 2 ) 1/2, which should be asymptotically sandard normal under the null but in the tails with increased probability under the alternative. d. Discuss the feasibility of estimating the demand equation by instrumental variables. The regression vector for the demand equation is x 0 i = (1,P i,y i ) so if we use Y i as and instument for P i, the intrument vector z 0 i =(1,Y i,y i ) is linearly independent. We do not have enough instruments to estimate the demand equation by instrumental variables. e. Suppose we rewrite the supply equation as Q i = β 1 +β 2 P i +β 3 C i +u s i where C i measures 2

the per unit cost of production, then how would the IV estimation of the Demand equation be impacted? The reduced form now has P t as a function of both Y t and C t. Ifweassume C t is exogenous then we have two instuments available and can use instrumental variables of the demand equation with z 0 i =(1,Y i,c i ). 4. Consider the following multivariate regression model: y = X 1 β 1 + x k β k + u where X 1 is an n (k 1) matrix of observations of the first k 1 independent and x 2 is an n 1 vector of observation of the k th variable, y is the vector of observations on the dependent variable, and β 1 and β k are the corresponding coefficients. The disturbance term satisfies the ideal stochastic assumptions. a. Suppose that the last variable is collinear with the first k 1. What is meant by collinearity in this model? Near extreme collinearity? Collinearity occurs when one variable can be written as a linear combination of the remaining variables, x k = X 1 c for some c. Near extreme collinearity is when x k = X 1 c + v and v are small (in variability). b. Describe the properties, including possible optimality, of the OLS estimator of β k under near extreme collinearity. OLS is unbiased and BLUE in any case and BUE if the disturbances are normal. So they are the best unbiased estimates we can obtain. c. In the case of near extreme collinearity, describe the distribution of (ˆβ k β k )/(s 2 [(X 0 X) 1 ] kk ) 1/2 under both the null hypothesis that β k = β 0 k and the alternative β k = β 1 k. What happens to the power as the collinearity becomes extreme? Under the assumption of normality and nonstochastic regressors, this statistic will still have a t n k distribution under the null hypothesis. However, The variance of the estimator of the coefficient of the collinear term grows large so the shift term will be diminished and the power (shift under the alternative) will grow small. The signal-noise ratio will become unfavorable. d. How might one test for the possible collinearity discussed above. Be specific. The problem with collinearity is that the regression has a difficult time unraveling the separate effects of the collinear variables, although the overall combined effect may be quite clear. To detect this we test to see whether the possibly collinear variables are jointly insignificant. That is we conduct an F -test comparing the SSE for a regression including the variables with one that excludes the variables. e. Discuss possible solutions, if any, to the collinearity problem. Possible solutions include (i) finding additional observations that are either sufficiently numerous or different to aleviate the problem, (ii) combining with independent estimates (another way of adding observations), (iii) utilizing economic theory to essentially eliminate one or more of the collinear terms, and (iv) ridge regression which introduces bias as a way to reduce the high variances resulting from collinearity. 5. Consider the following cross-sectional model of income: y i = α + βx i + u i i =1, 2,..., n (i) where y i is income for individual i, x i is a variable that measures ability, say intelligence, and u i is an unobservable disturbance with is assumed to be N(0,σ 2 ). a. Assume the x i are nonstochastic and nonconstant, then what will be the distribution of the OLS estimates of α and β. Let β 0 =(α, β) and x 0 i =(1,x i ) then b β = β+(x 0 X) 1 X 0 u N(β,σ 2 (X 0 X) 1 ) 3

b. Alternatively, and more realistically, assume that x i are stochastic, what can be said about the distributions of the OLS estimates in small samples and large samples? Be explicit in any additional assumptions introduced. Assume (u i,x i ) jointly i.i.d., u i stochastically independent of x i, E[x i x 0 i] = Q p.d., then β X b N(β,σ 2 (X 0 X) 1 ) and n( β b β) d N(0,σ 2 Q 1 ). Suppose that x i is unobservable but a proxy x i, say I.Q., is observable but measures our ability variable with error x i = x i + ε i (ii) where ε i is the measurement error, which is uncorrelated with x i, and the model which is estimated on the basis of observable variables is y i = α + βx i + u i (iii) c. Show that x i will be correlated with u i. What problems will this cause for the OLS estimates and inferences of α and β. Solving (ii) for x i andsubstitutinginto(i)yields y i = α + βx i +(u i βε i ). Obviously, x i = x i + ε i is correlated with u i = u i βε i which means that the OLS estimates will be biased and inconsistent. d. Outline a procedure for estimating α and β which will avoid the difficulties introduced in (iii). Be explicit about any additional information, variables, or assumptions that are needed to carry out this procedure. We need to use instumental variables. An instrument for x i should be correlated with this variable but uncorrelated with u i. In this case we might use educational attainment as an instrument that is loosely correlated with intelligence but likely to be uncorrelated with the measurement error. Let zi 0 =(1,a i ) where a i is educational attainment, then the IV estimator is β e =(Z 0 X) 1 Z 0 y = β + (Z 0 X) 1 Z 0 u which will be biased but consistent and n( β e β) d N(0,P 1 MP 10 ) where we additionally assume (u i,x i,z i ) jointly i.i.d., u i uncorrelated with z i, E[z i x 0 i]= P nonsingular, and E[u 2 i z i zi]=m. 0 6. Suppose we mistakenly estimate the following model: y i = α + βx i + u i (i) when the true model has the form y i =(α + βx i ) exp(u i ) (ii) where x i is nonstochastic and nonconstant and u i i.i.d.(0,σ 2 ). a. Determine the mean and variance of u i in (i). (Hint: Remember that α + βx i in (i) is the expectation of y i and u i is the residual). Substiuting (ii) into u i = y i (α + βx i ) yields u i =(α + βx i ) exp(u i ) (α + βx i )=(α + βx i )(1 exp(u i )) = (α + βx i )w i. Now E[u i x i ]=(α+βx i )E[w x i ]=0if E[exp(u i ) x i ]=1and V [u i x i ]=(α+βx i ) 2 V [w x i ] which will be a function of x i generally even if u i is independent. b. What are the implications for the properties of the OLS estimates of (i) under this form of misspecification? The standard implications for heteroskedasticity: OLS will be unbiased and consistent but not BLUE. c. What are the implications for OLS-based inferences based on (i)? The usual OLS estimated covariance matrix will be inappropriate and the standard ratios will not have t-distributions. d. How might one test for the possibility of this form of misspecification? The White test is one approach. Regress e 2 i on (1,x i,x 2 i ) and test the null that the slope coefficients are jointly zero. This is a test of whether the standard OLS covariance matrix is correct 4

asymptotically. e. Show how the estimation/inference problem can be corrected or avoided. We can use feasible WLS by taking λ b 2 i = bα 0 + bα 1 x i + bα 2 x 2 i as the fitted values from the regression in (d). We now perform OLS on the transformed model y i / λ b i = α + βx i / λ b i + u i / λ b i. Alternatively we can calculate the heteroskedastic consistent covariance matrix. 7. Consider the aggregate production model: ln X t = β 0 + β 1 ln L t + β 2 ln K t + u t where X t is an index of U.S. GNP in constant dollars, L t is a labor input index, K t is a capital input index, and u t is a disturbance term. a. Making explicit your assumptions, briefly describe the properties of coefficient estimates for the model, given that the model does not change between samples. With the usual assumption (i)-(vi), we find that OLS will be unbiased, consistent, and BLUE. Under normality we also have BUE. b. A regression of this equation from annual data for 1929-1948 yields: ln X t = 4.0576 + 1.6167 ln L t +0.2197 ln K t (0.209) (2.289) with R 2 =.9759,s =.04573,d.f. =17. Construct a 95% confidence interval for the capital coefficient. A 95% interval is given by 0.2197±2.289 2.11 = ( 4, 6093, 5.0487). This represents the set of accepatable null hypothesis. c. A regression of the equation for the sample 1949-1967 yields: ln X t = 1.9564 + 0.8336 ln L t +0.6631 ln K t (0.2488) (0.2289) with R 2 =.9904,s=.02185,d.f.=16. Test the hypothesis that the variances σ 2 are the same in both samples (given the coefficients do not differ). Use the Goldfeld-Quandt test: s 2 1/s 2 2 =(.04579) 2 /(.02185) 2 =4.394 which should be a realization from an F 17,16. The 99% critical value for (15,15) is 3.52, so we reject the null hypothesis that the variance is thesameinthetwosubsamples. d. A regression over the complete sample 1929-1967 yields: ln X t = 3.8766 + 1.4106 ln L t +0.4162 ln K t (0.0884) (0.0505) with R 2 =.9937, s =.037555, d.f. = 36. Test the hypothesis that all the coefficients are the same in both samples. We take SSE 1 =17(.04573) 2 =.03555, SSE 2 = 16(.02185) 2 =.00763, andsse r = 36(.037555) 2 =.05077. The Chow test yields {[SSE r (SSE 1 +SSE 2 )]/3}/[(SSE 1 +SSE 2 )/33] = [(.05077.00763.03555)/3]/[(.00763+.03555)/33] =.00253/.00131 = 1.93 as a realization from the F 3,33. The 95% critical point for (3,30) is 2.92 so we are not in the tail and do not reject. e. What are the implications for the OLS estimates (iii) if we reject the hypothesis tested in (c)? We have heteroskedasticity and the OLS estimated standard errors will be biased and inconsistent with the direction indeterminate. Our ratios will either be too large or too small. If we reject the hypothesis tested in (d)? The coefficients estimates for the combined sample will be an average of the two separate samples and be inconsistent for either. 5