Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning


This is a closed-book exam (uden hjælpemidler). Answer all questions! Questions 1 to 4 have equal weight. Within each question, part (a) consists of very basic questions, part (b) of questions that require somewhat more detailed knowledge of the curriculum, and part (c) of questions that require a deeper understanding; for example, they may be technically demanding, or they may require a good understanding of how to combine different theoretical results. A correct answer to all part (a) questions is sufficient for passing the exam. The answers can be in Danish or English.

Question 1.

(a) For the linear model with a fixed effect,

y_it = α + β x_it + α_i + u_it,

explain informally why a pooled OLS regression of y_it on x_it may lead to inconsistent estimates of β. Suggest an alternative estimation procedure that does give consistent estimates.

(b) Explain what we mean by a difference-in-differences estimator.

(c) Three different investigators have the same panel data set and the same linear model. They use different estimation methods: OLS, random effects, and first differencing. The three sets of parameter estimates are quite different, and each investigator claims that their estimates are the correct ones. How would you devise an objective test between them?

Answers for Question 1.

(a) If the fixed effect, α_i, is correlated with the variable x_it, then we have omitted variable bias. That is, we have left out a variable (in this case the fixed effect) which is correlated with an included variable. This makes the included variable x_it correlated with the composite error term (α_i + u_it) ("endogeneity") and hence leads to inconsistent estimates. There are two usual estimators that give consistent estimates. The first is to first difference, so that we regress (y_it − y_i,t−1) on (x_it − x_i,t−1) (without including a constant)

to estimate β. Alternatively, we use the within estimator, which subtracts the mean over time of each variable to give (y_it − ȳ_i) and (x_it − x̄_i), and then regresses the former on the latter (again without a constant). Including constants in these regressions does no harm.

(b) Suppose we have samples from the same population in two different time periods, 1 and 2. Some members of the population are treated after the end of period 1 and before period 2, and we observe which people have been treated. Examples of a treatment include a policy change that affects some people but not others, or a change in the local environment that affects people there but not people who live elsewhere. A difference-in-differences estimator of the effect of the treatment on a variable of interest compares the change in that variable between the two groups before and after the treatment. Formally, let Y_i be the variable of interest for person i, let dB_i be a dummy variable that is 1 if person i is treated (and zero otherwise), and let d2_i be a dummy variable that is 1 if person i was sampled in period 2. Then run the regression:

Y_i = β_0 + δ_0 d2_i + β_1 dB_i + δ_1 (d2_i × dB_i) + u_i

The OLS estimate of δ_1 is the difference-in-differences estimate. A test of whether this coefficient is significantly different from zero is a test of whether the treatment had an effect on the variable Y.

(c) The fact that the three sets of estimates differ suggests that the least restrictive model, the first-differenced ("fixed effects") model, is the correct one. This follows since the random effects model and the fixed effects model should give consistent (and hence similar) estimates if the assumptions of the former are correct. However, the differences between the three sets of estimates may not be statistically significant. One way to organise a test is to use the following version of the random effects estimator:

(y_it − λ ȳ_i) = β_0 (1 − λ) + β_1 (x_it1 − λ x̄_i1) + ... + β_k (x_itk − λ x̄_ik) + (u_it − λ ū_i)

where λ is a parameter that is equal to zero if the pooled OLS estimator is appropriate and equal to unity if the fixed effects estimator should be used. For the random effects model we have 0 < λ < 1. To estimate λ we can take first-round estimates from a pooled OLS estimation and then find estimates of the variances of the fixed effects and of the residuals.
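The bias discussed in answer 1(a) is easy to demonstrate numerically. The sketch below uses a hypothetical, noise-free two-person panel (all numbers invented for illustration) in which the fixed effect α_i is correlated with the level of x_it: pooled OLS is pulled away from the true β, while the within transformation removes α_i and recovers β exactly.

```python
import numpy as np

# Hypothetical noise-free panel: y_it = 2*x_it + alpha_i, where the person
# with the larger fixed effect also has systematically larger x values.
x = np.array([[0.0, 1.0, 2.0],      # person 1, alpha_1 = 0
              [10.0, 11.0, 12.0]])  # person 2, alpha_2 = 5
alpha = np.array([0.0, 5.0])
beta = 2.0
y = beta * x + alpha[:, None]

# Pooled OLS of y on x (with a constant): the omitted fixed effect is
# correlated with the regressor, so the slope is biased upward.
X = np.column_stack([np.ones(x.size), x.ravel()])
pooled = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][1]

# Within estimator: subtract individual means, regress without a constant.
xw = (x - x.mean(axis=1, keepdims=True)).ravel()
yw = (y - y.mean(axis=1, keepdims=True)).ravel()
within = (xw @ yw) / (xw @ xw)

print(pooled)  # about 2.49: inconsistent for beta = 2
print(within)  # 2.0: demeaning has removed alpha_i
```

With real data the first-difference regression of answer 1(a) would serve the same purpose; here the within estimator is shown because it is the shorter of the two transformations.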

Question 2. We consider the linear model y_t = β′x_t + u_t, with x_t′ = [1, x_1t, x_2t]. The variables are defined by y_t = UnR_t, the unemployment rate (the number of unemployed per labor force); x_1t = LrC_t, the log of real aggregate consumption; x_2t = Rb_t, the 10-year bond rate; and t = 1983:2–2003:1. The output from an OLS regression analysis, including various misspecification tests, is reported in the Appendix, Part A.

Question 2(a): Under which conditions is the OLS estimate β̂ unbiased and efficient? Based on the misspecification tests reported in the Appendix, Part A, do you think the conditions are satisfied? Motivate your answer. Assume that you know from previous experience that LrC_t = a_0 + a_1 UnR_t + ... + e_t. How would this affect your evaluation of the OLS properties of the present model?

Answer 2(a): The OLS estimate β̂ is unbiased under the conditions E(u_t) = 0 and E(x_t u_t) = 0. It is efficient if, additionally, u_t is independently and identically distributed. The misspecification tests in Part A show that the residuals are significantly autocorrelated and heteroscedastic. Thus, the OLS estimate is not efficient. From the misspecification tests in Part A it is not possible to know whether the condition E(x_t u_t) = 0 is satisfied. However, we know that UnR_t = f(LrC_t, u_t) and that LrC_t = g(UnR_t), so that one of the explanatory variables is explained by the dependent variable. This implies that LrC_t = g(UnR_t) = g(f(LrC_t, u_t)), and we can conclude that one of the explanatory variables and the error term are correlated, i.e. E(x_t u_t) ≠ 0. Hence the OLS estimate is biased.

Question 2(b): Three different estimates of the standard errors of the regression estimates (SE, HACSE, HCSE) are reported in the Appendix, Part A. Explain briefly the difference between the three estimates. Based on the reported misspecification tests, discuss which of the three is the most appropriate. Do any of the three standard errors correct for simultaneity bias? Motivate briefly!

Answer 2(b): The SE stands for the standard errors of the OLS regression coefficients and is an unbiased and efficient estimate of the standard deviation of the estimation error, √Var(β̂), if the error term in the model is independently and identically distributed, u_t ~ iid(0, σ²_u). The HCSE is the estimated OLS standard error corrected for heteroscedasticity in the error term, without assuming a specific model for the heteroscedasticity. The HACSE is the estimated standard error corrected for both autocorrelation and heteroscedasticity in the error term, without assuming a specific model for either the autocorrelated errors or the heteroscedasticity. There was evidence of both significant heteroscedasticity and autocorrelation in the estimated model. Thus, the HACSE are the most relevant to use in this case. However, we note that SE and HCSE are fairly close to each other, indicating that the heteroscedasticity problem is less serious than the autocorrelation problem. This could also be seen from the calculated test statistics, which were much more significant in the case of autocorrelation. None of the three standard errors corrects for simultaneity bias. This is because simultaneity bias means E(β̂) ≠ β, i.e. it has to do with the mean of the estimated regression coefficient and not with its variance. If there is simultaneity bias we need to choose a different estimation method, for example instrumental variable estimation (or estimation of a system).

Question 2(c): A test for ARCH errors is reported in Appendix A. Formulate the H_0 and the H_1 hypotheses. Give the conditions under which this test is valid. Are they satisfied in this case? Motivate briefly!

Answer 2(c): The linear model is y_t = β′x_t + u_t, where

E(u²_t) = γ_0 + γ_1 u²_{t−1} + γ_2 u²_{t−2}

If we express γ_0 = (1 − γ_1 − γ_2) σ²_ε, it is easy to see that γ_1 = γ_2 = 0 corresponds to a constant variance. The null and the alternative hypotheses are:

H_0: γ_1 = γ_2 = 0, i.e. u_t ~ iid(0, σ²_ε) (the errors are independently and identically distributed)
H_1: γ_1 ≠ 0 and/or γ_2 ≠ 0, i.e. u_t ~ id(0, σ²_{u,t}) (the errors are independently, but not identically, distributed)

Under the null hypothesis the errors should be independently and identically distributed. The ARCH test is designed to check the assumption of identical distribution, assuming that the assumption of independence is correct. Because the independence assumption is clearly not satisfied in this case (as appears from the LM misspecification test), the ARCH test in Part A should be interpreted with caution. The reason is that the χ² test is based on squared independent normal variables, which is not the case here.

Question 3(a): The Appendix, Part B reports the output of a dynamic version of the above static OLS model:

y_t = a_1 y_{t−1} + b_01 x_1t + b_11 x_1,t−1 + b_02 x_2t + b_12 x_2,t−1 + a_0 + ε_t,  t = 1983:2–2003:1.

Discuss whether you think the choice of one lag is consistent with the misspecification tests reported in the Appendix, Part A. Compare with the same misspecification tests in Part B and discuss whether the OLS assumptions are now satisfied. Explain the difference between a residual correlogram, a residual autoregression and a partial autocorrelation function. Explain why R² and R² relative to difference give two very different results.

Answer 3(a): The partial autocorrelation function shows that only the first coefficient is very large, whereas the remaining coefficients are small considering that σ̂_ρ = 1/√T ≈ 0.12. This is confirmed by the LM test, where only the first coefficient is significant in the auxiliary regression. Thus, the choice of one lag should be sufficient. However, comparing with the misspecification tests in Part B, it turns out that there is still autocorrelation left (probably first order) in the residuals. This means that the OLS assumptions are not satisfied for the dynamic model and the estimates are both inefficient and biased, because E(y_{t−1} ε_t) ≠ 0. Thus, an explanatory variable is correlated with the error term.
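Residual autocorrelation diagnostics of the kind used above can be computed with a few lines of code. The sketch below implements the standard definitions of the correlogram, the residual autoregression and the partial autocorrelation function on a hypothetical, strongly negatively autocorrelated residual series (all data invented for illustration).

```python
import numpy as np

def correlogram(u, m):
    """Simple autocorrelations r_i from the regression u_t = r_i u_{t-i} + e_t."""
    return [(u[i:] @ u[:-i]) / (u[:-i] @ u[:-i]) for i in range(1, m + 1)]

def residual_autoregression(u, m):
    """Coefficients rho_1..rho_m of u_t = rho_1 u_{t-1} + ... + rho_m u_{t-m} + e_t."""
    y = u[m:]
    Z = np.column_stack([u[m - j:len(u) - j] for j in range(1, m + 1)])
    return np.linalg.lstsq(Z, y, rcond=None)[0]

def pacf(u, mmax):
    """Partial autocorrelations: the last coefficient of each autoregression."""
    return [residual_autoregression(u, m)[-1] for m in range(1, mmax + 1)]

# Hypothetical residuals that alternate in sign: extreme negative
# first-order autocorrelation.
u = np.array([1.0, -1.0] * 10)
r = correlogram(u, 2)
print(float(r[0]))        # first-order autocorrelation is -1 here
print(float(pacf(u, 1)[0]))  # order-1 partial autocorrelation coincides with r_1
```

Note that the order-1 partial autocorrelation equals r_1 by construction, since both come from the same one-regressor autoregression.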

A residual correlogram of order m consists of autocorrelation coefficients r_i, i = 1, ..., m, where r_i is defined by the simple regression u_t = r_i u_{t−i} + e_t, i.e. it is the simple correlation between u_t and u_{t−i}. A residual autoregression of order m is a multiple regression on lagged residuals: u_t = ρ_1 u_{t−1} + ... + ρ_m u_{t−m} + e_t. A partial autocorrelation function consists of the last coefficient ρ_m in the residual autoregression u_t = ρ_1 u_{t−1} + ... + ρ_m u_{t−m} + e_t, for m = 1, 2, 3, ...

R² is defined as 1 − RSS/TSS, where RSS = Σ ε̂²_t and TSS = Σ (y_t − ȳ)², whereas R² relative to difference is derived by instead using TSS = Σ (y_t − y_{t−1})². In the first case the average value ȳ is the yardstick against which the explanatory power of the model is compared; in the second case it is the lagged value of the process. R² is appropriate when the variable y_t is stationary (and not strongly autocorrelated), whereas R² relative to difference is appropriate when y_t is nonstationary (or stationary but strongly autocorrelated).

Question 3(b): Derive the static long-run solution, y = β_0 + β_1 x_1 + β_2 x_2, of the estimated dynamic regression model based on the analysis of the lag structure in the Appendix, Part B (only approximate coefficients are needed). Interpret the estimated long-run coefficient of real consumption and compare it to the static regression estimate in Part A. Does the long-run estimate fall within the 95% confidence interval of the static OLS regression coefficient? Which of the two results do you think is more reliable?

Answer 3(b): The long-run steady-state solution is:

β_0 = 0.379/0.0486 ≈ 7.8
β_1 = 0.0586/0.0486 ≈ 1.2
β_2 = 0.568/0.0486 ≈ 11.7

None of the coefficients falls within the 95% confidence intervals. As the misspecification tests showed, both the static and the dynamic regression model gave clear evidence of being misspecified.
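The long-run solution is a simple rescaling of the dynamic-model coefficients by one minus the coefficient on the lagged dependent variable. A minimal sketch, using the approximate magnitudes reported above (here 1 − a_1 = 0.0486; signs are taken from the answer as given):

```python
# Long-run solution of
#   y_t = a1*y_{t-1} + b01*x1_t + b11*x1_{t-1} + b02*x2_t + b12*x2_{t-1} + a0 + eps_t
# is y = a0/(1-a1) + (b01+b11)/(1-a1)*x1 + (b02+b12)/(1-a1)*x2.

def long_run(one_minus_a1, a0, b1_sum, b2_sum):
    """Rescale dynamic coefficients into long-run (steady-state) coefficients."""
    return (a0 / one_minus_a1, b1_sum / one_minus_a1, b2_sum / one_minus_a1)

# Magnitudes reported in the answer above.
b0, b1, b2 = long_run(0.0486, 0.379, 0.0586, 0.568)
print(round(b0, 1), round(b1, 1), round(b2, 1))  # 7.8 1.2 11.7
```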
The degree of residual autocorrelation was higher in the static model than in the dynamic model, so the results from the dynamic model might be more reliable. Since the OLS estimates in the dynamic regression model are biased (because the

residuals were autocorrelated), one should be cautious when interpreting the derived coefficients. It is obvious that the model is not yet well specified (possibly because the parameters have not been stable over the sample period).

Question 3(c): Assume that the inflation rate is an important omitted explanatory variable for the unemployment rate. Under which conditions would the estimates of the static regression coefficients in Part A and the estimates of the solved long-run coefficients in Part B remain unchanged? Discuss the role of the ceteris paribus assumptions in an economic model and the omitted variables problem in an empirical model.

Answer 3(c): Only if inflation is uncorrelated with real consumption and the bond rate will the coefficient estimates in the static regression model of Part A remain the same. This is because if we exclude a relevant variable which is correlated with the regressors in the model, the coefficients of the estimated model will be not only biased but also inconsistent. This can be shown as follows. We assume that the data are generated by the model

y = x β_x.z + z β_z.x + ε  (1)

where β_x.z = β is the true parameter measuring the effect of a change in x_t on y_t when we have accounted for all relevant effects (i.e. given the ceteris paribus clause, "everything else equal"). Assume that we instead estimate the model

y = x β_x + u.  (2)

The OLS estimate of β_x is:

β̂_x = (x′x)⁻¹ x′y.  (3)

We substitute the true value of y from (1) into (3):

β̂_x = (x′x)⁻¹ x′(x β_x.z + z β_z.x + ε)
E(β̂_x) = β_x.z + E{(x′x)⁻¹ x′z} β_z.x + E{(x′x)⁻¹ x′ε}

where the middle term is the omitted variable bias.

It is now easy to see that E(β̂_x) = β_x.z only if E(x′z) = 0. That is, if we exclude relevant regressors z_t (relevant meaning that β_z.x ≠ 0) which are correlated with the regressors x_t (such that E(z_t x_t) ≠ 0), then the OLS estimator will suffer from omitted variable bias. While the ceteris paribus assumption in a theoretical model allows us to discuss a specific value of a parameter in a model with a few key variables, the value of the parameter is only well defined empirically when the ceteris paribus variables are uncorrelated with the variables included in the model. The long-run coefficients (β_1, β_2) are nonlinear combinations of the original coefficients of the dynamic regression model. Therefore, if we include a new relevant variable, x_3t, as well as its significant lags, what matters is whether Cov(β̂_1, β̂_3) = 0 and Cov(β̂_2, β̂_3) = 0. Only if this is the case will the previous estimates (β̂_1, β̂_2) remain unchanged. Thus, whether the estimated long-run coefficients (β̂_1, β̂_2) remain the same or not depends on the covariances between the long-run coefficients (β̂_1, β̂_2, β̂_3).

Question 4(a): In the linear model the design matrix (X_T′ X_T) plays an important role, where X_T = [x_1, ..., x_k] and each x_i is (T × 1). In the OLS model we often assume that the x variables are fixed or given. When this assumption is inappropriate, asymptotic theory is often used to derive the properties of estimators and test procedures. Under which condition will (1/T)(X_T′ X_T) → M as T → ∞, where M is a matrix with constant parameters? Explain how the Dickey-Fuller test of a unit root is formulated, and specify the H_0 and the H_1 hypotheses of a unit root in the variable x_t when the latter contains a linear trend. Based on the enclosed output, would you say there is a unit root in the unemployment rate? Asymptotic 5% critical values for unit root tests: τ_nc = −1.94, τ_c = −2.86, τ_ct = −3.41, where nc stands for no constant, c for constant, and ct for constant and trend in the model.

Answer 4(a): The condition (1/T)(X_T′ X_T) → M will be satisfied when the variables x_t,i are stationary, i.e. have constant means and constant variances and covariances. If the explanatory variables are nonstationary the condition will not be satisfied.
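A quick numerical illustration of this condition, in the scalar case and with hypothetical series: for a bounded, mean-reverting regressor the scaled second moment (1/T)Σx_t² settles down to a constant, while for a trending (nonstationary) regressor it grows without bound, so no constant limit matrix M exists.

```python
def scaled_moment(x):
    """(1/T) * sum of squares: the scalar analogue of (1/T) X'X."""
    return sum(v * v for v in x) / len(x)

# Bounded stationary-like series: the scaled moment is constant in T.
stat = [(-1.0) ** t for t in range(1, 1001)]
# Trending series x_t = t: the scaled moment diverges as T grows.
trend = [float(t) for t in range(1, 1001)]

print(scaled_moment(stat))         # 1.0 for every sample size
print(scaled_moment(trend[:100]))  # 3383.5 at T = 100
print(scaled_moment(trend))        # 333833.5 at T = 1000: no finite limit
```

For a random walk the scaled moment likewise diverges (in probability), which is why unit-root regressors require the nonstandard Dickey-Fuller asymptotics discussed next.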

Under the assumption that there is a constant and a linear trend in the variable x_t, we specify the following model:

Δx_t = c x_{t−1} + b_0 + b_1 t + ε_t

The null and the alternative hypotheses of a unit root are:

H_0: c = 0 and b_1 = 0
H_1: c ≠ 0 and b_1 ≠ 0.

However, the Dickey-Fuller critical values are derived for the simple test of H_0: c = 0. When testing for a unit root in the unemployment rate we do not assume a linear trend, as it generally would not make sense. (Note that students have not been penalized for choosing a trend in the model.) In this case, under the assumption that b_1 = 0, we test the following hypotheses:

H_0: c = 0 and b_0 = 0
H_1: c ≠ 0 and b_0 ≠ 0.

The D-F test statistic is −2.96 < −2.86, so we reject a unit root in the unemployment rate. Note, however, that the Dickey-Fuller test procedure is derived under the assumption that the parameters in the Dickey-Fuller regression are constant. This may not be the case in practice.

Question 4(b): Specify the H_0 and the H_1 hypotheses of the common factor hypothesis in the dynamic regression model of Question 3. Based on the result of the COMFAC test reported in Part B, would it be appropriate to estimate the static regression model of Question 2 with Generalized Least Squares? Motivate your answer.

Answer 4(b): The static regression model with autocorrelated errors is

y_t = β_0 + β_1 x_1t + β_2 x_2t + u_t

where u_t = ρ u_{t−1} + ε_t. This leads to:

y_t = β_0 (1 − ρ) + ρ y_{t−1} + β_1 x_1t − ρβ_1 x_1,t−1 + β_2 x_2t − ρβ_2 x_2,t−1 + ε_t
    = a_0 + a_1 y_{t−1} + b_01 x_1t + b_11 x_1,t−1 + b_02 x_2t + b_12 x_2,t−1 + ε_t

The null hypothesis of a common factor restriction is

H_0: b_11 = −ρβ_1 and b_12 = −ρβ_2

and the alternative is

H_1: at least one of the equalities does not hold.

The COMFAC Wald test statistic of 7.95 corresponds to a p-value of 0.0007, so H_0 is rejected well below the 1% level. This means that the residuals, even if autocorrelated, do not follow an AR(1) process. Therefore, GLS would be a bad idea, as it would impose such an error process on the residuals.

Question 4(c): Discuss a Wald procedure for testing the following hypotheses in the above dynamic regression model:

H_0: β_1 = (b_01 + b_11)/(1 − a_1) = 0, against the alternative H_1: β_1 = (b_01 + b_11)/(1 − a_1) ≠ 0.

Answer 4(c): Since this is a nonlinear hypothesis on the parameters θ_1′ = [a_1, b_01, b_11, b_02, b_12, a_0], we have to find a linear approximation:

r_1(θ_1) ≈ R_1 θ_1, where R_1 = ∂r_1(θ_1)/∂θ_1′ and r_1(θ_1) = (b_01 + b_11)/(1 − a_1).

The vector R_1′ becomes:

∂β_1/∂a_1 = (b_01 + b_11)/(1 − a_1)²
∂β_1/∂b_01 = 1/(1 − a_1)
∂β_1/∂b_11 = 1/(1 − a_1)
∂β_1/∂b_02 = 0
∂β_1/∂b_12 = 0
∂β_1/∂a_0 = 0.

We can now find the Wald test statistic:

ξ_W = β̂_1 (R_1 V̂ R_1′)⁻¹ β̂_1 ~ χ²(1)

where V̂ = σ̂²_ε (X′X)⁻¹ and X = [y_{−1}, x_1, x_{1,−1}, x_2, x_{2,−1}, 1].
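The delta-method mechanics above can be checked numerically. A sketch with hypothetical coefficient values and a hypothetical diagonal covariance matrix V̂ (only the three parameters that enter β_1 are carried along; everything numeric here is invented for illustration): the analytic gradient R_1 is compared against central finite differences, and the Wald statistic is then assembled as in the answer.

```python
import numpy as np

# beta_1 = (b01 + b11) / (1 - a1), with theta ordered as [a1, b01, b11].
def beta1(theta):
    a1, b01, b11 = theta
    return (b01 + b11) / (1.0 - a1)

def grad_beta1(theta):
    """Analytic gradient R_1, as derived in Answer 4(c)."""
    a1, b01, b11 = theta
    return np.array([(b01 + b11) / (1.0 - a1) ** 2,
                     1.0 / (1.0 - a1),
                     1.0 / (1.0 - a1)])

theta = np.array([0.95, 0.30, -0.20])  # hypothetical estimates
R1 = grad_beta1(theta)

# Central finite differences as a cross-check on the algebra.
h = 1e-6
num = np.array([(beta1(theta + h * e) - beta1(theta - h * e)) / (2 * h)
                for e in np.eye(3)])

# Wald statistic xi_W = beta1 (R1 V R1')^{-1} beta1 ~ chi^2(1),
# with a hypothetical covariance matrix V for illustration only.
V = np.diag([1e-4, 1e-3, 1e-3])
xi_W = beta1(theta) ** 2 / (R1 @ V @ R1)
print(np.allclose(R1, num, atol=1e-4))  # True: analytic gradient is correct
```

In practice V̂ would be the relevant 3×3 block of σ̂²_ε (X′X)⁻¹ from the estimated dynamic regression.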