Answers to Problem Set #4


Problems

1. Suppose that, from a sample of 63 observations, the least squares estimates and the corresponding estimated variance-covariance matrix are given by:

    β̂₁ = 2,  β̂₂ = 3,  β̂₃ = −1

with estimated covariance matrix

           β̂₁    β̂₂    β̂₃
    β̂₁  [  3    −2     1 ]
    β̂₂  [ −2     4     0 ]
    β̂₃  [  1     0     3 ]

Test each of the following hypotheses and state the conclusion:

a. β₂ = 0

    t = (β̂₂ − β₂)/σ̂_β̂₂ = (3 − 0)/√4 = 1.5

Since the critical t-value is approximately 1.67, fail to reject the null hypothesis.

b. β₁ + 2β₂ = 5

Restate the null in terms of a new parameter, γ = 5 − β₁ − 2β₂:

    H₀: γ = 0

The t-statistic is

    t = γ̂/σ̂_γ = −3/3.317 ≈ −0.905

where

    γ̂ = 5 − β̂₁ − 2β̂₂ = 5 − 2 − 2(3) = −3
    σ̂²_γ = V(5 − β̂₁ − 2β̂₂) = V(β̂₁) + 4V(β̂₂) + 2(−1)(−2)cov(β̂₁, β̂₂)
          = 3 + 4(4) + 4(−2) = 11
    σ̂_γ = √11 ≈ 3.317

Since the t-statistic is less than the critical t-value (approximately 1.67) in absolute value, we fail to reject the null hypothesis.

c. β₁ − β₂ + β₃ = 4

Again, restate the null in terms of a new parameter, γ = 4 − β₁ + β₂ − β₃:

    H₀: γ = 0

The t-statistic is

    t = γ̂/σ̂_γ = 6/4 = 1.5

where

    γ̂ = 4 − β̂₁ + β̂₂ − β̂₃ = 4 − 2 + 3 + 1 = 6
    σ̂²_γ = V(4 − β̂₁ + β̂₂ − β̂₃)
          = V(β̂₁) + V(β̂₂) + V(β̂₃) + 2(−1)(1)cov(β̂₁, β̂₂) + 2(−1)(−1)cov(β̂₁, β̂₃) + 2(1)(−1)cov(β̂₂, β̂₃)
          = 3 + 4 + 3 + (−2)(−2) + 2(1) + (−2)(0) = 16
    σ̂_γ = √16 = 4

Since the t-statistic is less than the critical t-value (approximately 1.67), we fail to reject the null hypothesis.

2. Suppose that, from a sample of 100 observations, you run a regression of the dependent variable on 3 independent variables or regressors. Some of the output from this regression looks as follows:

    S.E. of regression:  2.532
    Variance of y:       4.326

Conduct the F-test for the overall significance of the coefficients. The test has the following null and alternative hypotheses:

    H₀: β₁ = β₂ = β₃ = 0
    H_A: β_j ≠ 0 for at least one j = 1, 2, 3

First, find the sum of squared residuals. Since s² = Σ ε̂²_t /(T − K),

    Σ ε̂²_t = s²(T − K) = (2.532)²(100 − 3) = 621.87

Find the total sum of squares, Σ(y_t − ȳ)²:

    var(y) = Σ (y_t − ȳ)² /(T − 1)
    4.326 = Σ (y_t − ȳ)² / 99
    Σ (y_t − ȳ)² = 428.274

Note, the R² for this regression is:

    R² = 1 − Σ ε̂²_t / Σ (y_t − ȳ)² = 1 − 621.87/428.274 = −0.452

The R² by definition has to be bounded by 0 and 1, so the reported output figures are internally inconsistent. If the R² had turned out positive, you could use the following relationship between the R² and the F-statistic for an overall significance test:

    F = [R²/(K − 1)] / [(1 − R²)/(T − K)]

3. Show that:

    R̄² = R² − [(K − 1)/(T − K)] (1 − R²)

The adjusted R² and R² are defined as:

    R̄² = 1 − [Σ ε̂²_t /(T − K)] / [Σ (y_t − ȳ)² /(T − 1)]
    R² = 1 − Σ ε̂²_t / Σ (y_t − ȳ)²

Rewrite R̄²:

    R̄² = 1 − (1 − R²)(T − 1)/(T − K)
        = 1 − (T − 1)/(T − K) + R²(T − 1)/(T − K)
        = (T − K − T + 1)/(T − K) + R²(T − 1)/(T − K)
        = (1 − K)/(T − K) + R²[(T − K) + (K − 1)]/(T − K)
        = R² − (K − 1)/(T − K) + R²(K − 1)/(T − K)
        = R² − [(K − 1)/(T − K)] (1 − R²)

4. Given the following regression model

    y_t = β₁ + β₂ x₂t + β₃ x₃t + ε_t

you are asked to test the null hypothesis H₀: β₂ + 3β₃ = 1.

(a) Construct the restricted regression that imposes the constraint implied by the null hypothesis and discuss how you would use an F-test to test this null hypothesis.

Restricted model (substitute β₂ = 1 − 3β₃):

    y_t = β₁ + (1 − 3β₃) x₂t + β₃ x₃t + ε_t
    y_t = β₁ + x₂t + β₃ (x₃t − 3x₂t) + ε_t
    y_t − x₂t = β₁ + β₃ (x₃t − 3x₂t) + ε_t

Construct two new variables, ỹ_t = y_t − x₂t and x̃_t = x₃t − 3x₂t, and run the following two regressions:

    (U): y_t = β₁ + β₂ x₂t + β₃ x₃t + ε_t
    (R): ỹ_t = β₁ + β₃ x̃_t + ε_t

Obtain the sum of squared residuals from each and construct the F-statistic:

    F = [(SSE_R − SSE_U)/(3 − 2)] / [SSE_U/(T − 3)] = (SSE_R − SSE_U) / [SSE_U/(T − 3)]

If the F-statistic is greater than the critical F(1, T − 3) value, then we reject the null hypothesis that β₂ + 3β₃ = 1.

b. Suppose instead that you are asked to test this null hypothesis using a t-statistic. Describe how you would construct the test in this case. You need not write down the exact OLS expressions for quantities such as cov(β̂ᵢ, β̂ⱼ). Remember that V(aX + bY) = a²V(X) + b²V(Y) + 2ab·COV(X, Y), and that out of the estimation you will obtain the variances and covariances of all the coefficient estimates.

Restate the null in terms of a new parameter, γ = β₂ + 3β₃ − 1:

    H₀: γ = 0

The t-statistic is

    t = γ̂/σ̂_γ

where

    γ̂ = β̂₂ + 3β̂₃ − 1
    σ̂²_γ = V(β̂₂ + 3β̂₃) = V(β̂₂) + 9V(β̂₃) + 2(1)(3)cov(β̂₂, β̂₃)
    σ̂_γ = √[ V(β̂₂) + 9V(β̂₃) + 6 cov(β̂₂, β̂₃) ]

Calculate the t-statistic and compare it to the critical t-value. If the t-statistic exceeds the critical value, reject the null hypothesis.

Another way to solve this problem is to impose the restriction in the regression model (substitute β₂ = γ − 3β₃ + 1):

    Y_t = β₁ + (γ − 3β₃ + 1) X₂t + β₃ X₃t + ε_t
    Y_t = β₁ + γ X₂t + β₃ (X₃t − 3X₂t) + X₂t + ε_t
    Y_t − X₂t = β₁ + γ X₂t + β₃ (X₃t − 3X₂t) + ε_t

You can estimate this expression by least squares and test the hypothesis that γ = 0.

5. Suppose you estimate by OLS the model

    y_t = β x_t + ε_t        (1)

However, the true model is given by the expression

    y_t = β x_t + γ z_t + u_t        (2)

where u_t is the error term, and z_t is a regressor that was omitted in the first regression. Furthermore, suppose that γ < 0 and that COV(x, z) > 0.

(a) Ignoring that γ < 0 and that COV(x, z) > 0: in general, under what conditions will the estimator β̂ be an unbiased estimator of the true β?

Unbiasedness requires that E(u_t | x_t) = 0 (equivalently, that E(x_t u_t) = 0). Start from

    β̂ = Σ (y_t − ȳ)(x_t − x̄) / Σ (x_t − x̄)²
       = Σ (β x_t + γ z_t + u_t − β x̄ − γ z̄ − ū)(x_t − x̄) / Σ (x_t − x̄)²

Ignoring the omitted-variable terms for now:

    β̂ = Σ (β x_t + u_t − β x̄ − ū)(x_t − x̄) / Σ (x_t − x̄)²
       = β Σ (x_t − x̄)(x_t − x̄)/Σ (x_t − x̄)² + Σ u_t (x_t − x̄)/Σ (x_t − x̄)²
       = β + Σ u_t (x_t − x̄) / Σ (x_t − x̄)²

Taking the expected value, conditioning on x_t:

    E(β̂ | x_t) = E(β | x_t) + E[ Σ u_t (x_t − x̄)/Σ (x_t − x̄)² | x_t ]
                = β + Σ [(x_t − x̄)/Σ (x_t − x̄)²] E(u_t | x_t)

Using the Law of Iterated Expectations:

    E(β̂) = β + E[ Σ ((x_t − x̄)/Σ (x_t − x̄)²) u_t ]

From the above expression, we can see that E(β̂) ≠ β if E(Σ u_t x_t) ≠ 0.

b. Calculate the bias in β̂ when we estimate equation (1) by OLS (i.e., we omit z_t).

Calculate β̂:

    β̂ = Σ (y_t − ȳ)(x_t − x̄) / Σ (x_t − x̄)²
       = Σ (β x_t + γ z_t + u_t − β x̄ − γ z̄ − ū)(x_t − x̄) / Σ (x_t − x̄)²
       = β + γ Σ (z_t − z̄)(x_t − x̄)/Σ (x_t − x̄)² + Σ u_t (x_t − x̄)/Σ (x_t − x̄)²

Taking the expected value, conditioning on x_t, and assuming E(u_t | x_t) = 0 holds:

    E(β̂ | x_t) = β + γ E[ Σ (z_t − z̄)(x_t − x̄)/Σ (x_t − x̄)² | x_t ]

so that

    E(β̂) = β + γ E[ Σ (z_t − z̄)(x_t − x̄)/Σ (x_t − x̄)² ]

where the second term is the BIAS.

c. Given the conditions of the problem, is β̂ from the previous part biased upwards or downwards? Explain why.

Recall that γ < 0 and cov(x_t, z_t) > 0:

    E(β̂) = β + γ · E[ Σ (z_t − z̄)(x_t − x̄)/Σ (x_t − x̄)² ]
          = β + (negative)(positive)
          = β + negative bias

Therefore, the estimate β̂ will be biased downward.
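The downward bias just derived can be illustrated with a small Monte Carlo sketch. The parameter values here (γ = −1.5, Cov(x, z) = 0.8) are illustrative choices, not taken from the problem:

```python
import numpy as np

rng = np.random.default_rng(0)
beta, gamma = 2.0, -1.5          # true slope; gamma < 0
T, reps = 200, 2000

biases = []
for _ in range(reps):
    x = rng.normal(size=T)
    z = 0.8 * x + rng.normal(size=T)   # Cov(x, z) = 0.8 > 0
    u = rng.normal(size=T)
    y = beta * x + gamma * z + u       # true model (2)
    # OLS of y on x alone (model (1)), in the demeaned form used above
    xd = x - x.mean()
    beta_hat = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
    biases.append(beta_hat - beta)

avg_bias = np.mean(biases)
print(avg_bias)   # ≈ gamma * Cov(x,z)/Var(x) = -1.5 * 0.8 = -1.2 < 0
```

The average estimated slope falls below the true β, matching the sign prediction γ · COV(x, z) < 0.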

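More generally, every test in Problem 1 is a t-test on a linear combination a′β, with t = (a′β̂ − c)/√(a′V a); the reported values can be reproduced from the estimates and covariance matrix (signs may flip relative to the γ reparameterizations used above, but the magnitudes agree):

```python
import numpy as np

b = np.array([2.0, 3.0, -1.0])            # beta-hats from Problem 1
V = np.array([[ 3.0, -2.0, 1.0],
              [-2.0,  4.0, 0.0],
              [ 1.0,  0.0, 3.0]])         # estimated covariance matrix

def t_stat(a, c):
    """t-statistic for H0: a'beta = c."""
    a = np.asarray(a, dtype=float)
    return (a @ b - c) / np.sqrt(a @ V @ a)

print(t_stat([0, 1, 0], 0))    # part a: 1.5
print(t_stat([1, 2, 0], 5))    # part b: 0.905 in absolute value
print(t_stat([1, -1, 1], 4))   # part c: 1.5 in absolute value
```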
Comments on Problem Set #4 (Empirical)

There are a number of ways to approach this problem. I have provided some results below, but they are by no means exhaustive.

As a first step, include all of the explanatory variables (DRUGS, ENROLL, MATH87, SES, and URBAN) in a baseline regression. Note, the variable ENROLL is ENROLLMENT/1000, so its coefficient can be interpreted per 1000 students.

Dependent Variable: MATH91
Method: Least Squares
Sample: 1 407
Included observations: 402
Excluded observations: 5

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                0.012374     0.083436      0.148309   0.8822
DRUGS           -0.916503     0.436590     -2.099230   0.0364
ENROLL           0.067648     0.077162      0.876704   0.3812
MATH87           0.641167     0.037062     17.29973    0.0000
SES              0.133049     0.039321      3.383634   0.0008
URBAN            0.000446     0.000880      0.507469   0.6121

R-squared            0.548663   Mean dependent var
Adjusted R-squared   0.542964   S.D. dependent var
S.E. of regression   0.610152   Akaike info criterion
Sum squared resid    147.4253   Schwarz criterion
Log likelihood      -368.7840   F-statistic

Since the p-values for ENROLL and URBAN are greater than 0.05, these variables are individually insignificant. When you conduct an F-test of whether the variables are jointly significant, you will find that the p-value is also greater than 0.05. Dropping them yields the following regression:

Dependent Variable: MATH91
Method: Least Squares
Sample: 1 407
Included observations: 405
Excluded observations: 2

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                0.070319     0.071595      0.982178   0.3266
DRUGS           -0.735341     0.413900     -1.776615   0.0764
MATH87           0.639760     0.036826     17.37269    0.0000
SES              0.141094     0.039050      3.613192   0.0003

R-squared            0.542520   Mean dependent var     -0.026229
Adjusted R-squared   0.539099   S.D. dependent var      0.899902
S.E. of regression   0.610941   Akaike info criterion   1.862194
Sum squared resid    149.6727   Schwarz criterion       1.901738
Log likelihood      -373.0943   F-statistic             158.5144
Durbin-Watson stat   1.954262   Prob(F-statistic)       0.000000

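As a cross-check, the summary statistics in this output are linked by the identities from Problems 2 and 3: with R² = 0.54252, T = 405, and K = 4, the overall F-statistic and the adjusted R² follow directly:

```python
# Identities from Problems 2 and 3, applied to the regression output above.
R2, T, K = 0.54252, 405, 4   # values read off the reported output

F = (R2 / (K - 1)) / ((1 - R2) / (T - K))     # overall-significance F-test
adj_R2 = R2 - (K - 1) / (T - K) * (1 - R2)    # adjusted R-squared

print(round(F, 2))        # ≈ 158.51
print(round(adj_R2, 4))   # ≈ 0.5391, matching the reported adjusted R-squared
```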
Notice that when we drop the ENROLL and URBAN variables, the DRUGS variable becomes insignificant at the 5% level. A Wald test of the joint significance of the DRUGS, ENROLL, and URBAN coefficients from the baseline regression yields:

Wald Test:
Equation: EQ01
Null Hypothesis: C(2)=0
                 C(3)=0
                 C(6)=0
F-statistic    1.808365    Probability   0.145052
Chi-square     5.425094    Probability   0.143188

This implies that the DRUGS, URBAN, and ENROLL variables are jointly insignificant at the 5% level. It is a matter of choice whether or not to drop these variables; here, we decide to leave them in the regression. Given the sample size, we do not need to worry about degrees of freedom.

Now, we will look more carefully at subsamples using dummy variables. We run the following regression for males and females:

Dependent Variable: MATH91
Method: Least Squares
Sample: 1 407
Included observations: 402
Excluded observations: 5

Variable           Coefficient   Std. Error   t-Statistic   Prob.
MALE*MATH87           0.572959     0.050840     11.26970    0.0000
(1-MALE)*MATH87       0.7053       0.053825     13.1036     0.0000
MALE*DRUGS           -1.148416     0.475200     -2.416700   0.0161
(1-MALE)*DRUGS       -0.697427     0.510966     -1.364917   0.1730
MALE*ENROLL           0.149820     0.095915      1.562023   0.1190
(1-MALE)*ENROLL       0.017400     0.098318      0.176980   0.8596
MALE*SES              0.178070     0.054866      3.245536   0.0013
(1-MALE)*SES          0.094114     0.054720      1.719902   0.0862
MALE*URBAN            0.000568     0.001203      0.472155   0.6371
(1-MALE)*URBAN        0.000409     0.001268      0.322687   0.7471

R-squared            0.555037   Mean dependent var     -0.025206
Adjusted R-squared   0.544822   S.D. dependent var      0.902533
S.E. of regression   0.608911   Akaike info criterion   1.870273
Sum squared resid    145.3430   Schwarz criterion       1.969687
Log likelihood      -365.9249   Durbin-Watson stat      1.983788

A couple of implications from this regression deserve comment. The F-statistics testing whether females and males respond differently all yield p-values greater than 0.15.
This means that males and females respond the same way to DRUGS, ENROLL, SES, URBAN, and MATH87 in terms of their MATH91 performance. However, compared to the average response, the following observations hold. First, males at schools where drugs are prominent tend to perform worse than females at the same schools; the effect of drugs on females does not appear to be significant. Second, males at schools with high enrollment tend to perform better than females at the same schools. Third, socio-economic status affects the performance of males more

than females in terms of test scores. Finally, males and females do not respond differently in terms of their previous math scores or the degree of urbanization. The latter finding could be the result of including both the ENROLL and URBAN variables in the regression: these two variables are most likely highly correlated, so ENROLL may be taking explanatory power away from the URBAN variable. Even though males and females do not respond differently, the t-statistics reveal that males attending schools with high enrollment, a low level of drugs, and high socio-economic status perform better on average. We have to consider this when making a recommendation on how to improve math scores. For instance, fighting a drug problem will be more successful in raising test scores if the school is male-dominated in terms of enrollment.
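The joint tests used above can also be computed by hand from restricted and unrestricted residual sums of squares, exactly as in Problem 4. A sketch on synthetic data (the underlying data set is not reproduced here; the column layout of a constant plus five regressors mirrors the baseline regression):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 405
X = np.column_stack([np.ones(T), rng.normal(size=(T, 5))])  # const + 5 regressors
# true coefficients: the 2nd, 3rd, and 6th are zero, so H0 holds by construction
y = X @ np.array([0.1, 0.0, 0.0, 0.64, 0.13, 0.0]) + rng.normal(size=T)

def ssr(y, X):
    """Sum of squared residuals from OLS of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

q = 3                                # number of restrictions
ssr_u = ssr(y, X)                    # unrestricted regression
ssr_r = ssr(y, X[:, [0, 3, 4]])      # restricted: drop columns 2, 3, and 6
F = ((ssr_r - ssr_u) / q) / (ssr_u / (T - X.shape[1]))
print(F)   # compare with the critical F(3, T-6) value
```

Because the restricted model is nested in the unrestricted one, ssr_r ≥ ssr_u always, and F measures how much fit is lost by imposing the restrictions.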