ECON 497 Midterm Spring

ECON 497 Midterm Spring 2009

ECON 497: Economic Research and Forecasting
Spring 2009
Bellas
Midterm
Name:

You have three hours and twenty minutes to complete this exam. Answer all questions and explain your answers. Fifty points total, points per part indicated in parentheses.

1. Omitted variables are a fun part of any regression model. Imagine that a regression is done estimating the time it takes a student at my son's elementary school to run 100 meters. The model estimated is:

T_i = 28 - 2M_i - 1.5G_i

where T_i is the time (in seconds) that it took student i to run 100 meters, M_i is a male dummy variable and G_i is the grade level (0 through 6) of the student. One excluded variable is the student's height. Assuming that height is relevant to a person's speed in the 100 meter dash, how would the exclusion of height bias the estimated coefficient on G_i? Explain. (2)

Height is probably positively correlated with grade and likely has a negative impact on time, so the bias term would be negative. The exclusion of height would negatively bias the estimated coefficient on G.

2. Explain briefly what endogeneity is and offer a simple example. (2)

Endogeneity occurs when there is a causal link between an explanatory variable and either other explanatory variables or the dependent variable, so that the value of the explanatory variable in question is not truly independent of the other variables in the equation. One example is that the acceleration time of a car depends on its weight and horsepower, but horsepower might depend on the weight of the car, so horsepower is endogenous.
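The direction of the bias in question 1 can be checked with a small numerical sketch. Everything in it is assumed for illustration: the "true" coefficients, the heights and the wiggle values are invented and are not part of the exam, and the male dummy is left out to keep the regression simple. With no error term, the slope from the regression that omits height equals the true grade coefficient plus the height coefficient times the slope of height on grade.

```python
# Illustrative only: all numbers below are invented, not taken from the exam.

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical "true" model: time = 40 - 0.5*grade - 0.1*height
BETA_GRADE = -0.5
BETA_HEIGHT = -0.1  # taller students run faster, so time falls with height

grades = [0, 1, 2, 3, 4, 5, 6]
# Height (cm) rises with grade; the wiggle keeps it from being an exact line.
wiggle = [2, -1, 0, 3, -2, 1, -3]
heights = [110 + 6 * g + w for g, w in zip(grades, wiggle)]
times = [40 + BETA_GRADE * g + BETA_HEIGHT * h for g, h in zip(grades, heights)]

slope_height_on_grade = ols_slope(grades, heights)  # positive correlation
short_slope = ols_slope(grades, times)              # regression omitting height
bias = short_slope - BETA_GRADE                     # negative in this setup

print(slope_height_on_grade, short_slope, bias)
```

A negative effect of height on time combined with a positive relationship between height and grade gives a negative bias, which matches the answer above.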

3. Linear regression involves estimating a linear relationship between one or more independent or explanatory variables and a dependent variable. Imagine that such a relationship has been estimated between a person's income in thousands of dollars (I_i), their age (A_i), a dummy variable indicating whether they are male (M_i), a dummy variable indicating whether they have a college degree (C_i) and a male-age interactive term, equal to the product of the male dummy and their age (MA_i). The estimated equation is:

I_i = -2.0 + 0.5A_i + 5.0M_i + 12.0C_i - 0.2MA_i

A. Calculate the predicted income for a 30 year old woman with no college degree. (1)

I-hat = -2.0 + 0.5*30 = 13, or $13,000.

B. Calculate the predicted income for a 30 year old woman with a college degree. (1)

I-hat = -2.0 + 0.5*30 + 12.0*1 = 25, or $25,000.

C. What is the interpretation of the coefficient of 5.0 on the male dummy? (1)

Other things being the same, on average a male would earn $5000 more than a female.

D. On one set of axes, draw a basic graph of income versus age for a woman who has a college degree and for a man who has a college degree. I will grade this based on the relative positions of the vertical intercepts and the relative slopes of the two lines. (2)
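The arithmetic in parts A and B, and the intercepts and slopes needed for the graph in part D, can be reproduced with a short sketch. The function name `predicted_income` is a label chosen here for illustration; the coefficients are the ones in the estimated equation above.

```python
def predicted_income(age, male, college):
    """Predicted income in thousands of dollars from the estimated equation."""
    return -2.0 + 0.5 * age + 5.0 * male + 12.0 * college - 0.2 * male * age

# Parts A and B: a 30 year old woman without and with a college degree.
woman_no_degree = predicted_income(30, male=0, college=0)
woman_degree = predicted_income(30, male=0, college=1)

# Part D: vertical intercepts (age = 0) for college graduates.
woman_intercept = predicted_income(0, male=0, college=1)
man_intercept = predicted_income(0, male=1, college=1)

# Part D: slopes, measured as the change in prediction per extra year of age.
woman_slope = predicted_income(1, male=0, college=1) - woman_intercept
man_slope = predicted_income(1, male=1, college=1) - man_intercept

print(woman_no_degree, woman_degree)
print(woman_intercept, man_intercept, woman_slope, man_slope)
```

The man's line starts higher on the vertical axis but is flatter, because the interaction term subtracts 0.2 from his age slope.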

5. Consider the following output from a regression done in SPSS. Regression of acceleration time (S) on a manual transmission dummy (T), drag coefficient (E) and horsepower (H).

ANOVA (b)

Model 1       Sum of Squares    df    Mean Square    F         Sig.
Regression    178.917            3    59.639         26.889    .000 (a)
Residual       75.411           34     2.218
Total         254.328           37

a. Predictors: (Constant), H, T, E
b. Dependent Variable: S

Coefficients (a)

              Unstandardized Coefficients    Standardized Coefficients
Model 1       B          Std. Error          Beta                         t         Sig.
(Constant)    10.322     1.396                                            7.396     .000
T              -.963      .548               -.169                       -1.758     .088
E              8.061     3.638                .215                        2.216     .034
H              -.018      .002               -.772                       -8.179     .000

a. Dependent Variable: S

A. What is the null hypothesis of the F test? (1)

That all of the slope coefficients are jointly zero. Put somewhat differently, the null hypothesis is that the model is worthless or, to use a technical term, crap.

B. Briefly discuss the meaning/interpretation of the Sig. value for the F test. (1)

The very small (<0.001) value of the Sig. value or p-value for the F-test suggests that the null hypothesis should be rejected, meaning that at least one slope coefficient is not zero or, alternatively, that the model is of some value.

C. What would you tell someone who asked whether, according to this model, the type of transmission (T) a car has affects its acceleration time? Please be careful and complete in your answer. (3)

While the estimated coefficient on T is not significant at the 5% level, it is significant at the 10% level and suggests that the type of transmission that a car has does impact its acceleration time even if, by some standards, this result is not statistically significant.
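The F statistic and the t statistics in the SPSS output for question 5 can be recomputed from the other columns, which is a useful check when reading such tables. The sketch below uses only figures reported in the output; the helper `significant` is a name introduced here for illustration. Small differences from the printed values come from rounding in the reported coefficients and standard errors.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 178.917, 3
ss_residual, df_residual = 75.411, 34

ms_regression = ss_regression / df_regression  # mean square, regression
ms_residual = ss_residual / df_residual        # mean square, residual
f_stat = ms_regression / ms_residual           # reported as 26.889

# (B, Std. Error, Sig.) for two of the slope coefficients.
coefficients = {
    "T": (-0.963, 0.548, 0.088),
    "E": (8.061, 3.638, 0.034),
}
t_stats = {name: b / se for name, (b, se, _) in coefficients.items()}

def significant(p_value, alpha):
    """True when the p-value falls below the chosen significance level."""
    return p_value < alpha

p_value_t = coefficients["T"][2]
print(f_stat, t_stats)
print(significant(p_value_t, 0.05), significant(p_value_t, 0.10))
```

The last line reflects the answer to part C: the coefficient on T fails the 5% test but passes the 10% test.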

6. Consider the following diagram:

A. Clearly indicate in the diagram the linear regression residuals e_1, e_2 and e_3. (1)

B. Fill in the blank: e_1 + e_2 + e_3 = 0. (1)

7. Explain briefly why the R^2 value from a regression based on two data points will be 1.000. (2)

Because a line drawn using two data points will pass exactly through each of those points, leaving no residual, meaning that RSS = 0 so that ESS = TSS and R^2 = 1.

8. Write out the relationship between total sum of squares (TSS), explained sum of squares (ESS) and residual sum of squares (RSS). (1)

TSS = ESS + RSS
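Questions 6 through 8 can be verified numerically with a hand-rolled least-squares fit. The data points below are made up for illustration (the exam's diagram is not reproduced here), and the function names are labels chosen for this sketch.

```python
def fit_line(x, y):
    """Ordinary least squares intercept and slope for one regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def decompose(x, y):
    """Return residuals, TSS, ESS and RSS for the fitted line."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    tss = sum((yi - mean_y) ** 2 for yi in y)
    ess = sum((fi - mean_y) ** 2 for fi in fitted)
    rss = sum(e ** 2 for e in residuals)
    return residuals, tss, ess, rss

# Three hypothetical points: residuals sum to zero and TSS = ESS + RSS.
residuals3, tss3, ess3, rss3 = decompose([1, 2, 3], [2, 5, 4])

# Two hypothetical points: the line passes through both, so R^2 = 1.
residuals2, tss2, ess2, rss2 = decompose([1, 4], [3, 9])
r_squared_two_points = ess2 / tss2

print(sum(residuals3), tss3, ess3 + rss3, r_squared_two_points)
```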

9. A regression can suffer from several different violations of the classical assumptions. Among these are:

Heteroskedasticity
Omitted Variables
Serial Correlation
Multicollinearity
Endogeneity

For each of the items presented below, tell me which of these violations it addresses and, based on what you see, is this likely a problem or not. Please explain briefly. Two points each.

A.

This is probably heteroskedasticity because the variation of the error term seems to depend on the value of X. It might also suggest that X^2 is an omitted variable. It could also be serial correlation if X is time. This does seem to be a problem.

B. VIF = 3.836

The VIF is a test for multicollinearity, but its small value (<5) suggests that it is not a problem here.

C. A negative and significant coefficient from a Park test.

The Park test is used to detect heteroskedasticity and the significant, albeit negative, result here suggests that it is a problem. The variance of the error term is greater when the value of the explanatory factor is smaller.

D. An unbelievably large estimated coefficient on an explanatory variable.

This suggests some omitted variable for which the included variable's estimated coefficient is trying to compensate.

E. A high R^2 from your regression, but no estimated coefficients that are significantly different from zero.

This is one of the classic signs of multicollinearity, and to the extent that collinearity is ever a problem, it seems to be a problem here.
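For item B of question 9, it can help to see what a VIF of 3.836 implies. The variance inflation factor for a regressor is 1 / (1 - R_j^2), where R_j^2 comes from regressing that variable on the other explanatory variables. The sketch below inverts that formula; the function names are chosen here for illustration.

```python
def vif_from_r_squared(r_squared):
    """Variance inflation factor implied by an auxiliary-regression R^2."""
    return 1.0 / (1.0 - r_squared)

def r_squared_from_vif(vif):
    """Auxiliary-regression R^2 implied by a given VIF."""
    return 1.0 - 1.0 / vif

implied_r_squared = r_squared_from_vif(3.836)

# The common rule-of-thumb cutoffs of 5 and 10 correspond to
# auxiliary R^2 values of 0.8 and 0.9.
cutoff_r_squared = [r_squared_from_vif(v) for v in (5, 10)]

print(implied_r_squared, cutoff_r_squared)
```

A VIF of 3.836 corresponds to an auxiliary R^2 of roughly 0.74, below the 0.8 implied by the cutoff of 5.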

10. At the end of this exam, you will find Excel regression output from a regression of house price on various explanatory variables. Use these regression results to answer the following questions.

A. According to Studenmund's four criteria, should the variable Age be included in the model? Explain. (2)

In theory, the age of a house should matter for its price. The estimated coefficient on age is significant. It is also positive, which may be the expected result or not, depending on how you view older houses. The adjusted R^2 doesn't really change when AGE is added, so this is a bit of a toss-up. Excluding AGE seems to greatly bias BATHROOMS, suggesting that AGE should be included. Overall, it should be included. This is largely because of the theoretical reasons, but also because of the bias on BATHROOMS.

B. With which other explanatory variable is Age most highly correlated? Explain how you know this. (2)

It is most highly correlated with BATHROOMS and you can tell this because of the huge bias in the estimated coefficient on BATH when AGE is omitted.

C. Is the correlation between Age and this other variable positive or negative? Explain how you know. (2)

The correlation is negative. The estimated coefficient on AGE is positive, yet the estimated coefficient on BATH is much smaller when AGE is excluded, so the omitted-variable bias is negative and the correlation must be negative.

11. Even after all these years of teaching this subject, I still get some sick pleasure out of watching people worry about multicollinearity.

A. What three options are available for detection of multicollinearity? (2)

Scatterplots showing relationships between explanatory variables.
Correlation coefficients between explanatory variables.
High R^2 and few or no significant estimated coefficients.
High VIF numbers, generally greater than 5 or 10.

B. In one word, what should you do to address this problem in your regression? (1)

Nothing.

12. Consider the simplest possible regression model:

Y_i = β_0 + β_1 X_i + ε_i

From which of the following violations of the underlying assumptions of OLS could this regression not possibly suffer? Explain why not. (2)

Endogeneity
Serial Correlation: Might not be a problem as this doesn't seem to be a time series regression.
Heteroskedasticity
Multicollinearity: This can't be a problem because there is only one explanatory variable.
Omitted variable bias

13. Here is some totally fake regression output. Calculate the correct values for the blanks. If you can't calculate a value, make your best guess and justify it.

ANOVA

Model         Sum of Squares    df     Mean Square    F         Sig.
Regression    12200               2    579            97.685    BLANK B
Residual        800              98    928
Total         BLANK A           100

Coefficients

                                      Standardized Coefficient
Model         B          Std. Error   Beta                        t          Sig.
(Constant)    10.50      3.50                                     BLANK C    0.002
X1             3.00      0.02         0.385                       150.00     BLANK D
X2            BLANK E    2.00         0.477                         6.00     0.000

A. (1) 12200 + 800 = 13000

B. (1) Given the high F stat this is probably 0.000.

C. (1) 10.50/3.50 = 3.000.

D. (1) Given the very large t stat this is probably 0.000.

E. (1) E/2.00 = 6.00 -> E = 12.00.

F. Calculate the R^2 for this regression. (1) 12200/13000 =
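The blanks in question 13 that can be calculated follow from three identities: TSS = ESS + RSS, t = B / Std. Error, and R^2 = ESS / TSS. A minimal sketch using only the figures given in the fake output (variable names are chosen here for illustration):

```python
ss_regression = 12200  # explained sum of squares
ss_residual = 800      # residual sum of squares

# Blank A: the total sum of squares.
blank_a = ss_regression + ss_residual

# Blank C: t statistic for the constant, t = B / Std. Error.
blank_c = 10.50 / 3.50

# Blank E: coefficient on X2, recovered from B = t * Std. Error.
blank_e = 6.00 * 2.00

# Part F: R^2 is the explained share of the total sum of squares.
r_squared = ss_regression / blank_a

# Consistency check on the X1 row: B / Std. Error should match the reported t.
t_x1 = 3.00 / 0.02

print(blank_a, blank_c, blank_e, r_squared, t_x1)
```

Blanks B and D are p-values, which cannot be computed this way; the answers above rest on the size of the F and t statistics.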

14. A colleague estimates the following regression model:

Y_i = β_0 + β_1 X_1i + β_2 X_2i + β_3 X_3i + ε_i

She gets a low R^2 and isn't too happy. She then estimates the following, slightly modified version of the model:

ln Y_i = β_0 + β_1 X_1i + β_2 X_2i + β_3 X_3i + ε_i

She gets a much higher R^2 and is very excited. She claims that the higher R^2 for the second model strongly supports the idea that this is the correct model to use for the data. What should you tell her? (2)

You can't compare the two R-squared figures because the dependent variable has been transformed in a non-linear way.

15. What is the most important problem with the following regression of house prices on various house characteristics, as seen on the second homework assignment? (2)

Model Summary

Model    R           R Square    Adjusted R Square    Std. Error of the Estimate
1        .506 (a)    .257        .190                 3.785884968781858E4

a. Predictors: (Constant), AGE, SQFT, NEIGH, BATH

Coefficients (a)

              Unstandardized Coefficients    Standardized Coefficients
Model 1       B            Std. Error        Beta                         t         Sig.
(Constant)    35330.915    23840.136                                      1.482     .145
SQFT              5.172       10.662          .086                         .485     .630
BATH          27204.392    11654.488          .418                        2.334     .024
NEIGH          1992.057     6808.152          .038                         .293     .771
AGE            -187.893      208.897         -.117                        -.899     .373

a. Dependent Variable: PRICE

NEIGH is a qualitative variable and should be recoded as a series of dummy variables. Look for this question again on the final exam.
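The recoding recommended in the answer to question 15 can be sketched in a few lines. The neighborhood codes below are hypothetical, since the homework data are not reproduced here, and `make_dummies` is a helper written for this illustration. One category is left out as the baseline so that the dummies are not perfectly collinear with the constant.

```python
def make_dummies(values, baseline):
    """Build one 0/1 column per category, omitting the baseline category."""
    categories = sorted(set(values) - {baseline})
    return {
        f"NEIGH_{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }

# Hypothetical neighborhood codes for five houses.
neigh = [1, 3, 2, 1, 3]
dummies = make_dummies(neigh, baseline=1)

for name, column in dummies.items():
    print(name, column)
```

Each dummy coefficient is then read as the price difference relative to the baseline neighborhood, rather than treating the neighborhood code as if it were a quantity.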

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.46
R Square             0.21
Adjusted R Square    0.20
Standard Error       285627.86
Observations         653.00

             Coefficients    Standard Error    t Stat    P-value
Intercept    -102126.32      67119.17          -1.52     0.13
SQFTTOTL         169.96         22.35           7.60     0.00
SQFTLOT            0.16          0.17           0.92     0.36
STORIES        36901.19      27087.47           1.36     0.17
BATHS          46263.97      30138.08           1.54     0.13
BEDS          -34681.51      16811.06          -2.06     0.04
AGE             1573.48        591.21           2.66     0.01

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.45
R Square             0.20
Adjusted R Square    0.20
Standard Error       286967.50
Observations         653.00

             Coefficients    Standard Error    t Stat    P-value
Intercept      17440.21      50100.59           0.35     0.73
SQFTTOTL         177.13         22.30           7.94     0.00
SQFTLOT            0.16          0.17           0.91     0.36
STORIES        27994.00      27006.00           1.04     0.30
BATHS            473.62      24860.54           0.02     0.98
BEDS          -28696.38      16738.11          -1.71     0.09