

Question 1 [17 points]: (ch 11)

A study analyzed the probability that Major League Baseball (MLB) players "survive" for another season, that is, play one more season. The researchers studied a model of the following form:

$\Pr(Y_i = 1 \mid Seasons_i, Perf_i, Avgperf_i) = \Phi(\beta_0 + \beta_1 Seasons_i + \beta_2 Perf_i + \beta_3 Avgperf_i)$

The dependent variable is a binary variable that takes on a value of one if the player played one more season (a minimum of 50 at bats or 25 innings pitched), and zero otherwise. Seasons is the total number of seasons played, measured in years, Perf is the performance of the player this year, and Avgperf is the average performance of the player over his career. The researchers had a sample of 4,728 hitters and 3,803 pitchers for the years 1901-1999. All explanatory variables are standardized (sample mean of 0, variance of 1). Probit estimation yielded the results shown in the table:

Regression                    (1) Hitters        (2) Pitchers
Regression model              probit             probit
constant                      2.010 (0.030)      1.625 (0.031)
number of seasons played      -0.058 (0.004)     -0.031 (0.005)
performance                   0.794 (0.025)      0.677 (0.026)
average performance           0.022 (0.033)      0.100 (0.036)

(a) (6p) Interpret the two probit equations and calculate survival probabilities for hitters and pitchers at the sample mean. Provide an explanation for why these are so high.

(b) (6p) Calculate the change in the survival probability for a player who has a very bad year by performing two standard deviations below the average (assume also that this player has been in the majors for many years so that his average performance is negligibly affected). How does this change the survival probability when compared to the answer in (a)?

(c) (5p) Since the results for hitters and pitchers seem similar, the researcher could consider combining the two samples. With a combined sample, how could you test the hypothesis that the coefficients for the explanatory variables are the same for hitters and pitchers? Explain in some detail.

Answer:

(a) Note that all variables are standardized, so that the mean is zero. At the sample mean the probit index therefore equals the constant, which results in a survival probability of $\Phi(2.010) \approx 0.978$ for hitters and $\Phi(1.625) \approx 0.948$ for pitchers. These results are so high because there is a high probability, in general, for a player to return the following season.

(b) Since the variables are standardized, this implies a change of -2 in the performance variable. The result for hitters is a lowering of the survival probability to $\Phi(2.010 - 2 \times 0.794) = \Phi(0.422) \approx 0.663$, and for pitchers to $\Phi(1.625 - 2 \times 0.677) = \Phi(0.271) \approx 0.607$.

(c) After combining the samples for hitters and pitchers, you would allow for a different intercept and different slopes by introducing a binary variable for pitchers if hitters are the default. This binary variable would be introduced by itself and interacted with each of the above variables, thereby allowing all coefficients to differ. You could then conduct an F-test of the joint hypothesis that all coefficients involving the binary variable are zero. If the hypothesis cannot be rejected, then there is no evidence of a difference between the coefficients for hitters and pitchers.
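The probit calculations in parts (a) and (b) can be checked numerically. The sketch below is only an illustration: it evaluates the standard normal CDF at the probit index built from the coefficients in the table, and the function and dictionary names (`norm_cdf`, `survival_prob`, `COEFS`) are hypothetical helpers introduced here, not part of the exam.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Probit coefficients copied from the table in Question 1.
COEFS = {
    "hitters": {"const": 2.010, "seasons": -0.058, "perf": 0.794, "avgperf": 0.022},
    "pitchers": {"const": 1.625, "seasons": -0.031, "perf": 0.677, "avgperf": 0.100},
}


def survival_prob(group: str, seasons: float = 0.0, perf: float = 0.0,
                  avgperf: float = 0.0) -> float:
    """Predicted survival probability; regressors are in standardized units."""
    c = COEFS[group]
    index = (c["const"] + c["seasons"] * seasons
             + c["perf"] * perf + c["avgperf"] * avgperf)
    return norm_cdf(index)


for group in COEFS:
    at_mean = survival_prob(group)              # all standardized regressors at 0
    bad_year = survival_prob(group, perf=-2.0)  # two standard deviations below average
    print(f"{group}: at mean {at_mean:.3f}, after a bad year {bad_year:.3f}")
```

Because the regressors are standardized, setting them to zero evaluates the model at the sample mean, so only the constant enters the index in part (a).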

Question 2 [21 points]: (ch 10)

Consider the following panel data regression with a single explanatory variable: $Y_{it} = \beta_0 + \beta_1 X_{it} + u_{it}$. In each of the examples below, you will be including entity and time fixed effects.

(a) (3 p) Consider the effect of beer taxes on the fatality rate using annual data from 1982-1988 and nine U.S. regions (New England, Pacific, Mid-Atlantic, South, etc.). How many total coefficients do you need to estimate?

(b) (4 p) Certain regions (e.g. New England) that tend to have higher beer taxes also tend to have consistently higher quality hospitals. Does this pose a threat to your analysis?

(c) (3 p) Consider the effect of the minimum wage on teenage employment using annual data from 1963-2000 for five Canadian regions (Atlantic Provinces, Quebec, Ontario, Prairies, British Columbia). How many total coefficients do you need to estimate?

(d) (4 p) Nationwide recessions impact both teenage employment and the minimum wage across the country. Does this pose a threat to your analysis?

(e) (3 p) Consider the effect of savings rates on per capita income using data for three decades (1960-1969, 1970-1979, 1980-1989; one observation per decade) and 104 countries. How many total coefficients do you need to estimate?

(f) (4 p) A number of countries industrialized at different times between 1960 and 1989, a process which can impact both the savings rate and per capita income. Does this pose a threat to your analysis?

Answer:

(a) 16 coefficients (6 time fixed effects, 8 entity fixed effects, intercept, slope).

(b) No, entity fixed effects will account for omitted variables that are constant over time within an entity.

(c) 43 coefficients (37 time fixed effects, 4 entity fixed effects, intercept, slope).

(d) No, time fixed effects will account for this, because a nationwide recession affects all regions in the same year.

(e) 107 coefficients (2 time fixed effects, 103 entity fixed effects, intercept, slope).

(f) Yes, industrialization is an omitted variable that varies across both time and entities, so neither set of fixed effects absorbs it.
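The coefficient counts in parts (a), (c) and (e) follow one rule: with an intercept, one entity dummy and one time dummy are dropped to avoid perfect multicollinearity. A minimal sketch of that count, using a hypothetical helper `n_coefficients` that is not part of the exam:

```python
def n_coefficients(n_entities: int, n_periods: int, n_regressors: int = 1) -> int:
    """Coefficients in a regression with an intercept, entity and time fixed effects.

    One entity dummy and one time dummy are omitted as the base categories.
    """
    entity_effects = n_entities - 1
    time_effects = n_periods - 1
    intercept = 1
    return entity_effects + time_effects + intercept + n_regressors


# (a) nine regions, 1982-1988 inclusive is 7 years
print(n_coefficients(9, 1988 - 1982 + 1))    # 16
# (c) five regions, 1963-2000 inclusive is 38 years
print(n_coefficients(5, 2000 - 1963 + 1))    # 43
# (e) 104 countries, three decades
print(n_coefficients(104, 3))                # 107
```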

Question 3 [15 points]: (IV regression) (Ch 12)

Consider a supply model for edible chicken, which the U.S. Department of Agriculture calls broilers. Data for this question are adapted from the data provided by Epple and McCallum (2006) [1]. The data are annual, 1950-2001. The supply equation is:

$\ln(QPROD_t) = \beta_0 + \beta_1 \ln(P_t) + \beta_2 \ln(PF_t) + \beta_3 TIME_t + \beta_4 \ln(QPROD_{t-1}) + u_t$

where QPROD is aggregate production of young chickens, P is the real price index of fresh chicken, PF is the real price index of broiler feed, and TIME is a time trend, which is included to capture any technical progress in production. Some potential external instrumental variables are $\ln(Y_t)$, where Y is real per capita income; $\ln(PB_t)$, where PB is the real price of beef; POPGRO, the percent population growth from year t-1 to year t; $\ln(P_{t-1})$, the lagged log of the real price of chicken; and $\ln(EXPTS_t)$, the log of exports of chicken.

[1] Epple, D. and McCallum, B. (2006), "Simultaneous Equation Econometrics: The Missing Example", Economic Inquiry, 44(2), 374-384.

The estimated supply equation for chicken can be written from the following output:

Regression 1:

. reg lnqprod lnp lnpf TIME lnqprod_1

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  4,    35) = 3102.49
       Model |  11.9815945     4  2.99539863           Prob > F      =  0.0000
    Residual |   .03379186    35  .000965482           R-squared     =  0.9972
-------------+------------------------------           Adj R-squared =  0.9969
       Total |  12.0153864    39  .308086831           Root MSE      =  .03107

------------------------------------------------------------------------------
     lnqprod |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lnp |   .0091099   .0679409     0.13   0.894    -.1288175    .1470373
        lnpf |  -.0901945   .0426459    -2.11   0.042    -.1767703   -.0036186
        TIME |   .0111706   .0051486     2.17   0.037     .0007183    .0216229
   lnqprod_1 |   .7326902   .1066347     6.87   0.000     .5162103      .94917
       _cons |   2.109681   .7991519     2.64   0.012      .487316    3.732045
------------------------------------------------------------------------------

Regression 2:

. ivreg lnqprod (lnp=lnpb lny POPGRO lnexpts) lnpf TIME lnqprod_1

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  4,    35) = 1619.82
       Model |  11.9506133     4  2.98765333           Prob > F      =  0.0000
    Residual |  .064773079    35  .001850659           R-squared     =  0.9946
-------------+------------------------------           Adj R-squared =  0.9940
       Total |  12.0153864    39  .308086831           Root MSE      =  .04302

------------------------------------------------------------------------------
     lnqprod |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lnp |    .393975   .1749342     2.25   0.031     .0388398    .7491103
        lnpf |  -.1909911   .0705566    -2.71   0.010    -.3342286   -.0477535
        TIME |   .0242389   .0087117     2.78   0.009     .0065532    .0419247
   lnqprod_1 |   .5489031   .1635754     3.36   0.002     .2168274    .8809789
       _cons |   3.298617   1.196567     2.76   0.009     .8694559    5.727778
------------------------------------------------------------------------------
Instrumented:  lnp
Instruments:   lnpf TIME lnqprod_1 lnpb lny POPGRO lnexpts

Regression 3:

. reg lnp lnpb lny POPGRO lnexpts lnpf TIME lnqprod_1

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  7,    32) =   49.65
       Model |  1.61496433     7  .230709191           Prob > F      =  0.0000
    Residual |   .14868612    32  .004646441           R-squared     =  0.9157
-------------+------------------------------           Adj R-squared =  0.8973
       Total |  1.76365045    39  .045221807           Root MSE      =  .06816

------------------------------------------------------------------------------
         lnp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnpb |   .1159974   .2186138     0.53   0.599    -.3293044    .5612991
         lny |   1.471961   .6529929     2.25   0.031     .1418577    2.802064
      POPGRO |   .0697965   .0908676     0.77   0.448    -.1152949    .2548878
     lnexpts |   2.438689   .6971098     3.50   0.001     1.018723    3.858655
        lnpf |    .154805   .1068706     1.45   0.157    -.0628833    .3724932
        TIME |  -.0735312   .0230427    -3.19   0.003    -.1204676   -.0265948
   lnqprod_1 |  -.0086269   .2911554    -0.03   0.977     -.601691    .5844372
       _cons |  -11.95739   6.311461    -1.89   0.067    -24.81341    .8986362
------------------------------------------------------------------------------

Regression 4:

. reg lnqprod lnp lnpf TIME lnqprod_1

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  4,    35) = 3102.49
       Model |  11.9815945     4  2.99539863           Prob > F      =  0.0000
    Residual |   .03379186    35  .000965482           R-squared     =  0.9972
-------------+------------------------------           Adj R-squared =  0.9969
       Total |  12.0153864    39  .308086831           Root MSE      =  .03107

------------------------------------------------------------------------------
     lnqprod |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lnp |   .0091099   .0679409     0.13   0.894    -.1288175    .1470373
        lnpf |  -.0901945   .0426459    -2.11   0.042    -.1767703   -.0036186
        TIME |   .0111706   .0051486     2.17   0.037     .0007183    .0216229
   lnqprod_1 |   .7326902   .1066347     6.87   0.000     .5162103      .94917
       _cons |   2.109681   .7991519     2.64   0.012      .487316    3.732045
------------------------------------------------------------------------------

. predict e, residuals
(1 missing values generated)

Regression 5:

. reg e lnpb lny POPGRO lnexpts lnpf TIME lnqprod_1

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  7,    32) =    2.19
       Model |  .010946966     7  .001563852           Prob > F      =  0.0618
    Residual |  .022844894    32  .000713903           R-squared     =  0.3240
-------------+------------------------------           Adj R-squared =  0.1761
       Total |   .03379186    39  .000866458           Root MSE      =  .02672

------------------------------------------------------------------------------
           e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnpb |   .1180813   .0856913     1.38   0.178    -.0564662    .2926289
         lny |   .2378684   .2559575     0.93   0.360       -.2835    .7592367
      POPGRO |  -.0123288   .0356179    -0.35   0.732    -.0848802    .0602225
     lnexpts |   .9702997   .2732502     3.55   0.001     .4137072    1.526892
        lnpf |  -.0522353   .0418907    -1.25   0.221    -.1375639    .0330932
        TIME |  -.0045154   .0090322    -0.50   0.621    -.0229133    .0138826
   lnqprod_1 |  -.2648651   .1141259    -2.32   0.027     -.497332   -.0323983
       _cons |   .1666471   2.473941     0.07   0.947    -4.872605    5.205899
------------------------------------------------------------------------------

. test lnpb lny POPGRO lnexpts

 ( 1)  lnpb = 0
 ( 2)  lny = 0
 ( 3)  POPGRO = 0
 ( 4)  lnexpts = 0

       F(  4,    32) =    3.83
            Prob > F =    0.0118

(a) (4p) Compare the results in regressions 1 and 2. Explain the reason for using instrumental variables in regression 2.

Answer:

(b) (5p) What are the requirements for valid instruments? Explain with mathematical conditions.

Answer:

Relevance: $corr(Z_i, X_i) \neq 0$

Exogeneity: $corr(Z_i, u_i) = 0$

(c) (6p) Do these instruments satisfy the requirements? You must use the necessary regression results for your answer. Please specify the regression number you use while answering each part of this question.

(1) Relevance: Using regression 3, the squared t-statistic is greater than 10 only for lnexpts ($3.50^2 = 12.25$), so that is the only relevant IV.

(2) Exogeneity: Using regression 5, the joint test of the four instruments gives F(4, 32) = 3.83 with Prob > F = 0.0118. Hence, reject the null hypothesis that the instruments are exogenous. Therefore the IVs are not exogenous.
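The two checks in part (c) can be reproduced from the reported output. The sketch below is a hedged illustration, not the exam's own working: it assumes the Stock and Watson form of the overidentifying restrictions test, J = mF, compared with a chi-squared distribution with m - k degrees of freedom (m = 4 instruments, k = 1 endogenous regressor), and it applies the "greater than 10" rule of thumb to each squared first-stage t-statistic as the answer above does. The helper `chi2_sf_3df` is a hypothetical name; it uses the closed-form tail probability for 3 degrees of freedom.

```python
import math


def chi2_sf_3df(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# First-stage t-statistics on the external instruments (Regression 3).
first_stage_t = {"lnpb": 0.53, "lny": 2.25, "POPGRO": 0.77, "lnexpts": 3.50}
t_squared = {name: t * t for name, t in first_stage_t.items()}
for name, value in t_squared.items():
    print(f"{name}: t^2 = {value:.2f}, exceeds 10: {value > 10}")

# Overidentification test built from the F-statistic reported after Regression 5.
m, k = 4, 1               # number of instruments, number of endogenous regressors
f_stat = 3.83
J = m * f_stat            # assumed rule: J = m * F, with m - k = 3 degrees of freedom
p_value = chi2_sf_3df(J)
print(f"J = {J:.2f}, p-value = {p_value:.4f}")
```

A p-value below 0.05 leads to rejecting the null hypothesis that all instruments are exogenous, which agrees with the conclusion in (2).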

Question 4 [15 points]: (Ch 15)

There is some economic research that suggests that oil prices play a central role in causing recessions in developed countries. In particular, this research suggests that it is specifically increases in oil prices that matter. As a result, economists often look only at the percentage point difference between oil prices at date t and the maximum value over the previous year. However, you notice that energy prices can fluctuate quite dramatically in both directions and believe that geographic areas also benefit substantially from oil price decreases. As a result, you decide to consider the effect of real oil prices (Poil/CPI) on GDP growth ($Y_t$). You estimate the following distributed lag model using annual data (numbers in parentheses are HAC standard errors):

$\hat{Y}_t = 3.39 - 0.009\,(Poil/CPI)_t - 0.028\,(Poil/CPI)_{t-1}$
      (0.27)  (0.010)                (0.011)

t = 1960-2008, $R^2$ = 0.15, SER = 1.88

(a) (5p) What is the impact effect of a 25 percentage point increase in real oil prices?

(b) (5p) What is the predicted cumulative change in GDP growth over two years of this effect?

(c) (5p) The HAC F-statistic is 4.07. Can you reject the null hypothesis that oil price changes have no effect on real GDP growth? What is the critical value you considered? Is there any reason why you should be cautious using an F-test in this case, given the sample period?

Answer:

a. GDP growth would decrease by almost a quarter of a percentage point (-0.225).

b. The predicted decline in growth would be almost one percentage point (-0.925).

c. The critical value of $F_{2,\infty}$ is 3.00 at the 5% significance level. Since 4.07 > 3.00, you can reject the null hypothesis that oil prices have no effect on real GDP growth. However, since the sample period involves only 50 or so observations, it is not clear that the test statistic is actually F-distributed (small sample).
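The arithmetic behind answers a and b can be written out as a short sketch. It simply multiplies the estimated distributed lag coefficients by the 25-point change; the variable names are chosen here for illustration only.

```python
# Estimated distributed lag coefficients from the fitted equation.
beta_current = -0.009   # coefficient on (Poil/CPI)_t
beta_lag1 = -0.028      # coefficient on (Poil/CPI)_{t-1}
shock = 25.0            # 25 percentage point increase in real oil prices

# Impact effect: the contemporaneous coefficient times the shock.
impact_effect = beta_current * shock

# Cumulative effect over two years: sum of both coefficients times the shock.
cumulative_effect = (beta_current + beta_lag1) * shock

print(f"impact effect:     {impact_effect:.3f}")      # -0.225
print(f"cumulative effect: {cumulative_effect:.3f}")  # -0.925
```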

Question 5 [20 points]: (Ch 14)

The following STATA output reports a VAR(2) (vector autoregression) model of the change in inflation (cinf) and the unemployment rate (unem).

. var unem cinf

Vector autoregression

Sample:  1951-2012                              No. of obs      =         62
Log likelihood =  -201.564                      AIC             =   6.824644
FPE            =  3.156906                      HQIC            =   6.959349
Det(Sigma_ml)  =  2.284871                      SBIC            =   7.167731

Equation           Parms      RMSE     R-sq      chi2     P>chi2
----------------------------------------------------------------
unem                  5     1.00228   0.6589   119.7914   0.0000
cinf                  5     1.72495   0.3091   27.73971   0.0000
----------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
unem         |
        unem |
         L1. |   1.061241   .1303681     8.14   0.000     .8057245    1.316758
         L2. |  -.2874012    .133048    -2.16   0.031    -.5481705    -.026632
             |
        cinf |
         L1. |   .0976014   .0668152     1.46   0.144    -.0333539    .2285567
         L2. |   .0623594   .0572543     1.09   0.276     -.049857    .1745758
             |
       _cons |   1.345204    .513183     2.62   0.009     .3393835    2.351024
-------------+----------------------------------------------------------------
cinf         |
        unem |
         L1. |  -.4678597   .2243671    -2.09   0.037     -.907611   -.0281084
         L2. |   .2932862   .2289793     1.28   0.200     -.155505    .7420773
             |
        cinf |
         L1. |  -.0527481   .1149907    -0.46   0.646    -.2781258    .1726296
         L2. |   -.430232   .0985363    -4.37   0.000    -.6233595   -.2371044
             |
       _cons |    1.00306    .883202     1.14   0.256    -.7279845    2.734104
------------------------------------------------------------------------------

Table 1

Year    Unem    Inflation
2008    5.8      3.8
2009    9.3     -0.3
2010    9.6      1.6
2011    8.9      3.1
2012    8.1      2.1

(a) (4p) Given the actual realizations of unemployment and inflation in Table 1, forecast unemployment for 2013. Show your work.

(b) (4p) Given the actual realizations of unemployment and inflation in Table 1, forecast inflation for 2013. Show your work.

(c) (4p) The following is the joint test result for the second lags of the unemployment rate and the change in inflation. According to this test, would a VAR(1) model be a better forecasting model than a VAR(2) model? Explain why.

. test L2.cinf L2.unem

 ( 1)  [unem]l2.cinf = 0
 ( 2)  [cinf]l2.cinf = 0
 ( 3)  [unem]l2.unem = 0
 ( 4)  [cinf]l2.unem = 0

           chi2(  4) =   30.26
         Prob > chi2 =    0.0000

(d) (4p) Why might a researcher use the change in inflation as opposed to inflation in this model? Explain.

(e) (4p) Should one use the change in unemployment instead of unemployment? Explain.
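For parts (a) and (b), a one-step-ahead VAR(2) forecast plugs the 2012 and 2011 values into each estimated equation. The sketch below shows one way to organize that calculation; it assumes that cinf in year t is inflation in year t minus inflation in year t-1, and that the inflation forecast is the 2012 level plus the forecast change. The dictionary layout and the names `forecast`, `unem_eq` and `cinf_eq` are illustrative assumptions, not part of the exam.

```python
# Data from Table 1.
unem = {2008: 5.8, 2009: 9.3, 2010: 9.6, 2011: 8.9, 2012: 8.1}
infl = {2008: 3.8, 2009: -0.3, 2010: 1.6, 2011: 3.1, 2012: 2.1}

# Assumption: cinf_t is the first difference of inflation.
cinf = {year: infl[year] - infl[year - 1] for year in (2011, 2012)}

# Estimated VAR(2) coefficients copied from the Stata output.
unem_eq = {"unem_L1": 1.061241, "unem_L2": -0.2874012,
           "cinf_L1": 0.0976014, "cinf_L2": 0.0623594, "cons": 1.345204}
cinf_eq = {"unem_L1": -0.4678597, "unem_L2": 0.2932862,
           "cinf_L1": -0.0527481, "cinf_L2": -0.430232, "cons": 1.00306}


def forecast(eq: dict) -> float:
    """One-step-ahead forecast for 2013 from the 2012 and 2011 observations."""
    return (eq["cons"]
            + eq["unem_L1"] * unem[2012] + eq["unem_L2"] * unem[2011]
            + eq["cinf_L1"] * cinf[2012] + eq["cinf_L2"] * cinf[2011])


unem_2013 = forecast(unem_eq)
cinf_2013 = forecast(cinf_eq)
infl_2013 = infl[2012] + cinf_2013   # convert the forecast change back to a level

print(f"forecast unemployment 2013: {unem_2013:.2f}")
print(f"forecast change in inflation 2013: {cinf_2013:.2f}")
print(f"forecast inflation 2013: {infl_2013:.2f}")
```

Under these assumptions the unemployment forecast is about 7.38, the forecast change in inflation is about -0.77, and the implied inflation forecast is about 1.33.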

Question 6 [12 points]: (Derivation question)

Consider the panel data model: ( ) where the $u_{it}$ are i.i.d. and independent of the Xs, with mean zero and variance ( ).

(a) (3 p) Define the entity-demeaned values of X and Y.

(b) (3 p) Rewrite the model in terms of these demeaned variables.

(c) (3 p) Derive algebraically the fixed-effects estimator of ( ). The fixed-effects estimator minimizes the sum of squared residuals of the model you wrote in part (b).

(d) (3 p) Show that, if ( ) is a random variable that is independent of X and u, the estimator is unbiased for ( ). Explain your answer.

Answer:

(a)

(b)

(c) Subtracting the last equation from the first we would get ( ), or we can also write it as ( ).

(d) The fixed-effects estimator of ( ) is the OLS estimator of the above regression. ( ) ( ) Hence, ( ). Using ( ), we can write ( ). Since ( ) is independent of the Xs and the us, using the Law of Iterated Expectations we can show that [ ].
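The entity-demeaning steps described in parts (a) to (c) can be illustrated numerically. The sketch below is hedged: it assumes the textbook single-regressor fixed-effects model $Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$ (an assumption made here, since the model equation is not shown above), simulates a panel in which X is correlated with $\alpha_i$, and computes the slope from the demeaned data. All function names and parameter values are hypothetical.

```python
import random


def fixed_effects_slope(panel):
    """Slope from OLS on entity-demeaned data.

    panel: list of entities, each a list of (x, y) pairs observed over time.
    """
    numerator = 0.0
    denominator = 0.0
    for observations in panel:
        x_bar = sum(x for x, _ in observations) / len(observations)
        y_bar = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            numerator += (x - x_bar) * (y - y_bar)   # demeaned X times demeaned Y
            denominator += (x - x_bar) ** 2          # demeaned X squared
    return numerator / denominator


def simulate_panel(n_entities=200, n_periods=5, beta1=2.0, seed=0):
    """Simulate Y_it = beta1 * X_it + alpha_i + u_it with X correlated with alpha_i."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_entities):
        alpha = rng.gauss(0.0, 2.0)                  # entity effect
        observations = []
        for _ in range(n_periods):
            x = alpha + rng.gauss(0.0, 1.0)          # regressor depends on alpha
            y = beta1 * x + alpha + rng.gauss(0.0, 1.0)
            observations.append((x, y))
        panel.append(observations)
    return panel


estimate = fixed_effects_slope(simulate_panel())
print(f"fixed-effects estimate of the slope: {estimate:.3f} (true value 2.0)")
```

Demeaning within each entity removes $\alpha_i$ from the regression, which is why the estimate stays close to the true slope even though X and $\alpha_i$ are correlated in the simulated data.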

Bonus Question [2 points]:

The two conditions for instrument validity are $corr(Z_i, X_i) \neq 0$ and $corr(Z_i, u_i) = 0$. The reason for the inconsistency of OLS is that $corr(X_i, u_i) \neq 0$. If X and Z are correlated, and X and u are also correlated, how is it possible that Z and u are not correlated? Explain.

Answer: The major idea is that the variation in X has two parts: one that is uncorrelated with u and a second that is correlated with u. The trick is to isolate the uncorrelated part of X. For the instrument to be valid, $corr(Z_i, u_i) = 0$ and $corr(Z_i, X_i) \neq 0$ must hold. TSLS then generates predicted values of X in the first stage by using a linear combination of the instruments. As long as $corr(Z_i, X_i) \neq 0$ and $corr(Z_i, u_i) = 0$, the part of X that is uncorrelated with the error term is extracted through the prediction. In the second stage, this exogenous variation in X is then used to estimate the effect of X on Y.
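The two-stage logic in the answer can be seen in a small simulation. The sketch below is an illustration under made-up parameter values: X is built from an instrument Z (independent of u) plus the error u itself, so OLS is biased upward while the two-stage estimate recovers the true slope. The function names and the data-generating process are assumptions introduced here.

```python
import random


def cov(a, b):
    """Sample covariance (dividing by n) of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


def simulate(n=5000, beta=1.0, seed=1):
    """Y = beta * X + u, where X depends on both the instrument Z and the error u."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)                 # instrument, independent of u
        ui = rng.gauss(0.0, 1.0)                 # structural error
        xi = zi + ui + rng.gauss(0.0, 1.0)       # endogenous: X contains u
        z.append(zi)
        x.append(xi)
        y.append(beta * xi + ui)
    return z, x, y


def ols_slope(x, y):
    return cov(x, y) / cov(x, x)


def tsls_slope(z, x, y):
    # First stage: regress X on Z and keep the fitted values (the part of X driven by Z).
    pi1 = cov(z, x) / cov(z, z)
    x_hat = [pi1 * zi for zi in z]               # the intercept does not affect the slope
    # Second stage: regress Y on the fitted values.
    return cov(x_hat, y) / cov(x_hat, x_hat)


z, x, y = simulate()
ols = ols_slope(x, y)
tsls = tsls_slope(z, x, y)
print(f"OLS slope:  {ols:.3f}  (biased upward, since X is correlated with u)")
print(f"TSLS slope: {tsls:.3f}  (true value 1.0)")
```

Z is correlated with X only through the part of X that does not involve u, which is how Z can be correlated with X while remaining uncorrelated with u.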

Selected Tables from Stock and Watson, Introduction to Econometrics