B203: Quantitative Methods

Answer all questions from Part I. Answer two questions from Part II.a, and one question from Part II.b.

Part I: Compulsory Questions. Answer all questions. Each question carries 4 marks.

[1] If $H$ denotes a hypothesis and $A$ denotes an event or data, explain the difference and the connection between the probabilities $P(A \mid H)$ and $P(H \mid A)$. Would the finding that $P(A \mid H) < 0.05$ be good reason to reject $H$? Illustrate your answer with an example.

[2] Write down the mathematical form of a probability density function which could be used to model the distribution of wealth in a country. Discuss its features in terms of mode, mean and skewness. How would the shape of its graph differ between developed and developing countries?

[3] A friend argues that in the presence of heteroscedasticity, the only way to obtain unbiased parameter estimates is to use Weighted Least Squares. Do you agree? Give details.

[4] In the model $y_i = \alpha_0 + \alpha_1 x_i + \alpha_2 z_i + u_i$, where $E(u_i \mid z_i, x_i) = 0$, you are interested in estimating the parameter $\alpha_1$. Unfortunately, you do not observe $z_i$. Explain under what assumptions a regression of $y_i$ on $x_i$ only gives you an unbiased estimate of $\alpha_1$.

[5] In the regression $y_i = \beta_0 + \beta_1 x_i + e_i$ you have reason to believe that $x_i$ is correlated with the error term $e_i$. You observe a variable $z_i$ which you would like to use as an instrument for $x_i$. What are the two conditions under which Instrumental Variable estimation, using $z_i$ as an instrument, gives you a consistent estimate of $\beta_1$? Can you test these assumptions (and how)?

[6] You regress log wages on education (ed), work experience in full-time jobs (full), and work experience in part-time jobs (part). The following are your results:

      Source |          SS    df          MS   Number of obs =    5865
    ---------+------------------------------   F(  3,  5861) = 1082.07
       Model |  371.074906     3  123.691635   Prob > F      =  0.0000
    Residual |  669.973124  5861  .114310378   R-squared     =  0.3564
    ---------+------------------------------   Adj R-squared =  0.3561
       Total |  1041.04803  5864  .177532065   Root MSE      =   .3381

    ---------------------------------------------------------------------------
      lnwage |      Coef.  Std. Err.         t   P>|t|   [95% Conf.   Interval]
    ---------+-----------------------------------------------------------------
          ed |   .1053107   .0019611    53.700   0.000     .1014662    .1091552
        full |   .0132439   .0005003    26.471   0.000      .012263    .0142247
        part |   .0068109   .0007076     9.625   0.000     .0054237     .008198
       _cons |   1.311452   .0259925    50.455   0.000     1.260498    1.362407
    ---------------------------------------------------------------------------

You would like to test the hypothesis that part-time experience and full-time experience have the same effect on log wages. Test this hypothesis using a t-test (the covariance between the parameter estimates on the variables full and part is 0.00000013).

[7] Briefly discuss three problems associated with estimating a model with a discrete 0-1 dependent variable by ordinary least squares.

[8] Give two advantages of using panel data as compared to cross-sectional data.
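The t-test in [6] needs the standard error of a difference of two estimates, $\mathrm{se}(\hat\beta_{full} - \hat\beta_{part}) = \sqrt{\mathrm{se}_{full}^2 + \mathrm{se}_{part}^2 - 2\,\mathrm{Cov}(\hat\beta_{full}, \hat\beta_{part})}$. A minimal sketch of that arithmetic, using only the numbers printed above; the helper name `t_equal_coefs` is hypothetical, and comparing with 1.96 assumes the large-sample normal approximation (reasonable with 5861 residual degrees of freedom).

```python
import math

def t_equal_coefs(b1, se1, b2, se2, cov12):
    """t statistic for H0: beta1 == beta2, given two estimates and their covariance."""
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * cov12)
    return (b1 - b2) / se_diff, se_diff

# Estimates from the regression of lnwage on ed, full and part.
b_full, se_full = 0.0132439, 0.0005003
b_part, se_part = 0.0068109, 0.0007076
cov_full_part = 0.00000013

t_stat, se_diff = t_equal_coefs(b_full, se_full, b_part, se_part, cov_full_part)
reject_5pct = abs(t_stat) > 1.96  # two-sided test at the 5% level, normal critical value
print(f"difference = {b_full - b_part:.6f}, se = {se_diff:.6f}, t = {t_stat:.2f}")
```

The statistic comes out at roughly 9.2, so equal effects of full-time and part-time experience would be rejected at any conventional level.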

Part II.a: Answer 2 questions from this section. Each question carries 23 marks.

[1] Consider the following model: $y_i = \alpha_0 + \alpha_1 x_i + u_i$, where $E(u_i \mid x_i) = 0$, $\mathrm{Cov}(u_i, u_j) = 0$ for $i \neq j$, but $\mathrm{Var}(u_i) = \sigma_i^2 \neq \sigma^2$. Suppose you wish to estimate the parameters $\alpha_0$ and $\alpha_1$.

(a) Is the Least Squares estimator for $\alpha_1$ (i) unbiased, (ii) efficient? Can you use the standard errors you obtain from Least Squares for hypothesis testing?

(b) Suppose you expect the variance of the error term $u_i$, $\sigma_i^2$, to be related to the variable $x_i$ as follows: $\exp(\sigma_i^2) = \gamma_0 x_i^{\gamma_1} e^{v_i}$, where $v_i$ is an error term, uncorrelated with $x_i$. Explain in detail how you would estimate your model using Weighted Least Squares.

(c) Suppose now that the way you have specified the relationship between $\sigma_i^2$ and $x_i$ is wrong. Does the Weighted Least Squares estimator still give you unbiased estimates of $\alpha_1$? Give details.

(d) A friend of yours argues that, if you use Weighted Least Squares, the coefficient of determination $R^2$ is still an appropriate measure to assess the fit of the model. Do you agree? Explain. Can you convince your friend of the opposite, using a simple example?
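Whatever model is used for $\sigma_i^2$ in part (b), the final step of WLS is the same: divide every term of the equation, including the constant, by $\hat\sigma_i$ and run OLS on the transformed data. A minimal NumPy sketch on simulated data; the variance pattern $\sigma_i^2 = 0.5\,x_i^2$ and the parameter values are assumptions made only for the simulation, and `wls_via_transform` is a hypothetical helper name. The true variances are plugged in here; in practice they would be replaced by fitted values from the auxiliary regression implied by part (b).

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def wls_via_transform(x, y, sigma2):
    """WLS for y = a0 + a1*x + u with Var(u_i) = sigma2[i]: OLS on data divided by sigma_i."""
    w = 1.0 / np.sqrt(sigma2)
    # The constant column becomes w itself, so the transformed regression has no intercept.
    X_star = np.column_stack([w, w * x])
    return ols(X_star, w * y)

rng = np.random.default_rng(42)
n = 5000
x = rng.uniform(1.0, 10.0, n)
sigma2 = 0.5 * x ** 2                       # assumed heteroscedasticity pattern (simulation only)
y = 1.0 + 2.0 * x + rng.normal(0.0, np.sqrt(sigma2))

b_ols = ols(np.column_stack([np.ones(n), x]), y)   # unbiased, but not efficient
b_wls = wls_via_transform(x, y, sigma2)           # (a0, a1) from the weighted regression
print("OLS:", b_ols, "WLS:", b_wls)
```

Because the transformed regression has no ordinary intercept, the $R^2$ it reports is not comparable with the $R^2$ of the untransformed model, which is the point at issue in part (d).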

[2] Consider the following log wage equation, which you estimate separately for public and private sector workers:

$\ln w_i = \alpha_0^j + \alpha_1^j\, ed_i + \alpha_2^j\, ex_i + u_i,$

where $i$ is an index for individuals and $j$ is an index for the sector ($j$ = Public or $j$ = Private). Furthermore, $\ln w_i$ is the log of hourly wages, $ed_i$ is the number of years of education, and $ex_i$ is labour market experience. Finally, $u_i$ is an error term, assumed to be uncorrelated with the regressors, not autocorrelated, and homoscedastic. Suppose that you obtain the following regression results for all workers, and for public and private sector workers:

ALL WORKERS:

    . regress lnwage ed ex

      Source |          SS    df          MS   Number of obs =    5865
    ---------+------------------------------   F(  2,  5862) = 1558.98
       Model |  361.465205     2  180.732603   Prob > F      =  0.0000
    Residual |  679.582825  5862  .115930199   R-squared     =  0.3472
    ---------+------------------------------   Adj R-squared =  0.3470
       Total |  1041.04803  5864  .177532065   Root MSE      =  .34049

    ---------------------------------------------------------------------------
      lnwage |      Coef.  Std. Err.         t   P>|t|   [95% Conf.   Interval]
    ---------+-----------------------------------------------------------------
          ed |   .1071335   .0019648    54.527   0.000     .1032818    .1109852
          ex |   .0116634    .000473    24.658   0.000     .0107361    .0125907
       _cons |   1.288384   .0260531    49.452   0.000     1.237311    1.339458
    ---------------------------------------------------------------------------

PUBLIC SECTOR WORKERS:

    . regress lnwage ed ex if pu==1

      Source |          SS    df          MS   Number of obs =    2155
    ---------+------------------------------   F(  2,  2152) =  988.62
       Model |  163.335287     2  81.6676433   Prob > F      =  0.0000
    Residual |  177.771871  2152  .082607747   R-squared     =  0.4788
    ---------+------------------------------   Adj R-squared =  0.4784
       Total |  341.107158  2154  .158359869   Root MSE      =  .28742

    ---------------------------------------------------------------------------
      lnwage |      Coef.  Std. Err.         t   P>|t|   [95% Conf.   Interval]
    ---------+-----------------------------------------------------------------
          ed |   .0963147   .0021715    44.354   0.000     .0920562    .1005731
          ex |   .0097724   .0006537    14.949   0.000     .0084905    .0110544
       _cons |   1.497093   .0326531    45.848   0.000     1.433058    1.561128
    ---------------------------------------------------------------------------

PRIVATE SECTOR WORKERS:

    . regress lnwage ed ex if pu==0

      Source |          SS    df          MS   Number of obs =    3710
    ---------+------------------------------   F(  2,  3707) =  509.14
       Model |  135.545467     2  67.7727335   Prob > F      =  0.0000
    Residual |  493.443743  3707  .133111342   R-squared     =  0.2155
    ---------+------------------------------   Adj R-squared =  0.2151
       Total |   628.98921  3709  .169584581   Root MSE      =  .36484

    ---------------------------------------------------------------------------
      lnwage |      Coef.  Std. Err.         t   P>|t|   [95% Conf.   Interval]
    ---------+-----------------------------------------------------------------
          ed |    .112884   .0037483    30.116   0.000
          ex |   .0122555   .0006542    18.735   0.000
       _cons |   1.191816    .045656    26.104   0.000
    ---------------------------------------------------------------------------

(a) Interpret the coefficient estimates. Comment on the $R^2$, and on the difference between the public and private sector regressions.

(b) For private sector workers, your STATA output does not report the 95% confidence intervals. Compute the 95% confidence intervals for the parameter estimates on education and labour market experience. Compare the confidence intervals for education for public and private sector workers. If you assume that the coefficient estimates are independent, would you reject the hypothesis that returns to education are the same for the two groups (at the 5 percent level of significance)?

(c) Test the null hypothesis that the parameters $\alpha_0$, $\alpha_1$ and $\alpha_2$ are the same for public and private sector workers, against the alternative that they differ.

(d) The means of log wages, education, and labour market experience are given by:

    -> pu = 0

    Variable |     Obs        Mean   Std. Dev.        Min        Max
    ---------+------------------------------------------------------
          ed |    3710     10.8527     1.66812          7         18
          ex |    3710    17.36576    9.558307          0         44
      lnwage |    3710    2.629738    .4118065  -.3189805   5.093538

    -> pu = 1

    Variable |     Obs        Mean   Std. Dev.        Min        Max
    ---------+------------------------------------------------------
          ed |    2155    12.24223    2.960554        8.5         18
          ex |    2155    18.59111    9.834444          0         45
      lnwage |    2155    2.857879    .3979446   .1839543   4.597004

Compute the mean log wage differential, and decompose it into a part which is due to differences in parameter estimates, and a part which is due to differences in the means of the explanatory variables.
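The arithmetic behind parts (b) to (d) can be sketched in a few lines of Python using only the figures printed above. Three assumptions are flagged in the comments: the 95% critical value is taken as 1.96 (large-sample normal approximation); the public/private comparison in (b) treats the two estimates as independent, as the question instructs; and the decomposition in (d) values the difference in means at the private-sector coefficients. That is one of several common conventions; valuing it at the public-sector coefficients splits the gap differently.

```python
import math

Z95 = 1.96  # large-sample normal critical value (assumption)

# (b) 95% confidence intervals for the private-sector estimates.
b_priv = {"_cons": 1.191816, "ed": 0.112884, "ex": 0.0122555}
se_priv = {"ed": 0.0037483, "ex": 0.0006542}
ci_priv = {k: (b_priv[k] - Z95 * se_priv[k], b_priv[k] + Z95 * se_priv[k]) for k in se_priv}

# Equal returns to education? Treats the two estimates as independent, as the question says.
b_pub = {"_cons": 1.497093, "ed": 0.0963147, "ex": 0.0097724}
se_pub_ed = 0.0021715
z_ed = (b_priv["ed"] - b_pub["ed"]) / math.sqrt(se_priv["ed"] ** 2 + se_pub_ed ** 2)

# (c) Chow test: pooled regression vs separate regressions, k = 3 parameters per group.
rss_pooled, rss_pub, rss_priv = 679.582825, 177.771871, 493.443743
n, k = 5865, 3
rss_sep = rss_pub + rss_priv
chow_f = ((rss_pooled - rss_sep) / k) / (rss_sep / (n - 2 * k))

# (d) Decomposition of the mean log wage gap (public minus private).
mean_pub = {"_cons": 1.0, "ed": 12.24223, "ex": 18.59111}
mean_priv = {"_cons": 1.0, "ed": 10.8527, "ex": 17.36576}
gap = 2.857879 - 2.629738
# Part due to different means, valued at private-sector coefficients (one convention).
endowments = sum((mean_pub[v] - mean_priv[v]) * b_priv[v] for v in b_priv)
# Part due to different coefficients, valued at public-sector means.
coefficients = sum(mean_pub[v] * (b_pub[v] - b_priv[v]) for v in b_priv)

print("CIs:", ci_priv)
print(f"z (ed, private - public) = {z_ed:.2f}, Chow F(3, {n - 2 * k}) = {chow_f:.2f}")
print(f"gap = {gap:.4f} = {endowments:.4f} (means) + {coefficients:.4f} (coefficients)")
```

Because each regression includes a constant, the fitted mean of log wages in each group equals $\bar{x}_j'\hat\alpha^j$, so the two parts add up to the observed gap up to rounding in the printed figures.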

[3] Suppose you work in a large consultancy company on a government-funded project which aims to assess whether there should be more government spending on sports facilities. The argument that supporters of increased spending have put forward is that doing sports enhances productivity. Their claims are based on a research project in which the wages of individuals (which in a competitive market should correspond to their productivity) are regressed on a number of variables, such as education and work experience, and on a binary indicator equal to one if the individual reports doing sports at least once a week, and zero otherwise. The following table reports results from their regressions (where ed is education in years, ex and ex2 are work experience and its square, and sport is a dummy variable equal to one if the individual does sports at least once a week):

    . regress lnwage ed ex ex2 sport

      Source |          SS    df          MS   Number of obs =    8757
    ---------+------------------------------   F(  4,  8745) =  757.82
       Model |   606.65403     4  55.1503663   Prob > F      =  0.0000
    Residual |  636.414947  8745  .072774722   R-squared     =  0.4880
    ---------+------------------------------   Adj R-squared =  0.4874
       Total |  1243.06898  8756  .141967677   Root MSE      =  .26977

    ---------------------------------------------------------------------------
      lnwage |      Coef.  Std. Err.         t   P>|t|   [95% Conf.   Interval]
    ---------+-----------------------------------------------------------------
          ed |   .0701043   .0013942    50.283   0.000     .0673714    .0728373
          ex |    .037197   .0012478    29.811   0.000     .0347511     .039643
         ex2 |  -.0006162   .0000253   -24.321   0.000    -.0006659   -.0005666
      sports |   .0420529   .0063219     6.652   0.000     .0296606    .0544453
       _cons |   1.451301   .0210253    69.027   0.000     1.410087    1.492516
    ---------------------------------------------------------------------------

(a) Interpret the coefficient on the sports variable.

(b) Do you think it is correct to conclude from these regressions that doing sports enhances productivity (and wages)? Give reasons why the variable sports could be endogenous in the wage regression, and explain.

(c) Based on the reason(s) you gave for a possible endogeneity of the variable sports, are the estimates in the table likely to be overestimates or underestimates of the true parameter?

(d) A good friend recommends using instrumental variable estimation to estimate the model, and she proposes the degree of physical disability, which you observe in your data, as an instrument. Do you agree? Do you think that the proposed variable is a valid instrument? Explain in detail.
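Parts (a) to (c) turn on two calculations: the exact percentage effect implied by a dummy coefficient in a log-wage equation, $100(e^{\hat\beta} - 1)$, and the direction of omitted-variable bias. The sketch below computes the first from the printed coefficient and illustrates the second by simulation. The unobserved trait (called `motivation` here), its effects, and every other simulation parameter are hypothetical, chosen only to show that when an omitted variable raises both the probability of doing sports and wages, OLS overstates the sports coefficient.

```python
import math
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# (a) Exact percentage wage difference implied by the sports dummy in the table.
b_sport = 0.0420529
pct_effect = 100.0 * (math.exp(b_sport) - 1.0)

# (b)-(c) Omitted-variable bias by simulation (all parameters hypothetical).
rng = np.random.default_rng(1)
n = 20000
motivation = rng.normal(size=n)                                   # unobserved by the researcher
sport = (0.8 * motivation + rng.normal(size=n) > 0).astype(float) # more motivated -> more sport
true_effect = 0.02
lnwage = 1.5 + true_effect * sport + 0.10 * motivation + rng.normal(0.0, 0.25, n)

ones = np.ones(n)
b_short = ols(np.column_stack([ones, sport]), lnwage)[1]              # motivation omitted
b_long = ols(np.column_stack([ones, sport, motivation]), lnwage)[1]   # infeasible in practice
print(f"exact effect of sports: {pct_effect:.2f}%")
print(f"omitting motivation: {b_short:.3f}, controlling for it: {b_long:.3f}, true: {true_effect}")
```

The bias in the short regression equals the effect of the omitted variable on wages times the slope from regressing it on the sports dummy, so its sign is the product of those two signs.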

[4] You have a sample of workers who are either self-employed or in salaried employment, and you run earnings regressions. The first 2 columns of Table 1 report estimated coefficients and standard errors from a regression of log monthly earnings on years of education, years of labour market experience, and a dummy variable which is equal to one if the individual is self-employed. Columns 3 and 4 report results adding interaction terms between self-employment and education and experience. Columns 5 and 6 add to the specification in columns 1 and 2 a dummy which is one if the individual works in the public sector.

Table 1: Earnings Regressions; Dep. Variable: Log Monthly Earnings

                                                Specification 1    Specification 2    Specification 3
    Variable                                    Coeff  StdError    Coeff  StdError    Coeff  StdError
    Years Education                          0.0834    0.0040   0.0824    0.0040   0.0884    0.0040
    Years Work Experience                    0.0054    0.0007   0.0059    0.0007   0.0057    0.0007
    Self Employed                            0.1616    0.0399   0.0494    0.2732   0.1302    0.0400
    Years Work Experience × Self Employed                      -0.0195    0.0042
    Years Education × Self Employed                             0.0501    0.0200
    Public Sector Worker                                                           -0.0915   0.01771
    Constant                                 6.9144   0.05817   6.9144    0.0591   6.8730   0.05817

$R^2$: 0.25 (specification 1), 0.27 (specification 2), 0.26 (specification 3). Number of observations: 1336.

(a) Interpret the estimation results for the specification in columns 1 and 2. Test the null hypothesis that the difference in average log earnings between self-employed and salaried workers is 0.2.

(b) Using the results for specification 2 in the table, test the hypothesis that log earnings of the self-employed grow faster with education than log earnings of salaried workers. Draw the log earnings-experience profiles for a salaried and a self-employed worker for both specifications 1 and 2. Why does the coefficient on the dummy variable for the self-employed differ between the two specifications?

(c) Specification 3 is the same as specification 1, but with an additional dummy variable which is equal to one if the individual works in the public sector, and zero otherwise. Only salaried workers can work in the public sector. Interpret the coefficients on the two dummy variables. How would you test the hypothesis that log earnings of salaried workers in the private sector grow more slowly with labour market experience than log earnings of workers in the public sector?

(d) In log earnings regressions, the coefficients on dummy variables are sometimes interpreted as percentage differences in the dependent variable between the groups for which the dummy variable is zero and one respectively. Explain why this interpretation may be justified. For specification 1, compute the exact percentage difference in earnings between self-employed and salaried workers.
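The tests in (a) and (b) and the exact percentage in (d) use only the estimates and standard errors in Table 1. A minimal sketch; the critical values 1.96 (two-sided) and 1.645 (one-sided) at the 5% level are large-sample normal approximations, which seems reasonable with 1336 observations, and `t_stat` is a hypothetical helper name.

```python
import math

def t_stat(estimate, hypothesised, std_error):
    """t statistic for H0: coefficient == hypothesised value."""
    return (estimate - hypothesised) / std_error

# (a) H0: self-employment coefficient = 0.2 (specification 1), two-sided test.
t_a = t_stat(0.1616, 0.2, 0.0399)
reject_a = abs(t_a) > 1.96

# (b) H0: education x self-employed interaction = 0 against H1: > 0 (specification 2).
t_b = t_stat(0.0501, 0.0, 0.0200)
reject_b = t_b > 1.645

# (d) Exact percentage earnings gap implied by the dummy in specification 1.
exact_pct = 100.0 * (math.exp(0.1616) - 1.0)

print(f"(a) t = {t_a:.2f}, reject: {reject_a}")
print(f"(b) t = {t_b:.2f}, reject: {reject_b}")
print(f"(d) exact gap = {exact_pct:.1f}% versus 16.2% from the log approximation")
```

The log approximation $100\hat\beta$ is close to the exact figure only when $\hat\beta$ is small, which is why the two numbers in (d) diverge by more than a percentage point here.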

Part II.b: Answer 1 question from this section. Each question carries 22 marks.

[5] (a) What are the consequences of estimating a model with autocorrelated errors by ordinary least squares?

(b) Give two examples of situations where you think an economic application may have serious autocorrelation problems.

(c) Describe the Durbin-Watson test for autocorrelation. Explain how you would transform a regression model to deal with an AR(1) error process. In what circumstances would such a procedure be inappropriate? Describe what test you would use in these circumstances.

(d) You estimate the following autoregression for $y$ based on monthly data between 1981 and 1990 (all months in all years; standard errors in parentheses):

$\hat{y}_t = \underset{(0.004)}{0.123} + \underset{(0.089)}{0.765}\, y_{t-1} + \underset{(0.067)}{0.124}\, y_{t-2}$

Sample size = 120, residual sum of squares = 17.28. You then add a set of month dummy variables and obtain the following:

$\hat{y}_t = \underset{(0.008)}{0.098} + \underset{(0.122)}{0.544}\, y_{t-1} + \underset{(0.070)}{0.067}\, y_{t-2} + \text{month dummies}$

Sample size = 120, residual sum of squares = 15.46. Carry out a test of the null hypothesis that the month dummies are jointly insignificant (note: there is a constant in the original equation).
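Part (d) is a standard F-test of exclusion restrictions. Because the original equation already has a constant, only 11 of the 12 month dummies can be added, so there are $q = 11$ restrictions and the unrestricted model has $k = 3 + 11 = 14$ parameters. The sketch below takes the sample size of 120 as stated in the question, and also includes a small Durbin-Watson helper for part (c); the simulated white-noise and AR(1) residuals used to exercise it are assumptions for illustration only. The 5% critical value of $F(11, 106)$ is about 1.9 (standard F tables).

```python
import numpy as np

def durbin_watson(resid):
    """DW statistic: sum of squared first differences over sum of squared residuals."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def f_exclusion(rss_r, rss_u, q, n, k_u):
    """F statistic for q exclusion restrictions; k_u = parameters in the unrestricted model."""
    return ((rss_r - rss_u) / q) / (rss_u / (n - k_u))

# (d) 11 month dummies (12 months minus one, because of the constant); 3 + 11 = 14 parameters.
f_months = f_exclusion(rss_r=17.28, rss_u=15.46, q=11, n=120, k_u=14)

# (c) Simulated residuals, for illustration only: white noise vs AR(1) with rho = 0.7.
rng = np.random.default_rng(0)
e = rng.normal(size=500)
ar1 = np.zeros(500)
ar1[0] = e[0]
for t in range(1, 500):
    ar1[t] = 0.7 * ar1[t - 1] + e[t]
dw_wn, dw_ar1 = durbin_watson(e), durbin_watson(ar1)

print(f"F(11, 106) = {f_months:.2f}")
print(f"DW: white noise {dw_wn:.2f}, AR(1) {dw_ar1:.2f}")
```

The F statistic is about 1.13, below the critical value, so the month dummies are jointly insignificant at 5%. DW is roughly $2(1 - \hat\rho)$, which is why it sits near 2 for white noise and well below 2 under positive autocorrelation.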

[6] Suppose you have (balanced) panel data on $N$ individuals observed in $T$ time periods, and you specify the following panel data model for individual $i$ in year $t$:

$Y_{it} = a_i + \beta x_{it} + u_{it}, \quad i = 1, 2, \ldots, N, \; t = 1, 2, \ldots, T,$

where $Y$ is the dependent variable of interest, $X$ is a set of exogenous independent variables and $u$ is a random error.

(a) How would you interpret the $a_i$ parameters in this model? Use some relevant economic examples in your discussion.

(b) If $N$ is large, explain how you would estimate the above model.

(c) Explain how you would test the hypothesis that $a_i = a$ (a constant). Suppose you could not reject this hypothesis. What interpretation would you place on this finding?

(d) Instead of estimating the above equation, suppose you wished to estimate the following dynamic panel data model:

$Y_{it} = a_i + \gamma Y_{i,t-1} + u_{it}, \quad i = 1, 2, \ldots, N, \; t = 1, 2, \ldots, T.$

What complications does the presence of the lagged dependent variable generate?

(e) Explain how you would estimate this dynamic panel data model.
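Parts (b), (d) and (e) can be illustrated with a small simulation. The sketch below uses hypothetical parameter values ($\gamma = 0.5$, $N = 5000$, $T = 5$ usable periods, unit-variance $a_i$ and $u_{it}$). It shows that the within (fixed-effects) estimator, which removes $a_i$ by demeaning each individual's data, is biased downwards in the dynamic model when $T$ is small (the Nickell bias). The Anderson-Hsiao approach, which first-differences to remove $a_i$ and instruments $\Delta Y_{i,t-1}$ with $Y_{i,t-2}$, is consistent. This is one standard answer to (e); Arellano-Bond-type GMM estimators extend it by using further lags as instruments.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, burn, gamma = 5000, 5, 50, 0.5

# Simulate Y_it = a_i + gamma * Y_i,t-1 + u_it; a burn-in makes the start-up near-stationary.
a = rng.normal(size=N)
y = np.zeros((N, burn + T + 1))
for t in range(1, burn + T + 1):
    y[:, t] = a + gamma * y[:, t - 1] + rng.normal(size=N)
y = y[:, burn:]                         # N x (T + 1): one initial value plus T periods

def within_slope(dep, reg):
    """Fixed-effects (within) slope: demean each row (individual), then pooled OLS without a constant."""
    dep_d = dep - dep.mean(axis=1, keepdims=True)
    reg_d = reg - reg.mean(axis=1, keepdims=True)
    return np.sum(dep_d * reg_d) / np.sum(reg_d ** 2)

# Within estimator applied to the dynamic model: Y_it on Y_i,t-1 for t = 1..T.
gamma_fe = within_slope(y[:, 1:], y[:, :-1])

# Anderson-Hsiao: first differences remove a_i; instrument dY_i,t-1 with Y_i,t-2.
dy = np.diff(y, axis=1)                 # dy[:, s] = y[:, s + 1] - y[:, s]
dep = dy[:, 1:].ravel()                 # dY_it
endog = dy[:, :-1].ravel()              # dY_i,t-1
instr = y[:, :-2].ravel()               # Y_i,t-2, uncorrelated with du_it
gamma_ah = np.sum(instr * dep) / np.sum(instr * endog)

print(f"true gamma = {gamma}, within = {gamma_fe:.3f}, Anderson-Hsiao = {gamma_ah:.3f}")
```

In the static model of part (b), the same `within_slope` function applied to $Y_{it}$ and $x_{it}$ gives the fixed-effects estimate of $\beta$; it is only the lagged dependent variable that makes the demeaned regressor correlated with the demeaned error.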