Exercise sheet 6 Models with endogenous explanatory variables

Similar documents
UNIVERSIDAD CARLOS III DE MADRID ECONOMETRICS FINAL EXAM (Type B) 2. This document is self contained. Your are not allowed to use any other material.

Exercise Sheet 4 Instrumental Variables and Two Stage Least Squares Estimation

The Multiple Regression Model Estimation

ECO375 Tutorial 8 Instrumental Variables

Problem Set # 1. Master in Business and Quantitative Methods

Econometrics I Lecture 7: Dummy Variables

Exercises (in progress) Applied Econometrics Part 1

UNIVERSIDAD CARLOS III DE MADRID ECONOMETRICS Academic year 2009/10 FINAL EXAM (2nd Call) June, 25, 2010

Warwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation

An explanation of Two Stage Least Squares

ECON 482 / WH Hong Binary or Dummy Variables 1. Qualitative Information

Problem set 1: answers. April 6, 2018

Estimating the return to education for married women mroz.csv: 753 observations and 22 variables

Answers to Problem Set #4

The general linear regression with k explanatory variables is just an extension of the simple regression as follows

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Applied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid

Statistical Inference. Part IV. Statistical Inference

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Intermediate Econometrics

Exercise sheet 3 The Multiple Regression Model

Model Specification and Data Problems. Part VIII

SIMULTANEOUS EQUATION MODEL

5. Erroneous Selection of Exogenous Variables (Violation of Assumption #A1)

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

4. Nonlinear regression functions

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Dealing With Endogeneity

The Simple Regression Model. Part II. The Simple Regression Model

Ecmt 675: Econometrics I

Econometrics Problem Set 4

Sample Problems. Note: If you find the following statements true, you should briefly prove them. If you find them false, you should correct them.

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is

ECON Introductory Econometrics. Lecture 16: Instrumental variables

Econ 583 Final Exam Fall 2008

11. Simultaneous-Equation Models

Multiple Regression Analysis

Econometrics Master in Business and Quantitative Methods

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

2) For a normal distribution, the skewness and kurtosis measures are as follows: A) 1.96 and 4 B) 1 and 2 C) 0 and 3 D) 0 and 0

Regression with Qualitative Information. Part VI. Regression with Qualitative Information

Applied Microeconometrics (L5): Panel Data-Basics

Econometrics Problem Set 3

Topic 7: HETEROSKEDASTICITY

Time Series. Chapter Time Series Data

1. You have data on years of work experience, EXPER, its square, EXPER2, years of education, EDUC, and the log of hourly wages, LWAGE

Solution to Exercise E6.

6. Assessing studies based on multiple regression

Testing Hypothesis after Probit Estimation

10. Time series regression and forecasting

Basic econometrics. Tutorial 3. Dipl.Kfm. Johannes Metzler

Multiple Regression. Midterm results: AVG = 26.5 (88%) A = 27+ B = C =

In order to carry out a study on employees wages, a company collects information from its 500 employees 1 as follows:

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

CHAPTER 6: SPECIFICATION VARIABLES

LECTURE 9: GENTLE INTRODUCTION TO

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

Simultaneous Equations with Error Components. Mike Bronner Marko Ledic Anja Breitwieser

Heteroskedasticity. Part VII. Heteroskedasticity

Econ 510 B. Brown Spring 2014 Final Exam Answers

Lecture 8: Instrumental Variables Estimation

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b.

Statistical methods for Education Economics

Econometrics - Slides

2. Linear regression with multiple regressors

Applied Quantitative Methods II

Multiple Regression Analysis: Inference MULTIPLE REGRESSION ANALYSIS: INFERENCE. Sampling Distributions of OLS Estimators

Making sense of Econometrics: Basics

THE MULTIVARIATE LINEAR REGRESSION MODEL

Econometrics in a nutshell: Variation and Identification Linear Regression Model in STATA. Research Methods. Carlos Noton.

Linear Models in Econometrics


Linear Regression. Junhui Qian. October 27, 2014

Exam D0M61A Advanced econometrics

Final Exam. Economics 835: Econometrics. Fall 2010

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63

Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning

Ordered Response and Multinomial Logit Estimation

Heteroscedasticity 1

Introduction to Panel Data Analysis

Instrumental Variables

University of California at Berkeley Fall Introductory Applied Econometrics Final examination. Scores add up to 125 points

14.32 Final : Spring 2001

Chapter 9: The Regression Model with Qualitative Information: Binary Variables (Dummies)

PhD/MA Econometrics Examination. January, 2015 PART A. (Answer any TWO from Part A)

Applied Econometrics Lecture 1

7. Integrated Processes

Problem set - Selection and Diff-in-Diff

Handout 12. Endogeneity & Simultaneous Equation Models

Applied Health Economics (for B.Sc.)

Rockefeller College University at Albany

Econometrics Problem Set 11

ECON Introductory Econometrics. Lecture 13: Internal and external validity

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Econometrics II. Seppo Pynnönen. Spring Department of Mathematics and Statistics, University of Vaasa, Finland

Truncation and Censoring

1 Motivation for Instrumental Variable (IV) Regression

ECO375 Tutorial 9 2SLS Applications and Endogeneity Tests

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017

Transcription:

Exercise sheet 6 Models with endogenous explanatory variables Note: Some of the exercises include estimations and references to the data files. Use these to compare them to the results you obtained with Gretl. 1. Recall that in the simple lineal model Y = β 0 + β 1 X + ε, when X is endogenous but we have an instrument Z that can be correlated with ε, we have that p lím β 1 = β 1 + C(Z, ε) C(Z, X) (estimator simple VI) p lím β 1 = β 1 + C(X, ε) V (X) (estimator OLS) Assume that σ X = σ ε, so that the population variation in the error term is the same as that of X. Suppose that the instrumental variable Z, is slightly correlated with ε, ρ (Z, ε) = 0,1. Suppose also that Z and X have a somewhat stronger correlation, ρ (X, Z) = 0,2. a) What is the asymptotic bias in the IV estimator that uses Z as an instrument? b) How much correlation would have to exist between X and ε so that the OLS estimator is more asymptotically biased than the previous IV estimator? 2. [Problem 16.10 in Wooldridge Textbook] Use the data in MROZ.gdt and consider the example of women labor supply. Let hours be the total annual hours worked by a woman, wage is the wage rate per hour, educ years of education, age is age in years, kidslt6 is the number of kids a woman has under the age of 6, nwifeinc, is the household income (excluding womens wage income ), exper is years of work experience, motheduc and fatheduc the years of education of her mother and father respectively. a) Estimate the labor supply equation ln (hours) = α 1 ln (wage) + β 10 + β 11 educ + β 12 age + β 13 kidslt6 + β 14 nwifeinc + u 1 by 2SLS using exper and exper 2 as instruments for log (wage), and compare the result with that obtained with hours as the dependant variable. b) In the labor supply equation in part (a), it is possible that educ is endogenous, given that ability is omitted. Use motheduc and f atheduc as instruments for educ. Now there are 2 endogenous explanatory variables in this equation. César Alonso (UC3M) 1

c) Test the overidentifying restrictions in the 2SLS estimation from part (b). What can you conclude about the validity of the instruments? 3. We would like to study the returns of education (ED) to wages (W ). We are interested in knowing on the effect of years of education on wages. We have a sample of 3010 US young men in 1976 from the National Longitudinal Survey of Young Men (NLSYM) of the National Longitudinal Surveys in the USA for the year 1976. For each individual, we observe ED (Years of education), EX (Experience, in years), EX 2 (Experience squared), W HIT E (Binary variable that takes the value of 1 if the young men is white and 0 otherwise). The following model is considered to analyze the return to education: ln W = β 0 + β 1 ED + β 2 EX + β 3 EX 2 + ε 1 (1) Notice that the error term ε 1 may include unobserved factors not included in the model that can affect wages. In particular, ABIL (ability), which is unobserved. We also an additional variable: N EAR is a dummy variable that takes value 1 if the individual lived close to a university, and zero otherwise, for which we know that C (NEAR, ε 1 ). a) Estimate model (S1) by OLS, using the subsample of white young men (i.e., restricting to W HIT E = 1). b) Suppose that ABIL is actually a relevant variable, and C (ABIL, ED) 0, while it is uncorrelated with the remaining explanatory variables in (E1). Will the OLS estimate of β 1 in (E1) consistently estimate the causal effect of education on wages? Justify. c) Consider the use of NEAR as an instrument. Estimate the auxiliary equation ED = π 0 + π 1 EX + π 2 EX 2 + π 3 NEAR + v Given the assunptions and the estimations above, can we assert that NEAR is a valid instrument for ED? Justify. d) Estimate model (S1) by 2SLS, using the subsample of white young men, and NEAR as instrument for ED. e) Is ED is exogenous? Justify. Given the results, choose the appropriate estimate of the causal effect of education on wages and interpret it. f ) Consider now the full sample of both white and non-white young men. To account for ethnical differences, we consider the equation ln W = β 0 + β 1 ED + β 2 EX + β 3 EX 2 + β 4 W HIT E + β 5 (ED W HIT E) + ε 2 (2) Propose instruments for the two potentially endogenous variables, ED and (ED W HIT E). Estimate their corresponding first-stage equations and check the validity of the instruments. g) Estimate (E2) by OLS and 2SLS and compare the results. Are ED and (ED W HIT E) exogenous? What estimates would you choose? Justify. 4. In accordance with Friedman s Permanent Income Theory, Y i = α + βx i (3) César Alonso (UC3M) 2

where Y i is permanent consumption and X i is permanentíncome. Instead of observing the permanentes variables, we observe Y i = Yi + u i X i = Xi + v i where Y i, X i measure with error (u i v i ) the corresponding variables of interest Y i, X i. Using the observed variables, we can write the consumption function as Y i = α + β (X i v i ) + u i = α + βx i + (u i βv i ) a) Suppose that E(u i ) = E(v i ) = 0, V ar(u i ) = σu, 2 V ar(v i ) = σv, 2 Cov(Yi u i) = 0, Cov(Xi v i) = 0, Cov(u i Xi ) = Cov(v iyi ) = Cov(v iu i ) = 0. Show that the OLS estimator of β, ˆβ, converges in probability to β 1 + (σ 2 v/σ 2 X ) b) Comment the sign of the asymptotic bias of ˆβ. 5. The argument that inflation boosts economic growth has been neglected by empirical crosssection studies. Such studies regress the real income (GDP) growth on the inflation rate. Nevertheless, the variables measuring inflation and real income, X i and Y i, are subject to measurement error. Assume that, actually, there exists a exact relationship between real income growth, Y i, and the true inflation rate, X i income growth, Wi income growth. Then = X i + Y i. Also, it is assumed that the nominal is correctly measured, so it can be used to measure real Y i = W i X i X i = X i + ε i ε i iid(0, σ 2 ε) a) Derive the probabilistic limit of the OLS estimator of Y on X. b) Given the result above, what can be said about the future empirical results neglecting teha tinflation boosts growth? 6. Consider the following specification for demand and supply of wine in a country q D i q S i = α 1 p i + α 2 y i + u i1 = βp i + u i2 q D i = q S i = q i where, for municipality i, q i is the wine consumption per household, p i is the relative price of wine and y i is the income per household. In the system above, price and demanded quantity are simultaneously determined, while the variable y i is exogenous. All the variables are in logs. For a sample of size 1000, we have obtained the following statistics: i p2 i = 42 i p iq i = 5 i p iy i = 12 i y2 i = 10 i y iq i = 3 2 i q2 i = 11 Suppose that we are concerned with estimating a supply equation. César Alonso (UC3M) 3

a) Compute the OLS estimator of β e and interpret the coefficient. Is the estimator consistent? Justify. b) Is y i a valid instrument for p i?. Justify. Compute the instrumental-variables estimator of β using such instrument. 7. A company selling sport goods is interested on evaluating the impact of clientsíncomes on its sales. For such purpose, it undertakes a survey for a sample of individuals who buy sport goods and considers the following specification: expend = β 0 + β 1 inc + β 2 age + β 3 age 2 + β 4 gender + β 5 bach + β 6 south + β 7 weight + β 8 weight gender + β 9 south gender + ε (C) where for each individual, expend is his/her annual expenditure in sport goods (in thousand euros), inc is his/her annual income (in thousand euros), age is his/her age in years, gender is a binary variable which takes value 1 if the individual is a woman and 0 otherwise, bach denotes the individual s marital status, taking value 1 if bachelor and 0 otherwise, south is a binary variable which takes value 1 if the individual lives in the south and 0 otherwise, weight is his/her weight in kilograms. Furthermore, income can be correlated with unobserved characteristics which, in turn, might affect expenditure in sport goods. If that is the case, then C(inc, ε) 0. The remaining covariates are not expected to be correlated with the error. In addition to the aforementioned covariates, there is information about the years of education of the individual (educ) and the years of education of his/her father (educp), which are also assumed to be uncorrelated with any unobserved characteristic affecting the expenditure in sport goods (ε). It is expected that individuals who are more educated or with more educater parents will have higher income. The following estimates have been obtained with Gretl (notice that some results may have been omitted): OUTPUT 1: OLS estimates using the 935 observations 1 935 Dependent variable: expend Coefficient Std. Error t-ratio p-value const 37,7486 31.5318 1,1972 0.2316 inc 0,6671 0.6138 1,0869 0.2774 age 0,3401 1.9090 0,1782 0.8586 age2 0,0036 0.0014 gender 2,2428 0.6031 3,7190 0.001 bach 0,7775 0.002 south 0,1803 0.5536 0,3257 0.7447 weight 0,1025 0.0512-2.0023 0.0455 weight gender 0,0323 0.1506 0,2144 0.8303 south gender 0,0692 1.4793 0,0468 0.9627 César Alonso (UC3M) 4

Mean dependent var 0.043929 S.D. dependent var 0.007224 Sum squared resid 0.047795 S.E. of regression 0.007188 R 2 0.019495 Adjusted R 2 0.009955 F (9, 925) 2.043539 P-value(F ) 0.032059 Log-likelihood 3292.838 Akaike criterion 6565.675 Schwarz criterion 6517.270 Hannan Quinn 6547.218 NOTA: El R 2 de una estimación similar a la anterior omitiendo gender, weight gender y south gender es 0,009495 OUTPUT 2: OLS estimates using the 935 observations 1 935 Dependent variable: inc Coefficient Std. Error t-ratio p-value const 1,2033 1.8148 0,6631 0.5075 age 0,0577 0.1103 0,5233 0.6010 age2 0,0006 0.0017 0,3597 0.7192 gender 0,1523 0.0995 1,5305 0.1263 bach 0,1739 0.0436 3,9930 0.0001 south 0,0589 0.0315 1,8688 0.0620 weight 0,0035 0.0029 1,2100 0.2267 weight gender 0,0165 0.0102 1,6209 0.1055 south gender 0,0977 0.0966 1,0111 0.3123 educp 0,0138 0.0047 2,9464 0.0033 educ 0,0470 0.0068 6,9315 0.0000 Mean dependent var 0.975920 S.D. dependent var 0.405896 Sum squared resid 99.17078 S.E. of regression 0.368579 R 2 0.186567 Adjusted R 2 0.175424 F (10, 730) 16.74310 P-value(F ) 1.79e 27 Log-likelihood 306.2997 Akaike criterion 634.5994 Schwarz criterion 685.2874 Hannan Quinn 654.1416 NOTA: El R 2 de una estimación similar a la anterior omitiendo educp y educ es 0,1717 OUTPUT 3: OLS estimates using the 935 observations 1 935 Dependent variable: expend César Alonso (UC3M) 5

Coefficient Std. Error t-ratio p-value const 48,0705 35.4399 1,3564 0.1754 inc 0,4594 0.2069 2,2199 0.0267 age 0,4097 2.1563 0,1900 0.8494 age2 0,055 0.0324 1,6959 0.0899 gender 0,5713 1.9915 0,2869 0.7743 bach 0,2548 0.9057 0,2813 0.7786 south 0,1037 0.6348 0,1634 0.8703 weight 0,1139 0.0568 2,0047 0.0454 weight gender 0,2221 0.2004 1,1082 0.2681 south gender 1,6185 1.8847 0,8587 0.3908 resid inc 6,3914 2.1915 2,9164 0.0036 Mean dependent var 0.043929 S.D. dependent var 0.007224 Sum squared resid 0.037615 S.E. of regression 0.007178 R 2 0.028753 Adjusted R 2 0.015448 F (10, 730) 2.161070 P-value(F ) 0.018389 Log-likelihood 2506.447 Akaike criterion 5034.894 Schwarz criterion 5085.582 Hannan Quinn 5054.436 OUTPUT 4: TSLS,using the 935 observations 1 935 Dependent variable: expend Instrumented: inc Instruments: const age age2 gender bach south weight weight gender south gender feduc educ Coefficient Std. Error z p-value const 48,0705 37.2741 1,2896 0.1972 inc 4,5944 2.1767 2,1107 0.0348 age 0,4097 2.2679 0,1806 0.8566 age2 0,0055 0.0341 0,1620 0.8713 gender 0,5713 2.0946 0,2727 0.7850 bach 0,2548 0.9526 0,2675 0.7891 south 0,1037 0.6676 0,1553 0.8766 weight 0,1139 0.0598 1,9060 0.0566 weight gender 0,2221 0.2107 1,0537 0.2920 south gender 1,6185 1.9822 0,8165 0.4142 Mean dependent var 0.043929 S.D. dependent var 0.007224 Sum squared resid 0.041666 S.E. of regression 0.007549 R 2 0.000006 Adjusted R 2-0.012306 F (9, 731) 1.546437 P-value(F ) 0.127654 César Alonso (UC3M) 6

a) In the baseline model, can we assert that E (expend inc, age, gender, bach, south, weight) = L (expend inc, age, gender, bach, south, weight)? Justify. b) Given the available evidence and the assumptions above, can be establish that educp and educ are valid instruments for inc? Justify your answer, indicating the evidence in which you base it. c) Can we assert that inc is endogenous? Justify. d) We want to test whether the expenditure in sport goods is independent of gender. Write the null and the alternative hypotheses in terms of the model parameters. Explain how you would build the test statistic and, if there is the information required, implement the test. Otherwise, explain what information is missing. César Alonso (UC3M) 7