. ============= DEFINITIONS OF ARTIFICIAL DATA SET =============

. mat m=(12,20,0)                /* matrix of means of RHS vars: edu, exp, error */

. mat c=(5,-.6,0 \ -.6,9,0 \ 0,0,.1)   /* covariance matrix of RHS vars */

. mat l m                        /* displays matrix of means */

m[1,3]
      c1   c2   c3
r1    12   20    0

. mat l c                        /* displays covariance matrix */

symmetric c[3,3]
       c1    c2    c3
r1      5
r2    -.6     9
r3      0     0    .1

. drawnorm edu exp e,n(3) means(m) cov(c)
(obs 3)

. * Compare normal and lognormal distribution

. g Y=exp(logy)

. gr Y,bin(4) norm saving($pathc\e1,replace)

. gr logy,bin(4) norm saving($pathc\e2,replace)

. gr using $pathc\e1 $pathc\e2

[Two histograms with normal-density overlays: Y (range roughly 49.8 to 8969) and logy (range roughly 7.384 to 9.8557); vertical axes show Fraction]
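
Because many digits in this transcript did not survive conversion, here is a minimal self-contained sketch of the setup. The seed, the n(300) sample size, the .1 error variance, and the wage-equation coefficients (7.6, .07, .1, -.005) are assumptions read off the garbled generate commands, not values confirmed by the original log:

* Sketch of the artificial data set; seed, n(300), the .1 error variance,
* and the DGP coefficients are assumptions, not values from the original.
clear
set seed 12345                          // any seed; none shown in the original
mat m = (12, 20, 0)                     // means of edu, exp, error
mat c = (5, -.6, 0 \ -.6, 9, 0 \ 0, 0, .1)   // covariances of edu, exp, error
drawnorm edu exp e, n(300) means(m) cov(c)
g exp2 = exp^2
g logy = 7.6 + .07*edu + .1*exp - .005*exp2 + e   // Mincer-type wage equation
g Y = exp(logy)                         // lognormal wage level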

. ============= HETEROSKEDASTICITY =======================

References:
  Stata Reference Manual [N-R], regression diagnostics, pp.357-
  Stata Programming [P], _robust, pp.34
  Wooldridge, Heteroskedasticity, pp.57
  Kennedy, ch.8, pp.33-56

. * Original error w/o heteroskedasticity

. reg logy edu exp exp2

      Source |       SS       df       MS          Number of obs =       4
-------------+------------------------------       F(  3,    36) =       .
       Model |    6.76933      3      .9344        Prob > F      =       .
    Residual |    .746788     36    .949898        R-squared     =     .36
-------------+------------------------------       Adj R-squared =     .35
       Total |    65.4537     39      .4777        Root MSE      =    .389

        logy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |    .675944       .989      .66      .        .67447      .73444
         exp |       .987       .857     3.93      .         .5683       .6789
        exp2 |      -.468        .69    -6.78      .         -.635       -.338
       _cons |    7.63638       .448     7.37      .      7.548485       7.748

. predict res,res

. g res2=res^2

. predict logy_h
(option xb assumed; fitted values)

. gr res2 logy_h,xlab ylab yline(0) t("No heter") saving($pathc\e3,replace)

. mat se=sqrt(el(e(V),1,1))      /* sqrt(diagonal element of V-C) = std. error of the estimator */

. mat l se

symmetric se[1,1]
        c1
r1    .989

. hettest                        /* test using fitted values of logy */

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
  Ho: Constant variance
  Variables: fitted values of logy
      chi2(1) =      .9
  Prob > chi2 =    .676

. hettest edu                    /* test using edu */

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
  Ho: Constant variance
  Variables: edu
      chi2(1) =     .76
  Prob > chi2 =    .843
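
For reference, a sketch of what hettest computes, in the Koenker N*R2 form (the variant estat hettest reports with the iid option); the names fit, u, and u2 are ad hoc:

* Breusch-Pagan test by hand, Koenker N*R^2 form
qui reg logy edu exp exp2
predict double fit, xb                  // fitted values of logy
predict double u, residuals
g double u2 = u^2
qui reg u2 fit                          // auxiliary regression
di "LM = " %8.3f e(N)*e(r2) "   p = " %6.4f chi2tail(1, e(N)*e(r2))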

. hettest,rhs                    /* test using edu exp exp2 */

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
  Ho: Constant variance
  Variables: edu exp exp2
      chi2(3) =    3.34
  Prob > chi2 =    .345

. * Heteroskedastic error term: variance is a function of edu

. g e_a=sqrt(edu)*e

. gr e_a edu,xlab ylab yline(0) t("Heter=f(edu)") saving($pathc\e4,replace)

. g logy_a=7.6+ edu*.07 + exp*.1 - exp2*.005 + e_a

. reg logy_a edu exp exp2

      Source |       SS       df       MS          Number of obs =       4
-------------+------------------------------       F(  3,    36) =    5.94
       Model |   53.75964      3   7.998675        Prob > F      =       .
    Residual |      4.999     36     .45989        R-squared     =      .9
-------------+------------------------------       Adj R-squared =      .5
       Total |    454.755     39     .47664        Root MSE      =      .6

      logy_a |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |     .64565        .65      6.4      .         .4396      .84869
         exp |      .9537        .98      .97    .33        -.9765      .87699
        exp2 |      -.398       .375     -.68    .94        -.8637        .677
       _cons |    7.69345      .5444    49.88      .        7.3996    7.995884

. predict logy_ah
(option xb assumed; fitted values)

. predict res_a,res

. g res_a2=res_a^2

. gr res_a2 logy_ah,xlab ylab yline(0) t("Heter=f(edu)") saving($pathc\e5,replace)

. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
  Ho: Constant variance
  Variables: fitted values of logy_a
      chi2(1) =    6.79
  Prob > chi2 =       .

. hettest edu

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
  Ho: Constant variance
  Variables: edu
      chi2(1) =    3.74
  Prob > chi2 =       .

. reg res_a2 edu,noc             /* Note that coefficient on edu = var(e) */

      Source |       SS       df       MS          Number of obs =       4
-------------+------------------------------       F(  1,    39) =    7.55
       Model |    749.786      1    749.786        Prob > F      =       .
    Residual |  5494.4973     39  .56855995        R-squared     =   .3336
-------------+------------------------------       Adj R-squared =    .333
       Total |   843.9334      4    3.85339        Root MSE      =     .67

      res_a2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |     .93363      .8534      3.7      .        .877656      .98957
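
Once the variance function is estimated this way, feasible GLS follows directly; a sketch, assuming Var(e_a|edu) is proportional to edu as constructed above (varhat and w_a are ad hoc names):

* FGLS sketch: reweight by the inverse of the fitted variance function
qui reg res_a2 edu, noc
predict double varhat, xb               // fitted variance, b*edu
g double w_a = 1/varhat                 // aweights ~ inverse variance
reg logy_a edu exp exp2 [aweight=w_a]   // compare SEs with the OLS fit above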

. * Heteroskedastic error term: variance = f(external var)

. g x=uniform()                  /* Generate uniformly distributed variable x */

. g e_b=e*(x+.1)                 /* Heteroskedastic error: variance = f(external variable x) */

. gr e_b x,xlab ylab yline(0) t("Heter=f(x)") saving($pathc\e6,replace)

. g logy_b=7.6+ edu*.07 + exp*.1 - exp2*.005 + e_b

. reg logy_b edu exp exp2

      Source |       SS       df       MS          Number of obs =       4
-------------+------------------------------       F(  3,    36) =   66.37
       Model |    64.8938      3    .633346        Prob > F      =       .
    Residual |    69.8589     36    .375483        R-squared     =    .486
-------------+------------------------------       Adj R-squared =    .488
       Total |   34.74995     39   .6996688        Root MSE      =    .885

      logy_b |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |    .683546       .759     39.4      .         .6499      .77884
         exp |        .76      .6733       7.      .          .844         .54
        exp2 |     -.4879        .45      -.4      .         -.5673       -.484
       _cons |    7.64937      .6397     89.8      .        7.57334     7.67653

. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
  Ho: Constant variance
  Variables: fitted values of logy_b
      chi2(1) =     3.8
  Prob > chi2 =      .5

. hettest,rhs

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
  Ho: Constant variance
  Variables: edu exp exp2
      chi2(3) =    8.93
  Prob > chi2 =      .3

. hettest x

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
  Ho: Constant variance
  Variables: x
      chi2(1) =   85.34
  Prob > chi2 =       .

. gr using $pathc\e3 $pathc\e4 $pathc\e5 $pathc\e6

[Four panels: "No heter" (res2 vs. fitted values), "Heter=f(edu)" (e_a vs. edu), "Heter=f(edu)" (res_a2 vs. fitted values), and "Heter=f(x)" (e_b vs. x)]

. * Heteroskedasticity robust estimate of coefficient V-C matrix: sandwich estimator

. reg logy_b edu exp exp2,robust /* Robust estimation of V-C matrix */

Regression with robust standard errors                Number of obs =      63
                                                      F(  3,    59) =   574.8
                                                      Prob > F      =       .
                                                      R-squared     =    .453
                                                      Root MSE      =      .9

             |             Robust
      logy_b |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |     .67993       .935     35.5      .         .6494        .779
         exp |       .999      .7577      5.3      .        .57439        .638
        exp2 |     -.4475       .433      -.33     .         -.535        -.366
       _cons |    7.65633     .83748     69.8      .        7.6388       7.7678

. mat Vreg=e(V)                  /* Robust coef. V-C matrix */

. mat l Vreg

symmetric Vreg[4,4]
                edu        exp       exp2      _cons
  edu      3.735e-6
  exp     -4.4e-8     3.9e-6
 exp2      .573e-9   -7.36e-8   .878e-9
_cons     -.4498      -.56      5.474e-7    .853

. reg logy_b edu exp exp2,mse1   /* OLS w/o robust V-C; mse1 sets s2=1, so e(V)=(X'X)^-1 */

      Source |       SS       df       MS          Number of obs =      63
-------------+------------------------------       F(  3,    63) =      .6
       Model |   64.85838      3    .693439        Prob > F      =       .
    Residual |    78.8437     63    .364577        R-squared     =    .453
-------------+------------------------------       Adj R-squared =    .456
       Total |     43.749      6    .664667        Root MSE      =       1

      logy_b |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |     .67993      .9965      6.84     .         .48457     .873858
         exp |       .999       .956        .    .36         -.8765       .7469
        exp2 |     -.4475        .55      -.98   .47         -.8897    -5.38e-6
       _cons |    7.65633      .4683       5.5     .         7.3684      7.94394

. predict res_b,res

. mat D=e(V)                     /* Non-robust V-C matrix */

. mat l D

symmetric D[4,4]
                edu        exp       exp2      _cons
  edu      .9854
  exp     -.57e-6     .8384
 exp2      5.6e-8    -.995e-6   5.83e-8
_cons     -.8645      -.6989     .473       .5544
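
To see what the robust option changes, the two fits can be put side by side (a sketch using estimates table; the point estimates are identical, only e(V) differs):

* Classical vs. robust standard errors, same coefficients
qui reg logy_b edu exp exp2
estimates store ols
qui reg logy_b edu exp exp2, vce(robust)
estimates store rob
estimates table ols rob, b(%9.5f) se(%9.5f)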

. matrix accum M = edu exp exp2 [iweight=res_b^2]   /* Salami for the Sandwich */

. mat l M

symmetric M[4,4]
                edu        exp       exp2      _cons
  edu      .856
  exp     9469.987    3934.94
 exp2     47586.4     56954.     34376
_cons     957.88854   64.5539    3934.94    78.8437

. mat V=e(N)/(e(N)-e(df_m)-1)*D*M*D    /* Sandwich: c * (X'X)^-1 * X'WX * (X'X)^-1 */

. mat l V

symmetric V[4,4]
                edu        exp       exp2      _cons
  edu      3.735e-6
  exp     -4.4e-8     3.9e-6
 exp2      .573e-9   -7.36e-8   .878e-9
_cons     -.4498      -.56      5.474e-7    .853

Compare matrices V and Vreg: they are identical.
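
A quick numerical check of that claim:

* Check that the hand-rolled sandwich equals Stata's robust V-C
mat Diff = V - Vreg
mat l Diff                              // entries should be zero to rounding
di mreldif(V, Vreg)                     // relative difference, ~0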

MEASUREMENT ERROR

Measurement Error in the Dependent Variable

(1)  y* = β0 + β1·x1 + β2·x2 + ... + βK·xK + u
(2)  y  = y* + e0

Suppose equation (1) represents the population model. Instead of observing y*, we observe y, that is, y* plus measurement error e0 (with E[e0] = 0).

To see the implications of measurement error in y, plug eq. (2) into eq. (1):

     y = β0 + β1·x1 + β2·x2 + ... + βK·xK + (u + e0),   v = u + e0
     Var(v) = σ²_u + σ²_e0   (requires Cov(u, e0) = 0)

The OLS estimators of the βj will be affected to the extent that the composite error v is correlated with the explanatory variables. If the measurement error e0 is correlated with the xj, the OLS estimators will be biased and inconsistent. Under the classical errors-in-variables assumption, e0 is uncorrelated with each xj, and thus v and the xj are uncorrelated: OLS remains consistent, and only the error variance (and with it the standard errors) increases.

Measurement Error in an Explanatory Variable (K=1)

(3)  y  = β0 + β1·x1* + u
(4)  x1 = x1* + e1

Suppose instead that the explanatory variable is measured with error; that is, we observe x1 instead of x1* in equation (3) (again, E[e1] = 0). To see the implications, plug eq. (4) into eq. (3):

     y = β0 + β1·x1 + (u - β1·e1),   v = u - β1·e1

     plim β̂1 = β1 + Cov(x1, v)/Var(x1)

The OLS estimator of β1 will be affected to the extent that the composite error v is correlated with x1. Under the classical errors-in-variables assumption, Cov(x1*, e1) = 0. Thus

     Cov(x1, v) = E[(x1* + e1)(u - β1·e1)] = -β1·σ²_e1
     Var(x1) = Var(x1*) + Var(e1) = σ²_x1* + σ²_e1

     plim β̂1 = β1 · σ²_x1* / (σ²_x1* + σ²_e1)

so the OLS estimator is inconsistent and (asymptotically) biased toward zero. The factor multiplying β1 is called the attenuation bias (it is always < 1). When K > 1 (and x1 is the only mismeasured variable), the attenuation bias becomes

     plim β̂1 = β1 · σ²_r1* / (σ²_r1* + σ²_e1)

where r1* is the population error from the regression of x1* on all the other explanatory variables.
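
A small simulation sketch of the attenuation formula, with illustrative values σ²_x* = 5 and σ²_e1 = 1, so the plim of the OLS slope is 5/6 of the true β1:

* Attenuation sketch: true beta1 = 1, Var(x*) = 5, Var(e1) = 1,
* so plim(b) = 5/6 ~ .83.  All values here are illustrative.
preserve
clear
set obs 10000
set seed 54321                          // assumed; any seed will do
g xstar = sqrt(5)*invnorm(uniform())    // latent regressor, Var = 5
g y = xstar + invnorm(uniform())        // beta1 = 1
g x1 = xstar + invnorm(uniform())       // observed regressor, Var(e1) = 1
reg y x1                                // slope should be near 5/6
restore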

. =========== ERRORS IN VARIABLES =========================

. * Case A: Error on logy

. g error=invnorm(uniform())     /* Measurement error */

. g logyx=logy+.1*error          /* logy with error */

. dotplot logy logyx, ny(5) saving($pathc\e7,replace)

. gr logy logyx logy,xlab ylab s(op) saving($pathc\e8,replace)

. reg logy edu exp exp2          /* Model w/o error */

      Source |       SS       df       MS          Number of obs =       4
-------------+------------------------------       F(  3,    36) =       .
       Model |    6.76933      3      .9344        Prob > F      =       .
    Residual |    .746788     36    .949898        R-squared     =     .36
-------------+------------------------------       Adj R-squared =     .35
       Total |    65.4537     39      .4777        Root MSE      =    .389

        logy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |    .675944       .989      .66      .        .67447      .73444
         exp |       .987       .857     3.93      .         .5683       .6789
        exp2 |      -.468        .69    -6.78      .         -.635       -.338
       _cons |    7.63638       .448     7.37      .      7.548485       7.748

. reg logyx edu exp exp2         /* Model with error in logy: edu coefficient unchanged, only std. error and R2 change */

      Source |       SS       df       MS          Number of obs =       4
-------------+------------------------------       F(  3,    36) =    7.44
       Model |     7.6538      3   3.388343        Prob > F      =       .
    Residual |      93.66     36     .37675        R-squared     =     .93
-------------+------------------------------       Adj R-squared =      .9
       Total |    363.886     39  .69836973        Root MSE      =   .3744

       logyx |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |       .737     .35866      9.6      .         .6379     .773463
         exp |        .547     .3476     3.36      .         .4788        .865
        exp2 |        -.55       .83      -6.3     .          -.663      -.3378
       _cons |    7.63564      .5389      4.7      .        7.57877       7.795
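
Case A can be checked against the theory above: only the residual variance should grow, by roughly the variance of the added measurement error (.01 if the error was scaled by .1, as assumed in this reconstruction); a sketch:

* Case A check: residual variance grows by ~ Var(.1*error) = .01
qui reg logy edu exp exp2
predict double r_0, residuals
qui reg logyx edu exp exp2
predict double r_1, residuals
qui su r_0
scalar v0 = r(Var)
qui su r_1
scalar v1 = r(Var)
di "increase in residual variance = " v1 - v0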

. * Case B: Stochastic error in edu

. g edux=edu+error               /* Education years with error */

. dotplot edu edux, ny(5) saving($pathc\e9,replace)

. gr edu edux edu,xlab(,9,3,8) ylab(,9,3,8) s(op) saving($pathc\e10,replace)

. reg logy edu exp exp2

    Residual |    .746788     36    .949898        R-squared     =     .36

        logy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |    .675944       .989      .66      .        .67447      .73444
...

. reg logy edux exp exp2         /* See that edu coefficient is smaller */

      Source |       SS       df       MS          Number of obs =       4
-------------+------------------------------       F(  3,    36) =    4.63
       Model |  43.783947      3    4.59436        Prob > F      =       .
    Residual |      .6737     36     .37787        R-squared     =    .649
-------------+------------------------------       Adj R-squared =    .638
       Total |    65.4537     39      .4777        Root MSE      =     .35

        logy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        edux |    .388487        .99      6.95     .        .34354      .433434
         exp |      .4975        .984      3.5     .         .4658         .634
        exp2 |     -.4398          .7      -6.     .          -.583        -.983
       _cons |   7.979665     .39455        .7     .         7.998       8.5733

. corr edux error,cov            /* Bias ~ COV(edux,error)/VAR(edux) */

             |     edux    error
-------------+------------------
        edux |   9.454
       error |   .5978   .99678

. gr using $pathc\e7 $pathc\e8 $pathc\e9 $pathc\e10
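
The bias comment next to the corr command can be checked from correlate's saved moments (a sketch; r(Var_1) is the variance of edux and r(cov_12) its covariance with error):

* Moment check of the attenuation in Case B:
* plim(b_edux)/beta = Var(edu)/Var(edux) = 1 - Cov(edux,error)/Var(edux)
qui corr edux error, cov
di "Var(edux)       = " r(Var_1)
di "Cov(edux,error) = " r(cov_12)
di "attenuation     = " (r(Var_1) - r(cov_12))/r(Var_1)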

. * Case C: Systematic error = f(edu)

. g eduq=.8*edu                  /* Education years with systematic error */

. gr edu eduq edu,xlab(,9,3,8) ylab(,9,3,8) saving($pathc\e11,replace)

. dotplot edu eduq, ny(5) saving($pathc\e12,replace)

. reg logy edu exp exp2          /* Pure regression */

      Source |       SS       df       MS          Number of obs =       4
    Residual |    .746788     36    .949898        R-squared     =     .36

        logy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |    .675944       .989      .66      .        .67447      .73444
...

. reg logy eduq exp exp2         /* See that edu coefficient is larger */

      Source |       SS       df       MS          Number of obs =       4
-------------+------------------------------       F(  3,    36) =       .
       Model |     6.7695      3       .935        Prob > F      =       .
    Residual |    .746787     36    .949898        R-squared     =     .36
-------------+------------------------------       Adj R-squared =     .35
       Total |    65.4537     39      .4777        Root MSE      =    .389

        logy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        eduq |     .84499      .3786      .66      .         .7788         .985
         exp |       .987       .857     3.93      .          .5683        .6789
        exp2 |      -.468        .69    -6.78      .          -.635        -.338
       _cons |    7.63638       .448     7.37      .       7.548485        7.748
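
Because eduq = .8*edu is a pure rescaling, the coefficient on eduq is exactly 1/.8 times the coefficient on edu, while every fit statistic is unchanged; a one-line check:

* Rescaling check: .8 * b_eduq should reproduce b_edu from the pure regression
qui reg logy eduq exp exp2
di "b_eduq = " _b[eduq] "   implied b_edu = " .8*_b[eduq]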

. gr using $pathc\e11 $pathc\e12

[Two panels: scatter of eduq against edu (eduq = .8*edu lies below the 45-degree line) and dotplots of edu and eduq]

. locpoly logy edux,plot(scatter logy edu)

[Local polynomial smooth of logy on edux, overlaid on the scatter of logy against edu; legend: logy, locpoly smooth: logy]
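
locpoly is the older user-written command; current Stata ships lpoly, which does the same job. A sketch of an equivalent call (the degree and the overlay options are assumptions, not taken from the original):

* Built-in equivalent of the user-written locpoly command
lpoly logy edux, degree(1) addplot(scatter logy edu) ///
    title("Local polynomial smooth")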