Econometrics II Tutorial Problems No. 4

Lennart Hoogerheide & Agnieszka Borowska
08.03.2017

1 Summary

Gauss-Markov assumptions (for the multiple linear regression model):

MLR.1 (linearity in parameters): The model is $y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i$, where $\beta_0, \dots, \beta_k$ are unknown parameters (constants) and $u_i$ is an unobserved random error term.

MLR.2 (random sampling): We have a random sample of $n$ independent observations $\{(x_{i1}, \dots, x_{ik}, y_i) : i = 1, \dots, n\}$.

MLR.3 (no perfect collinearity): There are no exact linear relationships between the explanatory variables (and none of the independent variables is constant).

MLR.4 (zero conditional mean): $E(u_i \mid x_{i1}, \dots, x_{ik}) = 0$.

MLR.5 (homoskedasticity): $\mathrm{Var}(u_i \mid x_{i1}, \dots, x_{ik}) = \sigma^2$.

Heteroskedasticity of Unknown Form: Heteroskedasticity that may depend on the explanatory variables in an unknown, arbitrary fashion.

Heteroskedasticity-Robust Standard Error (White standard error): A standard error that is (asymptotically) robust to heteroskedasticity of unknown form. It can be obtained as the square root of a diagonal element of
$$\widehat{\mathrm{Var}}(\hat\beta_{OLS}) = (X'X)^{-1} X' \hat\Omega X (X'X)^{-1},$$
where $\hat\Omega = \mathrm{diag}(\hat u_1^2, \dots, \hat u_n^2)$ is the diagonal matrix with the squared OLS residuals on the diagonal.

Heteroskedasticity-Robust Statistic: A statistic that is (asymptotically) robust to heteroskedasticity of unknown form, e.g. the robust t, F and LM statistics.

Breusch-Pagan Test (LM test): A test for heteroskedasticity in which the squared OLS residuals are regressed on exogenous variables, often (a subset of) the explanatory variables in the model, their squares and/or cross terms.

White Test (without cross terms): A special case of the Breusch-Pagan test, which involves regressing the squared OLS residuals on the squared explanatory variables.

Weighted Least Squares (WLS) Estimator: An estimator used to adjust for a known form of heteroskedasticity, where each squared residual is weighted by the inverse of the variance of the error.

Feasible WLS (FWLS) Estimator: An estimator used to adjust for an unknown form of heteroskedasticity, where the variance parameters are unknown and therefore must first be estimated.
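The sandwich formula for the White standard errors can be made concrete with a small sketch. The code below is only an illustration under simplifying assumptions: it handles a regression on a constant and one regressor (so all matrices are 2x2 and can be written out by hand), it applies no degrees-of-freedom correction (the plain "HC0" form), and the function name `ols_with_white_se` and the data are hypothetical.

```python
import math

def ols_with_white_se(x, y):
    """OLS of y on a constant and x, with White (HC0) standard errors.

    Implements (X'X)^{-1} X' diag(u_hat^2) X (X'X)^{-1} for X = [1, x].
    """
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^{-1}
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))    # X'y
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # "Meat" of the sandwich: X' diag(u_hat^2) X
    m00 = sum(u * u for u in resid)
    m01 = sum(u * u * xi for u, xi in zip(resid, x))
    m11 = sum(u * u * xi * xi for u, xi in zip(resid, x))
    meat = [[m00, m01], [m01, m11]]
    tmp = [[sum(inv[i][k] * meat[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    cov = [[sum(tmp[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    return (b0, b1), (math.sqrt(cov[0][0]), math.sqrt(cov[1][1]))

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.7, 5.9, 5.2]
(b0, b1), (se0, se1) = ols_with_white_se(x, y)
print(b0, b1, se0, se1)
```

Statistical packages often multiply this variance by a finite-sample factor such as $n/(n-k)$, so their reported robust standard errors can differ slightly from the HC0 values computed here.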

2 Extra Topics

2.1 Goldfeld-Quandt (1965) test

In a nutshell:

Idea: If the error variances are homoskedastic (equal across observations), then the variance for one part of the sample will be the same as the variance for another part of the sample.

Based on the ratio of variances: test for the equality of error variances using an F-test on the ratio of two variances.

Key assumption: independent and normally distributed error terms.

Divide the sample of $n$ observations into three parts, then discard the middle observations. Estimate the model for each of the two other sets of observations and compute the corresponding residual variances.

The test requires that the data can be ordered with nondecreasing variance. The ordered data set is split in three groups:

1. the first group consists of the first $n_1$ observations (with variance $\sigma_1^2$);
2. the second group of the last $n_2$ observations (with variance $\sigma_2^2$);
3. the third group of the remaining $n_3 = n - n_1 - n_2$ observations in the middle. This last group is left out of the analysis, to obtain a sharper contrast between the variances in the first and second group.

The null hypothesis is that the variance is constant for all observations, and the alternative is that the variance increases. Hence, the null and alternative hypotheses are
$$H_0 : \sigma_1^2 = \sigma_2^2, \qquad H_1 : \sigma_1^2 < \sigma_2^2.$$

Apply OLS to groups 1 and 2 separately, with resulting sums of squared residuals $SSR_1$ and $SSR_2$ respectively and estimated variances
$$s_1^2 = \frac{SSR_1}{n_1 - k} \quad\text{and}\quad s_2^2 = \frac{SSR_2}{n_2 - k}.$$

Under the assumption of independently and normally distributed error terms,
$$\frac{SSR_j}{\sigma_j^2} \sim \chi^2_{n_j - k}, \qquad j = 1, 2,$$
and these two statistics are independent. Therefore
$$\frac{SSR_2 / \left((n_2 - k)\sigma_2^2\right)}{SSR_1 / \left((n_1 - k)\sigma_1^2\right)} = \frac{s_2^2/\sigma_2^2}{s_1^2/\sigma_1^2} \sim F(n_2 - k, n_1 - k).$$
So, under the null hypothesis of equal variances, the test statistic is
$$F = \frac{s_2^2}{s_1^2} \sim F(n_2 - k, n_1 - k).$$
The null hypothesis is rejected in favour of the alternative if $F$ takes large values.

There exists no generally accepted rule to choose the number $n_3$ of excluded middle observations. If the variance changes only at a single break-point, then it would be optimal to select the two groups accordingly and to take $n_3 = 0$. On the other hand, if nearly all variances are equal and only a few first observations have smaller variance and a few last ones have larger variance, then it would be best to take $n_3$ large. In practice one uses rules of thumb: e.g. $n_3 = n/5$ if the sample size $n$ is small and $n_3 = n/3$ if $n$ is large.
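The steps of the Goldfeld-Quandt test can be sketched in a few lines. The following is a minimal illustration, assuming a regression on a constant and one regressor (so $k = 2$) and data already ordered by the variable suspected of driving the variance. The function names and the simulated data are hypothetical, and no p-value is computed because the Python standard library has no F distribution.

```python
import random

def ssr_simple(x, y):
    """Sum of squared residuals from OLS of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    return sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))

def goldfeld_quandt(x, y, n1, n2, k=2):
    """Return F = s2^2 / s1^2 and its degrees of freedom (n2 - k, n1 - k).

    x and y must already be sorted so that the error variance is
    nondecreasing; the middle n - n1 - n2 observations are dropped.
    """
    s1_sq = ssr_simple(x[:n1], y[:n1]) / (n1 - k)
    s2_sq = ssr_simple(x[-n2:], y[-n2:]) / (n2 - k)
    return s2_sq / s1_sq, (n2 - k, n1 - k)

# Simulated example: the error standard deviation grows with x
rng = random.Random(1965)
x = [float(i) for i in range(1, 61)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, xi) for xi in x]
f_stat, df = goldfeld_quandt(x, y, n1=25, n2=25)
print(f_stat, df)
```

Here 10 of the 60 observations are dropped from the middle; the statistic would be compared with the upper tail of the $F(23, 23)$ distribution.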

2.2 Correction factor for multiplicative models

Recall that we distinguish two models for heteroskedasticity in the context of FWLS:

multiplicative heteroskedasticity model: $\mathrm{Var}(u_i \mid x_i) = \sigma^2 \exp(\delta_0 + \delta_1 x_{i1} + \dots + \delta_k x_{ik})$;

additive heteroskedasticity model: $\mathrm{Var}(u_i \mid x_i) = \delta_0 + \delta_1 x_{i1} + \dots + \delta_k x_{ik}$.

The latter has, however, the disadvantage that (the estimate of) $\mathrm{Var}(u_i \mid x_i)$ can be negative, so we mainly focus on the former.

Notice that in the multiplicative model we have, using $E(u_i \mid x_i) = 0$,
$$\mathrm{Var}(u_i \mid x_i) = E(u_i^2 \mid x_i) = \sigma^2 \exp(\delta_0 + \delta_1 x_{i1} + \dots + \delta_k x_{ik}),$$
so it is equivalent to
$$u_i^2 = \sigma^2 \exp(\delta_0 + \delta_1 x_{i1} + \dots + \delta_k x_{ik})\, v_i, \qquad v_i = \frac{u_i^2}{E(u_i^2 \mid x_i)} \quad (\text{a random variable with mean 1}).$$
Hence, we consider
$$\log(u_i^2) = \alpha_0 + \delta_1 x_{i1} + \dots + \delta_k x_{ik} + \eta_i,$$
where $\eta_i$ is the error term and $\alpha_0$ is a constant term:
$$\eta_i = \log(v_i) - E(\log(v_i)), \qquad \alpha_0 = \log(\sigma^2) + \delta_0 + E(\log(v_i)).$$
Hence, the coefficient $\delta_0$ of the constant term is not consistently estimated by $\hat\alpha_0$ from OLS. To obtain a consistent estimate a correction factor is needed: the constant is then estimated by $\hat\alpha_0 + a$, where, if the errors are normally distributed ($u_i \mid x_i \sim N(0, \sigma_i^2)$), $a = -E[\log(\chi^2_1)] \approx 1.27$. We will see how this works in Computer Exercise 2(i) [1].

[1] Note, however, that a consistent estimator of $\delta_0$ is not needed, because $\exp(\hat\delta_0)$ is merely a constant scaling factor that does not affect the FWLS estimator.
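The value $a \approx 1.27$ can be verified directly. The sketch below relies on the standard result $E[\log \chi^2_1] = \psi(1/2) + \log 2 = -\gamma - \log 2$, where $\psi$ is the digamma function and $\gamma$ the Euler-Mascheroni constant (this identity is not stated in the tutorial itself), and cross-checks it with a small Monte Carlo experiment. The seed and number of draws are arbitrary choices.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

# Closed form: E[log chi2_1] = digamma(1/2) + log 2 = -gamma - log 2
mean_log_chi2_1 = -EULER_GAMMA - math.log(2.0)
a = -mean_log_chi2_1  # correction factor added to the estimated constant

# Monte Carlo cross-check: chi2_1 is the square of a standard normal draw
rng = random.Random(0)
draws = [math.log(rng.gauss(0.0, 1.0) ** 2) for _ in range(200_000)]
mc_mean = sum(draws) / len(draws)

print(round(a, 4), round(mc_mean, 3))
```

The correction only matters for the level of the fitted variances; as the footnote notes, a common scale factor leaves the FWLS estimator unchanged.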

3 Warm-up Exercises

3.1 W8/1

Which of the following are consequences of heteroskedasticity?

(i) The OLS estimators, $\hat\beta_j$, are inconsistent.

No. The homoskedasticity assumption played no role in showing that the OLS estimator is consistent. Indeed, even with $\mathrm{Var}(u \mid X) = \Omega \neq \sigma^2 I$ we have for $\hat\beta_{OLS} = \beta + (X'X)^{-1} X'u$:
$$\mathrm{plim}\, \hat\beta_{OLS} = \beta + \mathrm{plim}\left(\frac{X'X}{n}\right)^{-1} \mathrm{plim}\left(\frac{X'u}{n}\right) = \beta + \mathrm{plim}\left(\frac{1}{n}\sum_{i=1}^n x_i x_i'\right)^{-1} \mathrm{plim}\left(\frac{1}{n}\sum_{i=1}^n x_i u_i\right) = \beta + E(x_i x_i')^{-1} E(x_i u_i) = \beta,$$
because $E(x_i u_i) = E(x_i E(u_i \mid x_i)) = 0$. So the OLS estimator is still consistent.

(ii) The usual (homoskedasticity-only) F statistic no longer has an F distribution.

Yes. Now we have
$$\mathrm{Var}(\hat\beta_{OLS}) = (X'X)^{-1} X' \Omega X (X'X)^{-1},$$
so the usual expression $\sigma^2 (X'X)^{-1}$ for the variance does not apply anymore. The latter expression is biased, which makes the standard (homoskedasticity-only) F test (and t test) invalid. One should use a heteroskedasticity-robust F (and t) statistic, based on heteroskedasticity-robust standard errors.

(iii) The OLS estimators are no longer BLUE.

Yes. As heteroskedasticity is a violation of the Gauss-Markov assumptions, the OLS estimator is no longer BLUE: it is still linear and unbiased, but not best in the sense that it is not efficient. Intuitively, the inefficiency of the OLS estimator under heteroskedasticity can be attributed to the fact that observations with low variance are likely to convey more information about the parameters than observations with high variance, and so the former should be given more weight in an efficient estimator (whereas OLS weights all observations equally).

3.2 W8/2

Consider a linear model to explain monthly beer consumption:
$$beer = \beta_0 + \beta_1 inc + \beta_2 price + \beta_3 educ + \beta_4 female + u,$$
$$E(u \mid inc, price, educ, female) = 0, \qquad \mathrm{Var}(u \mid inc, price, educ, female) = \sigma^2 inc^2.$$
Write the transformed equation that has a homoskedastic error term.

With $\mathrm{Var}(u \mid inc, price, educ, female) = \sigma^2 inc^2$ we have $h(x) = inc^2$, where $h(x)$ is a function of the explanatory variables that determines the heteroskedasticity (defined by $\mathrm{Var}(u \mid x) = \sigma^2 h(x)$). Therefore $\sqrt{h(x)} = inc$, and so the transformed equation is obtained by dividing the original equation by $inc$:
$$\frac{beer}{inc} = \beta_0 \frac{1}{inc} + \beta_1 \frac{inc}{inc} + \beta_2 \frac{price}{inc} + \beta_3 \frac{educ}{inc} + \beta_4 \frac{female}{inc} + \frac{u}{inc} = \beta_0 \frac{1}{inc} + \beta_1 + \beta_2 \frac{price}{inc} + \beta_3 \frac{educ}{inc} + \beta_4 \frac{female}{inc} + \frac{u}{inc}.$$
Notice that $\beta_1$, which is the slope on $inc$ in the original model, is now the constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.
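The transformation in W8/2 amounts to dividing every variable of an observation by $inc$. As a minimal sketch (the function name and the numerical values are hypothetical, chosen only for illustration):

```python
def transform_observation(beer, inc, price, educ, female):
    """Divide every term of the beer equation by inc (requires inc != 0)."""
    return {
        "beer/inc": beer / inc,      # transformed dependent variable
        "1/inc": 1.0 / inc,          # regressor attached to beta_0
        "inc/inc": 1.0,              # regressor attached to beta_1: the new constant
        "price/inc": price / inc,
        "educ/inc": educ / inc,
        "female/inc": female / inc,
    }

# Hypothetical observation, for illustration only
row = transform_observation(beer=12.0, inc=3.0, price=2.0, educ=12.0, female=1.0)
print(row)
```

Running OLS of `beer/inc` on the remaining transformed columns (without adding a further intercept) would then give the WLS estimates of $\beta_0, \dots, \beta_4$.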

3.3 Small computer exercise

Using the data in the file earnings.wf1 [2], run the regression
$$y_i = \beta_1 d_{1i} + \beta_2 d_{2i} + \beta_3 d_{3i} + u_i, \qquad (1)$$
where $d_{ki}$, $k = 1, 2, 3$, are dummy variables for three age groups. Then test the null hypothesis that $E(u_i^2) = \sigma^2$ against the alternative that $E(u_i^2) = \gamma_1 d_{1i} + \gamma_2 d_{2i} + \gamma_3 d_{3i}$. Report p-values for both the F and the $nR^2$ tests.

Recall that tests for homoskedasticity are constructed as follows: $H_0$: homoskedasticity, $H_1$: not $H_0$, i.e. heteroskedasticity.

The easiest way to perform the required test is simply to regress the squared residuals from (1) on a constant and two of the three dummy variables (two rather than three, to prevent perfect collinearity). Notice that this gives us the same results as running the built-in heteroskedasticity test (Breusch-Pagan-Godfrey) in EViews.

The F statistic from this regression for the hypothesis that the coefficients of the dummy variables are zero is 5.872. It is asymptotically distributed as $F(k, n - k - 1) = F(2, 4263)$, and the p-value is 0.0028. An alternative statistic is $nR^2$, which is equal to 11.72. It is asymptotically distributed as $\chi^2_k = \chi^2_2$, and the p-value is 0.0029. (Recall from the lecture that this test performs worse than the F test in finite samples.) The two test statistics yield identical inferences, namely, that the null hypothesis should be rejected at any conventional significance level.

[2] Average annual earnings in 1988 and 1989, in 1982 US dollars, for individuals in three age groups.

4 Problem on heteroskedasticity modelling

Consider the model $y_i = \beta x_i + \varepsilon_i$ (without constant term and with $k = 1$), where $x_i > 0$ for all observations, $E(\varepsilon_i) = 0$, $E(\varepsilon_i \varepsilon_j) = 0$ for $i \neq j$, and $E(\varepsilon_i^2) = \sigma_i^2$. Consider the following three estimators of $\beta$:
$$b_1 = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad b_2 = \frac{\sum y_i}{\sum x_i}, \qquad b_3 = \frac{1}{n} \sum \frac{y_i}{x_i}.$$
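Returning to the small computer exercise in Section 3.3, the two reported statistics are linked through the $R^2$ of the auxiliary regression, and this can be checked by hand. The sketch below assumes $n = 4266$, which is not stated in the text but follows from the reported $F(2, 4263)$ distribution with $k = 2$. It uses the fact that the $\chi^2_2$ survival function is $\exp(-x/2)$.

```python
import math

k = 2                 # number of dummy regressors in the auxiliary regression
n = 4263 + k + 1      # assumption: implied by the reported F(2, 4263)
lm_stat = 11.72       # reported n * R^2

r2 = lm_stat / n                                   # implied auxiliary R^2
f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))     # F statistic from R^2
p_lm = math.exp(-lm_stat / 2.0)                    # P(chi2_2 > 11.72)

print(round(f_stat, 3), round(p_lm, 4))
```

Up to rounding of the inputs, this reproduces the F statistic of 5.872 and the $nR^2$ p-value of 0.0029 reported above.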

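For each estimator, derive a model for the variances $\sigma_i^2$ for which this estimator is the best linear unbiased estimator of $\beta$.

Recall that when we have a model for heteroskedasticity, i.e. in $\mathrm{Var}(\varepsilon_i \mid x_i) = \sigma^2 h(x_i)$ the function $h_i = h(x_i)$ is known, then transforming the original data by dividing them by $\sqrt{h_i}$ results in a linear regression where all Gauss-Markov assumptions are satisfied, which means that the corresponding OLS estimator is BLUE. Consider
$$y_i = \beta x_i + \varepsilon_i, \qquad \mathrm{Var}(\varepsilon_i \mid x_i) = \sigma^2 h_i,$$
$$\underbrace{\frac{y_i}{\sqrt{h_i}}}_{=: y_i^*} = \beta \underbrace{\frac{x_i}{\sqrt{h_i}}}_{=: x_i^*} + \underbrace{\frac{\varepsilon_i}{\sqrt{h_i}}}_{=: \varepsilon_i^*}, \qquad \mathrm{Var}\left(\frac{\varepsilon_i}{\sqrt{h_i}} \,\Big|\, x_i\right) = \sigma^2,$$
so that the corresponding OLS estimator is
$$\hat\beta_{OLS}^* = \frac{\sum x_i^* y_i^*}{\sum (x_i^*)^2} = \frac{\sum \frac{x_i}{\sqrt{h_i}} \frac{y_i}{\sqrt{h_i}}}{\sum \left(\frac{x_i}{\sqrt{h_i}}\right)^2} = \frac{\sum \frac{x_i y_i}{h_i}}{\sum \frac{x_i^2}{h_i}}.$$
Hence, we simply need to find which functions $h_i$ lead to the three given WLS estimators $b_1$, $b_2$ and $b_3$.

1. To have $\hat\beta_{OLS}^* = b_1$ we need
$$\frac{\sum \frac{x_i y_i}{h_i}}{\sum \frac{x_i^2}{h_i}} = \frac{\sum x_i y_i}{\sum x_i^2},$$
which means that $h_i = 1$, $i = 1, \dots, n$ (or $h_i = C$ for any other positive constant $C$, since this would simply drop out of the numerator and the denominator), and $\mathrm{Var}(\varepsilon_i \mid x_i) = \sigma^2$. Notice that this is simply the OLS estimator for the homoskedastic case.

2. To have $\hat\beta_{OLS}^* = b_2$ we need
$$\frac{\sum \frac{x_i y_i}{h_i}}{\sum \frac{x_i^2}{h_i}} = \frac{\sum y_i}{\sum x_i},$$
which means that $h_i = x_i$, $i = 1, \dots, n$ (or $h_i = C x_i$ for any other positive constant $C$), and $\mathrm{Var}(\varepsilon_i \mid x_i) = \sigma^2 x_i$. Notice that this is a valid expression for the variance due to the assumption that $x_i > 0$, $i = 1, \dots, n$.

3. To have $\hat\beta_{OLS}^* = b_3$ we need
$$\frac{\sum \frac{x_i y_i}{h_i}}{\sum \frac{x_i^2}{h_i}} = \frac{1}{n} \sum \frac{y_i}{x_i} = \frac{\sum \frac{y_i}{x_i}}{\sum \frac{x_i}{x_i}} = \frac{\sum \frac{x_i y_i}{x_i^2}}{\sum \frac{x_i^2}{x_i^2}},$$
which means that $h_i = x_i^2$, $i = 1, \dots, n$ (or $h_i = C x_i^2$ for any other positive constant $C$), and $\mathrm{Var}(\varepsilon_i \mid x_i) = \sigma^2 x_i^2$.

5 Computer Exercises

Exercise 1

Simulate $n = 100$ data points as follows. Let $x_i$ consist of 100 random drawings from the standard normal distribution, let $\eta_i$ be a random drawing from the distribution $N(0, x_i^2)$, and let $y_i = x_i + \eta_i$ (i.e. the true value is $\beta = 1$). We will estimate the model $y_i = \beta x_i + \varepsilon_i$.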
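For the problem in Section 4, the three results $h_i = 1$, $h_i = x_i$ and $h_i = x_i^2$ can be checked numerically: plugging each $h_i$ into the general WLS formula should reproduce $b_1$, $b_2$ and $b_3$ exactly. A small sketch with made-up data (the function name and the numbers are hypothetical):

```python
import math

def wls_through_origin(x, y, h):
    """WLS estimator sum(x*y/h) / sum(x^2/h) for the model y = beta*x + eps."""
    num = sum(xi * yi / hi for xi, yi, hi in zip(x, y, h))
    den = sum(xi * xi / hi for xi, hi in zip(x, h))
    return num / den

# Hypothetical data with x_i > 0
x = [0.5, 1.0, 2.0, 4.0]
y = [0.7, 0.9, 2.4, 3.5]
n = len(x)

b1 = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
b2 = sum(y) / sum(x)
b3 = sum(b / a for a, b in zip(x, y)) / n

print(math.isclose(wls_through_origin(x, y, [1.0] * n), b1))            # h_i = 1
print(math.isclose(wls_through_origin(x, y, x), b2))                    # h_i = x_i
print(math.isclose(wls_through_origin(x, y, [a * a for a in x]), b3))   # h_i = x_i^2
```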

(i) Estimate $\beta$ by OLS. Compute the homoskedasticity-only standard error of $\hat\beta_{OLS}$ and the White heteroskedasticity-robust standard error of $\hat\beta_{OLS}$.

(ii) Estimate $\beta$ by WLS using the knowledge that $\sigma_i^2 = \sigma^2 x_i^2$. Compare the estimate and the homoskedasticity-only and heteroskedasticity-robust standard errors obtained for this WLS estimator with the results for OLS in (i).

We start with constructing the (correctly) transformed series:
$$y_i^* := \frac{y_i}{x_i}, \qquad x_i^* := \frac{x_i}{x_i} = 1, \qquad \varepsilon_i^* := \frac{\varepsilon_i}{x_i},$$
so that now the transformed error terms $\varepsilon_i^*$ are homoskedastic. We then run two OLS regressions on the transformed series (one with the homoskedasticity-only standard errors and one with the White heteroskedasticity-robust standard errors). Not surprisingly, both give us the same results.

Next, we run two WLS regressions on the original series, using the correct weights, $h_i = x_i^2$ (again, one with the homoskedasticity-only standard errors and one with the White heteroskedasticity-robust standard errors). Notice that because the $x_i$ can now be negative we need to take their absolute values for weighting. As expected, the results are exactly the same as in the previous transformed case.
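Parts (i) and (ii) can be mimicked outside EViews. The sketch below is a rough stand-in, not the tutorial's code: it uses Python's own random number generator with an arbitrary seed (so the numbers will not match those reported in this tutorial), a hypothetical helper `fit_through_origin`, and the plain HC0 form of the White standard error without a degrees-of-freedom correction.

```python
import math
import random

def fit_through_origin(x, y):
    """OLS of y on x without a constant.

    Returns (beta_hat, homoskedasticity-only SE, White HC0 SE).
    """
    n = len(x)
    sxx = sum(v * v for v in x)
    beta = sum(a * b for a, b in zip(x, y)) / sxx
    resid = [b - beta * a for a, b in zip(x, y)]
    s2 = sum(u * u for u in resid) / (n - 1)          # k = 1 parameter
    se_homo = math.sqrt(s2 / sxx)
    se_white = math.sqrt(sum((a * u) ** 2 for a, u in zip(x, resid))) / sxx
    return beta, se_homo, se_white

rng = random.Random(12345)                             # arbitrary seed
x = [rng.gauss(0.0, 1.0) for _ in range(100)]
y = [xi + rng.gauss(0.0, abs(xi)) for xi in x]         # eta_i ~ N(0, x_i^2)

# (i) OLS on the original data
ols = fit_through_origin(x, y)

# (ii) WLS with h_i = x_i^2: divide both series by |x_i|, then run OLS
x_star = [xi / abs(xi) for xi in x]
y_star = [yi / abs(xi) for xi, yi in zip(x, y)]
wls = fit_through_origin(x_star, y_star)

print("OLS:", ols)
print("WLS:", wls)
```

In the transformed regression $(x_i^*)^2 = 1$, so the HC0 and homoskedasticity-only standard errors differ only by the factor $\sqrt{(n-1)/n}$; with a degrees-of-freedom correction $n/(n-k)$ applied to the robust variance they coincide, which is consistent with the identical results noted above.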

(iii) Now estimate $\beta$ by WLS using the (incorrect) heteroskedasticity model $\sigma_i^2 = \sigma^2 / x_i^2$. Compute the standard error of this estimate in three ways: by the WLS expression corresponding to this (incorrect) model, by the White method for OLS on the (incorrectly) weighted data, and also by deriving the correct formula for the standard deviation of WLS with this incorrect model for the variance.

We start with constructing the (incorrectly) transformed series:
$$y_i^* := y_i x_i, \qquad x_i^* := x_i x_i = x_i^2, \qquad \varepsilon_i^* := \varepsilon_i x_i,$$
so that now the transformed error terms $\varepsilon_i^*$ are heteroskedastic. To have a reference to the previous subpoint, we run four regressions: two OLS ones and two WLS ones, each time one with the homoskedasticity-only standard errors and one with the White heteroskedasticity-robust standard errors.

Now the regressions without the heteroskedasticity-robust correction (OLS and WLS) give the same results, and so do both (OLS and WLS) with the White correction.

What is left is to derive the correct formula for the standard deviation of WLS under the incorrect model for the variance. Recall that in the one-variable setting (without a constant term) we have
$$\hat\beta_{WLS} = \frac{\sum \frac{x_i y_i}{h_i}}{\sum \frac{x_i^2}{h_i}},$$
so with the weights $h_i = 1/x_i^2$ and using $y_i = \beta x_i + \varepsilon_i$, we arrive at
$$\hat\beta_{WLS} = \frac{\sum x_i^3 y_i}{\sum x_i^4} = \frac{\sum x_i^3 (\beta x_i + \varepsilon_i)}{\sum x_i^4} = \beta + \frac{\sum x_i^3 \varepsilon_i}{\sum x_i^4}.$$

Because $\hat\beta_{WLS}$ is unbiased, i.e. $E(\hat\beta_{WLS} \mid x) = \beta$, the variance of $\hat\beta_{WLS}$ is
$$\mathrm{Var}(\hat\beta_{WLS} \mid x) = E\left[\left(\hat\beta_{WLS} - E(\hat\beta_{WLS} \mid x)\right)^2 \Big|\, x\right] = E\left[\left(\beta + \frac{\sum x_i^3 \varepsilon_i}{\sum x_i^4} - \beta\right)^2 \Big|\, x\right] = \frac{E\left[\left(\sum x_i^3 \varepsilon_i\right)^2 \big|\, x\right]}{\left(\sum x_i^4\right)^2}$$
$$\overset{(*)}{=} \frac{\sum x_i^6 E[\varepsilon_i^2 \mid x_i]}{\left(\sum x_i^4\right)^2} \overset{(**)}{=} \frac{\sum x_i^6 \mathrm{Var}[\varepsilon_i \mid x_i]}{\left(\sum x_i^4\right)^2} \overset{(***)}{=} \frac{\sum x_i^8}{\left(\sum x_i^4\right)^2},$$
where in $(*)$ we use the conditioning on $x$ and the fact that the $\varepsilon_i$ are mutually independent, in $(**)$ the fact that $E(\varepsilon_i \mid x_i) = 0$, and in $(***)$ that $\mathrm{Var}(\varepsilon_i \mid x_i) = \sigma^2 x_i^2 = x_i^2$ (here $\sigma^2 = 1$). For the simulated $x_i$ we obtain $\sum x_i^4 = 318.3814$ and $\sum x_i^8 = 9962.1182$, hence
$$\mathrm{Var}(\hat\beta_{WLS} \mid x) = \frac{9962.1182}{(318.3814)^2} = 0.0983,$$
so that the standard deviation of $\hat\beta_{WLS}$ is $\sqrt{0.0983} = 0.3135$. This shows that the standard error of 0.2296 from the heteroskedasticity-robust regressions is still estimated with some error.

(iv) Perform 1000 simulations, where the 100 values of $x_i$ remain the same over all simulations but the 100 values of $\eta_i$ are different drawings from the $N(0, x_i^2)$ distributions and where the values of $y_i = x_i + \eta_i$ differ accordingly between the simulations. Determine the sample standard deviations over the 1000 simulations of the three estimators of $\beta$ in (i)-(iii), that is, OLS, WLS (with correct weights), and WLS (with incorrect weights).

Figure 1 presents the EViews code used for this simulation experiment (and also for the previous computations). The standard deviations of the obtained series of 1000 estimates of $\beta$ using the required three methods are as follows:
$$\mathrm{St.dev}(\hat\beta_{OLS}) = 0.1799, \qquad \mathrm{St.dev}(\hat\beta_{WLS,correct}) = 0.0972, \qquad \mathrm{St.dev}(\hat\beta_{WLS,incorrect}) = 0.3155.$$
Notice that the last value is almost identical to the theoretical one obtained in (iii).

(v) Compare the three sample standard deviations in (iv) with the estimated standard errors in (i)-(iii), and comment on the outcomes. Which standard errors are reliable, and which ones are not?

The table below summarises the required results. Clearly, WLS with the correctly specified model for the variances gives reliable standard errors. OLS and WLS with the incorrect weighting greatly underestimate the variability of the estimator of $\beta$ when the heteroskedasticity-robust standard errors are not used. When the latter are applied, the standard errors for both methods improve considerably, but they are still estimated with some error.

                  Single estimation st. errors          Simulation
Method            Homosked. only   Heterosked. robust   st. deviations
OLS               0.0956           0.1597               0.1799
WLS correct       0.0989           0.0989               0.0972
WLS incorrect     0.0895           0.2296               0.3155
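The simulation experiment in (iv) and the theoretical standard deviations can be reproduced in outline as follows. This is a sketch with Python's random number generator and an arbitrary seed, not the EViews program of Figure 1, so the simulated numbers differ from those in the table. The theoretical formulas for OLS and for WLS with correct weights are not derived in the text; they follow from the same argument as in (iii), using $\mathrm{Var}(\varepsilon_i \mid x_i) = x_i^2$.

```python
import math
import random
import statistics

rng = random.Random(2017)                      # arbitrary seed
n, reps = 100, 1000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]    # fixed over all replications

s2 = sum(v ** 2 for v in x)
s4 = sum(v ** 4 for v in x)
s8 = sum(v ** 8 for v in x)

# Conditional standard deviations of the three estimators given x
theory = {
    "ols": math.sqrt(s4) / s2,                 # sum(x^4) / (sum(x^2))^2
    "wls_correct": 1.0 / math.sqrt(n),         # mean of eps_i / x_i ~ N(0, 1)
    "wls_incorrect": math.sqrt(s8) / s4,       # sum(x^8) / (sum(x^4))^2
}

draws = {"ols": [], "wls_correct": [], "wls_incorrect": []}
for _ in range(reps):
    y = [xi + rng.gauss(0.0, abs(xi)) for xi in x]
    draws["ols"].append(sum(a * b for a, b in zip(x, y)) / s2)
    draws["wls_correct"].append(sum(b / a for a, b in zip(x, y)) / n)
    draws["wls_incorrect"].append(sum(a ** 3 * b for a, b in zip(x, y)) / s4)

sim = {name: statistics.stdev(vals) for name, vals in draws.items()}
for name in theory:
    print(name, round(theory[name], 4), round(sim[name], 4))

# Arithmetic check of the value derived in (iii) from the reported sums
print(round(math.sqrt(9962.1182) / 318.3814, 4))
```

With correct weights the theoretical standard deviation is exactly $1/\sqrt{100} = 0.1$ for any draw of the $x_i$, which is in line with the simulated 0.0972 and the estimated 0.0989 in the table.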

Figure 1: EViews code example for Computer Exercise 1.

Exercise 2

Consider the bank wages data bankwages.wf1 with the regression model
$$y_i = \beta_1 + \beta_2 x_i + \beta_3 D_{gi} + \beta_4 D_{mi} + \beta_5 D_{2i} + \beta_6 D_{3i} + \varepsilon_i,$$
where $y_i$ is the logarithm of yearly wage, $x_i$ is the number of years of education, $D_g$ is a gender dummy (1 for males, 0 for females), and $D_m$ is a minority dummy (1 for minorities, 0 otherwise). Administration is taken as the reference category and $D_2$ and $D_3$ are dummy variables ($D_2 = 1$ for individuals with a custodial job and $D_2 = 0$ otherwise, and $D_3 = 1$ for individuals with a management position and $D_3 = 0$ otherwise).

(i) Consider the following multiplicative model for the variances: $\sigma_i^2 = E[\varepsilon_i^2] = e^{\gamma_1 + \gamma_2 D_2 + \gamma_3 D_3}$. Estimate the nine parameters (six regression parameters and three variance parameters) by (two-step) FWLS. Obtain the estimates of the standard deviations per job category and interpret the results.

To apply (two-step) FWLS, we start by estimating the regression and the model for the variances by OLS. For the latter we use as the explained variable $\log(\hat\varepsilon_i^2)$, where $\hat\varepsilon_i$ are the OLS residuals from the first regression.

Keeping in mind the correction factor for multiplicative models (assuming that $\varepsilon_i$ has a normal distribution), we estimate the variances as
$$\hat\sigma_i^2 = \exp(1.27 + \hat\gamma_1 + \hat\gamma_2 D_{2i} + \hat\gamma_3 D_{3i}),$$
so that
$$\hat\sigma_1^2 = \exp(1.27 + \hat\gamma_1), \qquad \hat\sigma_2^2 = \exp(1.27 + \hat\gamma_1 + \hat\gamma_2), \qquad \hat\sigma_3^2 = \exp(1.27 + \hat\gamma_1 + \hat\gamma_3).$$
Plugging in the obtained estimates, we obtain:
$$\hat\sigma_1^2 = \exp(1.27 - 4.7332) = 0.0313, \qquad \hat\sigma_2^2 = \exp(1.27 - 4.7332 - 0.2892) = 0.0235, \qquad \hat\sigma_3^2 = \exp(1.27 - 4.7332 + 0.4605) = 0.0497,$$
which gives us the required standard deviations per job category:
$$\hat\sigma_1 = \sqrt{\hat\sigma_1^2} = 0.1769, \qquad \hat\sigma_2 = \sqrt{\hat\sigma_2^2} = 0.1532, \qquad \hat\sigma_3 = \sqrt{\hat\sigma_3^2} = 0.2228.$$
As expected, the standard deviation is smallest for custodial jobs and largest for management jobs. Notice, however, that the estimates $\hat\gamma_2$ and $\hat\gamma_3$ are not significant, indicating that homoskedasticity of the error cannot be rejected.

Next, we run WLS with weights equal to the inverse of the fitted standard deviation.
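The standard deviations per job category in (i) follow from the reported estimates by simple arithmetic, which can be checked as below. The signs of the estimates ($\hat\gamma_1 = -4.7332$, $\hat\gamma_2 = -0.2892$, $\hat\gamma_3 = 0.4605$) are inferred from the reported variances; the dictionary keys are just labels chosen for this sketch.

```python
import math

a = 1.27                                  # correction factor, -E[log chi2_1]
g1, g2, g3 = -4.7332, -0.2892, 0.4605     # reported estimates (signs inferred)

variances = {
    "administration": math.exp(a + g1),        # reference category
    "custodial": math.exp(a + g1 + g2),        # D2 = 1
    "management": math.exp(a + g1 + g3),       # D3 = 1
}
std_devs = {job: math.sqrt(v) for job, v in variances.items()}

# FWLS weights are the inverse of the fitted standard deviations
weights = {job: 1.0 / s for job, s in std_devs.items()}

for job in variances:
    print(job, round(variances[job], 4), round(std_devs[job], 4))
```

Small differences in the last digit relative to the values above arise because the inputs used here are rounded.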

We can see that the outcomes are quite close to those of OLS, so that the effect of heteroskedasticity is relatively small (which is in line with the fact that we did not reject the null of a homoskedastic error term).

(ii) Next, adjust the model for the variances as follows: $E[\varepsilon_i^2] = \gamma_1 + \gamma_2 D_2 + \gamma_3 D_3 + \gamma_4 x_i + \gamma_5 x_i^2$, i.e. the model for the variances is additive and also contains effects of the level of education. Estimate the eleven parameters (six regression parameters and five variance parameters) by (two-step) FWLS and compare the outcomes with the results in (i).

With the additive model we now estimate the variances as
$$\hat\sigma_i^2 = \hat\gamma_1 + \hat\gamma_2 D_{2i} + \hat\gamma_3 D_{3i} + \hat\gamma_4 x_i + \hat\gamma_5 x_i^2,$$
so that
$$\hat\sigma_1^2 = \hat\gamma_1 + \hat\gamma_4 x_i + \hat\gamma_5 x_i^2 = 0.0163 + 0.0005 x_i + 0.00007 x_i^2,$$
$$\hat\sigma_2^2 = \hat\gamma_1 + \hat\gamma_2 + \hat\gamma_4 x_i + \hat\gamma_5 x_i^2 = 0.0163 - 0.0124 + 0.0005 x_i + 0.00007 x_i^2,$$
$$\hat\sigma_3^2 = \hat\gamma_1 + \hat\gamma_3 + \hat\gamma_4 x_i + \hat\gamma_5 x_i^2 = 0.0163 + 0.0085 + 0.0005 x_i + 0.00007 x_i^2.$$

Notice that this time we cannot obtain standard deviations per job category, because the estimates of the standard deviation are individual specific (depending on the education level). However, the estimates $\hat\gamma_2, \dots, \hat\gamma_5$ are not significant, indicating that again homoskedasticity of the error cannot be rejected.

Below we sum up the three sets of standard errors.

                            Standard errors
Variable    beta_k (OLS)    OLS         FWLS, no x_i   FWLS, with x_i
C            9.574694       0.054218    0.052131       0.047967
EDUC         0.044192       0.004285    0.004123       0.003885
GENDER       0.178340       0.020962    0.020345       0.020253
MINORITY    -0.074858       0.022459    0.021330       0.020538
DUMJCAT2     0.170360       0.043494    0.037542       0.032217
DUMJCAT3     0.539075       0.030213    0.032882       0.032881

We can see that changing the model for heteroskedasticity does not have a big impact on the results, which are similar to those from (i). Nevertheless, the additive FWLS estimator including the education effect is somewhat more accurate than the multiplicative, job-category-only FWLS estimator, which is a bit more accurate than the OLS one.

(iii) Check that the data in the data file are sorted with increasing values of $x_i$. Inspect the histogram of $x_i$ and choose two subsamples to perform the Goldfeld-Quandt test [3] on possible heteroskedasticity due to the variable $x_i$.

Based on the plots of $x_i$ we choose $x_i \leq 12$ as the first group and $x_i \geq 15$ as the second group, so that both groups are large enough and so that some observations are dropped, namely those with $12 < x_i < 15$ (a few with $x_i = 14$). This results in $n_1 = 241$, $n_2 = 225$ and $n_3 = n - n_1 - n_2 = 8$ [4]. Running the original regression (with $k = 5$) on both subsamples yields $SSR_1 = 6.4217$ and $SSR_2 = 10.4635$, so that we obtain
$$F = \frac{SSR_2 / (n_2 - k)}{SSR_1 / (n_1 - k)} = \frac{10.4635}{6.4217} \cdot \frac{241 - 5}{225 - 5} = 1.7627$$
(with the exact values in EViews; with the rounded values above it is 1.7479), which under the null of homoskedasticity follows the $F(n_2 - k, n_1 - k) = F(225 - 5, 241 - 5) = F(220, 236)$ distribution. The corresponding p-value is 9.76E-06, so virtually 0. Hence, at any reasonable significance level we reject the null of homoskedasticity and conclude that there is evidence of heteroskedasticity due to the education level.

[3] Since the Goldfeld-Quandt test has not been in the lecture slides, it will be explained during the tutorial.

[4] You can use the commands smpl if educ<=12, scalar n1 = @obssmpl and smpl if educ>=15, scalar n2 = @obssmpl in EViews.

(iv) Perform the Breusch-Pagan test on heteroskedasticity, using the specified model for the variances.

We still use the additive model for the variances from (ii), i.e. we consider $R^2$ from the auxiliary regression from (ii),
$$\hat\varepsilon_i^2 = \gamma_1 + \gamma_2 D_2 + \gamma_3 D_3 + \gamma_4 x_i + \gamma_5 x_i^2 + \eta_i.$$
With $R^2 = 0.0258$, the obtained value of the LM statistic is $LM = nR^2 = 474 \cdot 0.0258 = 12.2255$ (computed with the unrounded $R^2$), with the corresponding p-value of 0.0157 (we use the $\chi^2_4$ distribution). Hence, at the standard significance level of 5% we can reject the null of homoskedasticity. Alternatively, we can run the built-in test in EViews, where we need to adjust the regressors in the test specification box, which leads to the same results.

(v) Also perform the White test on heteroskedasticity.

We report the results for the White test without and with cross terms, respectively. The LM statistic for the White test without cross terms is equal to 13.0811 and under the null it follows the $\chi^2_5$ distribution. The corresponding p-value is 0.0226. For the White test with cross terms we obtain $LM = 28.7527$, which follows the $\chi^2_{14}$ distribution under the null and yields the p-value of 0.0113. Either way we can reject the null of homoskedasticity at the standard significance level of 5%.
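The arithmetic behind the test statistics in (iii)-(v) can be checked from the reported (rounded) inputs. The sketch below recomputes the Goldfeld-Quandt F statistic and the Breusch-Pagan LM statistic, and evaluates chi-square p-values with the closed-form survival function that holds for even degrees of freedom; the helper `chi2_sf_even` is written for this illustration and therefore does not cover the $\chi^2_5$ case of the White test without cross terms.

```python
import math

def chi2_sf_even(x, df):
    """P(chi2_df > x) for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

# (iii) Goldfeld-Quandt with the rounded SSR values
ssr1, ssr2 = 6.4217, 10.4635
n1, n2, k = 241, 225, 5
f_gq = (ssr2 / (n2 - k)) / (ssr1 / (n1 - k))

# (iv) Breusch-Pagan LM statistic with the rounded R^2
lm_bp = 474 * 0.0258
p_bp = chi2_sf_even(lm_bp, 4)

# (v) White test with cross terms
p_white_cross = chi2_sf_even(28.7527, 14)

print(round(f_gq, 4), round(lm_bp, 4), round(p_bp, 4), round(p_white_cross, 4))
```

The small gaps relative to the reported 1.7627 and 12.2255 come from rounding of the inputs, as noted in (iii).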

(vi) Comment on the similarities and differences between the test outcomes in (iii)-(v).

The main similarity is that all three tests rejected the null of homoskedasticity, hence we have strong grounds to claim that the variance of the unobserved factors changes across different segments of the analysed data. A difference is the exact level of the p-value: some tests may have more power to detect heteroskedasticity for this dataset (and reject $H_0$ more clearly, with a lower p-value). Another difference is that the Goldfeld-Quandt test assumes that the errors are normally distributed, whereas the Breusch-Pagan and White tests do not rely on this assumption.