Course Econometrics I
4. Heteroskedasticity

Martin Halla
Johannes Kepler University of Linz, Department of Economics
Last update: May 6, 2014

Our agenda for today

- Consequences of heteroskedasticity (H)
- H-robust inference
- Testing for H
- Weighted least squares estimation: form of H is known; obs. with higher variance get less weight
- Feasible generalized least squares estimation: form of H is estimated
- The linear probability model revisited

Definition of H

The homoskedasticity assumption states that the variance of the error term, u, conditional on the explanatory vars., is constant:

  Var(u | x_1, x_2, ..., x_k) = σ²

Put differently, the error has the same variance given any value of the explanatory vars. Whenever the variance of u changes across different values of the explanatory vars., H is present. More generally, we have H when the error variance differs across units i: Var(u_i) = σ_i².

For example, in a savings eq. H is present if the variance of u affecting savings increases with income (i.e., there is a higher variance among high-income individuals).

Consequences of H

- H does not cause bias or inconsistency in OLS. Goodness-of-fit measures are also unaffected.
- However, the usual OLS t statistics do not have a t distribution. The same applies to F statistics and LM statistics. This is not solved by a larger sample size: we have a problem with statistical inference.
- Further, OLS is not BLUE; there are more efficient estimators. Intuitively, OLS puts too much weight on obs. with a high error variance.

Solutions:
- Simplest solution: H-robust inference.
- Given that we know the form of H (or we can estimate it), we can derive a more efficient estimator.
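
A minimal simulation sketch of these two points (not from the slides; the variable names and error structure are purely illustrative): the error standard deviation grows with x, so MLR.5 fails; OLS still recovers the true slope of 0.5 on average, but the usual and robust standard errors differ.

* simulated data with Var(u|x) increasing in x
clear
set seed 12345
set obs 500
gen x = 10*runiform()
gen u = rnormal(0, 1 + x)       /* sd of u rises with x: heteroskedasticity */
gen y = 1 + 0.5*x + u
reg y x                         /* usual (non-robust) standard errors */
reg y x, vce(robust)            /* H-robust standard errors */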

H-robust inference I

Good news: OLS is still useful in the presence of H. In the 1980s econometricians developed methods to adjust s.e. and t, F and LM statistics such that they are valid in the presence of H of unknown form. That means we can provide valid inference whether or not H is present. The derivation of these adjusted test statistics is quite technical, but their application is easy.

H-robust inference I

White (1980, ECM) has shown that under MLR.1 through MLR.4, the expression

  ( Σ_{i=1}^{n} r̂_ij² û_i² ) / SSR_j²        (1)

is a valid estimator for Var(β̂_j), where r̂_ij denotes the ith residual from regressing x_j on all other indep. vars, and SSR_j is the sum of squared residuals from this regression.

The square root of (1) is the H-robust s.e. for β̂_j. It is sometimes also called the White, Huber, or Eicker s.e. (or some hyphenated combination of these names), or simply the robust s.e. Robust s.e. can be either larger or smaller than the usual s.e.

Based on (1) we can derive an H-robust t statistic: t = (estimate - hyp. value)/s.e. This can easily be done in Stata.
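
A sketch of formula (1) computed by hand (an illustration, not from the slides; the built-in auto data and the variable choice are assumptions). Note that Stata's vce(robust) additionally applies a finite-sample correction, so the two numbers differ slightly.

sysuse auto, clear
quietly reg price mpg weight
predict uhat, resid                /* residuals u-hat from the full model */
quietly reg mpg weight             /* auxiliary regression of x_j on the other regressors */
predict rhat, resid                /* residuals r-hat_ij */
gen num = rhat^2 * uhat^2
quietly summarize num
display "robust s.e. for mpg via (1): " sqrt(r(sum)/e(rss)^2)
reg price mpg weight, vce(robust)  /* compare with Stata's reported robust s.e. */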

Example with H-robust inference

. reg lwage marrmale marrfem singfem educ exper expersq tenure tenursq, robust

Linear regression                                      Number of obs =     526
                                                       F(  8,   517) =   51.70
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4609
                                                       Root MSE      =  .39329

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    marrmale |   .2126756   .0571419     3.72   0.000     .1004167    .3249345
     marrfem |  -.1982676     .05877    -3.37   0.001     -.313725   -.0828102
     singfem |  -.1103502   .0571163    -1.93   0.054    -.2225587    .0018583
        educ |   .0789103   .0074147    10.64   0.000     .0643437    .0934769
       exper |   .0268006   .0051391     5.22   0.000     .0167044    .0368967
     expersq |  -.0005352   .0001063    -5.03   0.000    -.0007442   -.0003263
      tenure |   .0290875   .0069409     4.19   0.000     .0154516    .0427234
     tenursq |  -.0005331   .0002437    -2.19   0.029    -.0010119   -.0000544
       _cons |    .321378    .109469     2.94   0.003     .1063193    .5364368
------------------------------------------------------------------------------

In Stata robust standard errors can easily be obtained with the option robust.

H-robust inference II

                        (I)                       (II)
              Coeff.    Normal s.e.     Coeff.    Robust s.e.
marrmale      0.213***  (0.055)         0.213***  [0.057]
marrfem      -0.198***  (0.058)        -0.198***  [0.059]
singfem      -0.110*    (0.056)        -0.110     [0.057]
educ          0.079***  (0.007)         0.079***  [0.007]
exper         0.027***  (0.005)         0.027***  [0.005]
exper^2      -0.001***  (0.001)        -0.001***  [0.001]
tenure        0.029***  (0.007)         0.029***  [0.007]
tenure^2     -0.001*    (0.001)        -0.001*    [0.001]
Constant      0.321**   (0.100)         0.321**   [0.109]
R-squared     0.461                     0.461
N             526                       526

In this case the usual and robust s.e. are very similar. In other cases, however, it might change important conclusions.

Testing for H

Why test? Couldn't we simply always use robust s.e.?
- Under H, OLS is not BLUE.
- The usual t statistics have an exact t distribution under the CLM assumptions, whereas robust s.e./statistics are only valid in large samples.

There are many different tests. General idea: test assumption MLR.5,

  H_0: Var(u | x) = σ²

If we cannot reject this null hyp., we conclude that H is not a problem. Since Var(u | x) = E(u² | x) (given E(u | x) = 0), we can write H_0: E(u² | x) = σ². That means we can test whether u² is related (in expected value) to any of the x:

  u² = δ_0 + δ_1 x_1 + δ_2 x_2 + ... + δ_k x_k + v

The null hyp. of homoskedasticity is H_0: δ_1 = ... = δ_k = 0. Implementation uses the estimate of u², the squared residual û².

The Breusch-Pagan test for H

1. Estimate your model by OLS and obtain û_i² for each i.
2. Run û² = δ_0 + δ_1 x_1 + ... + δ_k x_k + v and save the R-squared (R²_{û²}).
3. Form either the F statistic or the LM statistic,

     F = ( R²_{û²} / k ) / ( (1 - R²_{û²}) / (n - k - 1) )        LM = n · R²_{û²}

   where k is the no. of regressors and n the no. of obs.; F is compared with the F_{k, n-k-1} distribution and LM with the χ²_k distribution.
4. If the respective p-value is sufficiently small, reject the null hyp. of homoskedasticity.

(If you suspect H only in a subset of your indep. vars, you can modify step 2. A built-in Stata version of the test is sketched below.)
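
As a sketch (a workflow assumption, not part of the slides), Stata's post-estimation command estat hettest can produce essentially the same test without the manual steps: rhs uses all regressors, iid gives the n·R² (LM) version, and fstat the F version. Illustrated on the built-in auto data:

sysuse auto, clear
quietly reg price mpg weight foreign
estat hettest, rhs iid      /* Breusch-Pagan LM test (n*R^2) on all regressors */
estat hettest, rhs fstat    /* F-statistic version of the same test */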

Breusch-Pagan test for H: an example I

* Breusch-Pagan test for heteroskedasticity (for Example 8.4)

* 1.) Estimate the model by OLS and obtain the squared OLS residuals:
qui reg price lotsize sqrft bdrms
predict u, resid
gen u2 = u^2

* 2.) Run the following regression and keep the R-squared:
qui reg u2 lotsize sqrft bdrms

* 3a.) Form either the F (or the LM) statistic and compute the p-value:
* F statistic:
display (e(r2)/(1- e(r2))*(84/3))
5.3389198
* P-value
display 1-F(3,88,e(F))
.00199862

(Strictly, the denominator df is n - k - 1 = 84 rather than 88; at this precision the p-value is unchanged.)

» A p-value of 0.002 suggests strong evidence against homoskedasticity.

Breusch-Pagan test for H: an example II

* 3b.) Alternatively, we can compute the LM statistic
* LM statistic:
display e(r2)*e(N)
14.092386
* P-value
display 1-chi2(3,14.092386)
.00278206

» Again, strong evidence against the null hyp. of homoskedasticity.

Breusch-Pagan test for H: an example III

Let us consider a model with log transformations of some of the variables:

drop u u2
qui reg lprice llotsize lsqrft bdrms
predict u, resid
gen u2 = u^2
qui reg u2 llotsize lsqrft bdrms
display (e(r2)/(1- e(r2))*(84/3))
1.4115009
display e(F)
1.4115009
display 1-F(3,88,e(F))
.24479157

» Now we fail to reject the null hypothesis of homoskedasticity.

The White test (special case) for H

1. Estimate your model by OLS, obtain the residuals and the fitted values, and compute their squares (û_i², ŷ_i²).
2. Run û² = δ_0 + δ_1 ŷ + δ_2 ŷ² + v and save the R-squared (R²_{û²}).
3. Form either the F statistic or the LM statistic,

     F_{2, n-3} = ( R²_{û²} / 2 ) / ( (1 - R²_{û²}) / (n - 3) )        LM = n · R²_{û²}

   where n is the no. of obs.
4. If the respective p-value is sufficiently small, reject the null hyp. of homoskedasticity.

(In the original form of the test you include in step 2 all indep. vars x_j, their squares x_j², and all their cross-products x_j x_h for j ≠ h. Stata's built-in version of the original test is sketched below.)
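
As a sketch (not from the slides; the built-in auto data and variable choice are illustrative), the original form with all squares and cross-products is available directly via estat imtest, white, and the special case above can be replicated by hand in a few lines:

sysuse auto, clear
quietly reg price mpg weight foreign
estat imtest, white              /* original White test: squares and cross-products */
* special case by hand: regress u-hat^2 on y-hat and y-hat^2
predict yhat, xb
predict uhat, resid
gen uhat2 = uhat^2
gen yhat2 = yhat^2
quietly reg uhat2 yhat yhat2
display "LM = " e(N)*e(r2) ", p-value = " 1-chi2(2,e(N)*e(r2))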

The special case of the White test: an example I

* Special case of the White test for heteroskedasticity (see Example 8.5)

* 1.) Estimate the model by OLS and obtain the residuals and the fitted values:
qui reg lprice llotsize lsqrft bdrms
predict u, resid
predict fitted, xb
/* ... and compute also their squares */
gen u2 = u^2
gen fitted2 = fitted^2

The special case of the White test: an example II

* 2.) Run the following regression and keep the R-squared:
qui reg u2 fitted fitted2

* 3.) Form either the F (or the LM) statistic and compute the p-value:
display (e(r2)/(1- e(r2))*(88/2))
1.7939129
display 1-F(2,88,e(F))
.18277339
* The p-value of 0.183 provides little evidence against homoskedasticity.

(Strictly, the formula calls for the multiplier (n - 3)/2 = 85/2 and an F_{2,85} distribution here; the conclusion is unchanged.)

* LM statistic
display e(r2)*e(N)
3.4472777
display 1-chi2(2,3.4472777)
.17841574

Weighted least squares estimation I

Before robust s.e. were available, econometricians used weighted least squares (WLS) estimation in the presence of H. WLS requires knowledge of the functional form of the variance.

Idea: if we can specify H (as a function of the x), the WLS estimation transforms the estimation model such that we get homoskedastic errors. Under a correct specification of the variance, WLS is more efficient than OLS and leads to new t and F statistics with the correct distributions.

WLS is an example of a generalized least squares (GLS) estimation. We can also estimate the form of H before we apply WLS; this procedure is called feasible GLS (FGLS).

Weighted least squares estimation II

Let x denote our RHS vars and assume

  Var(u | x) = σ² h(x),                                      (2)

where h(x) is some known function that determines H. Of course, σ² is unknown, but we will estimate it.

For instance, consider the simple savings function

  sav_i = β_0 + β_1 inc_i + u_i,                             (3)

where we assume that the variance of the error is proportional to income:

  Var(u_i | inc_i) = σ² inc_i.                               (4)

That means, as income increases, the variability in savings increases. We can use this idea to take an eq. with heteroskedastic errors,

  y_i = β_0 + β_1 x_i1 + ... + β_k x_ik + u_i,               (5)

and transform it into an eq. that has a homoskedastic error term.

Weighted least squares estimation III

We simply divide the original equation by √h_i:

  y_i/√h_i = β_0/√h_i + β_1 (x_i1/√h_i) + ... + β_k (x_ik/√h_i) + (u_i/√h_i),   (6)

or

  y_i* = β_0 x_i0* + β_1 x_i1* + ... + β_k x_ik* + u_i*,                        (7)

where x_i0* = 1/√h_i and the other starred vars denote the corresponding original vars divided by √h_i.

Note, since Var(u_i | x_i) = E(u_i² | x_i), we can write

  E[(u_i/√h_i)²] = (1/h_i) E(u_i²) = (1/h_i)(σ² h_i) = σ²,                      (8)

which means that the error term of the transformed eq. is homoskedastic. Given that the original eq. fulfills MLR.1-4, the transformed eq. fulfills MLR.1-5.

(Savings eq.: sav_i/√inc_i = β_0 (1/√inc_i) + β_1 √inc_i + u_i*)

Weighted least squares estimation IV

The OLS estimator gives equal weight to all obs. and minimizes

  Σ_{i=1}^{n} (y_i - β_0 - β_1 x_i1 - ... - β_k x_ik)².                          (9)

In the transformed model from above we minimize

  Σ_{i=1}^{n} (y_i/√h_i - β_0/√h_i - β_1 x_i1/√h_i - ... - β_k x_ik/√h_i)²
    = Σ_{i=1}^{n} (1/h_i) (y_i - β_0 - β_1 x_i1 - ... - β_k x_ik)²
    = Σ_{i=1}^{n} w_i (y_i - β_0 - β_1 x_i1 - ... - β_k x_ik)².                  (10)

WLS gives less weight (w_i = 1/h_i) to obs. with a higher error variance.

Weighted least squares estimation: Example 8.6

* WLS (where we assume that h = inc):

* Option 1: using transformed vars
gen cons_wls = 1/(inc)^(1/2)
gen sav_wls = sav/(inc)^(1/2)
gen inc_wls = inc/(inc)^(1/2)
reg sav_wls inc_wls cons_wls, nocons

------------------------------------------------------------------------------
     sav_wls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     inc_wls |   .1717555   .0568128     3.02   0.003     .0590124    .2844986
    cons_wls |  -124.9528   480.8606    -0.26   0.796    -1079.205    829.2994
------------------------------------------------------------------------------

* Option 2: using Stata's weight option
reg sav inc [aw = 1/inc]
(sum of wgt is 1.3877e-02)

------------------------------------------------------------------------------
         sav |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         inc |   .1717555   .0568128     3.02   0.003     .0590124    .2844986
       _cons |  -124.9528   480.8606    -0.26   0.796    -1079.205    829.2994
------------------------------------------------------------------------------

Weighted least squares estimation V

What are the properties of WLS if our choice of h(x) is incorrect?
- Just like OLS, WLS still provides an unbiased and consistent estimator. (Note, OLS is the special case where we erroneously assume h(x) = 1.)
- However, the test statistics are then no longer valid.
- Wooldridge argues that even a wrong specification of (strong) H might be better than complete ignorance (i.e., plain OLS).

In the case of averaged data (e.g., on the firm or country level) you should always use WLS with 1/h_i = m_i, where m_i is the number of underlying individuals in the ith aggregate unit. Idea: larger aggregate units have a smaller error variance and therefore receive a higher weight. (A small sketch follows below.)
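
A minimal sketch of the averaged-data case (an illustration, not from the slides): cell means of the built-in auto data, weighted by the number of underlying cars m_i in each cell.

sysuse auto, clear
collapse (mean) price mpg weight (count) m = price, by(rep78 foreign)
reg price mpg weight [aw = m]    /* WLS for grouped data: weight w_i = m_i, i.e. 1/h_i = m_i */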

Feasible GLS

Usually, the exact form of H is not obvious. How do we find the function h(x_i)? Feasible GLS (FGLS) suggests using an estimate of h_i, denoted ĥ_i, in the GLS transformation. FGLS is sometimes also called estimated GLS.

Of course, there are many ways to model H. For instance, we could assume that

  Var(u | x) = σ² exp(δ_0 + δ_1 x_1 + ... + δ_k x_k)

That means h(x) = exp(δ_0 + δ_1 x_1 + ... + δ_k x_k). The exponential function guarantees positive values (for estimated variances). The next slide outlines a corresponding feasible GLS procedure.

A feasible GLS procedure to correct for H

1. Estimate your model by OLS and obtain the residuals, û
2. Create log(û²)
3. Run the regression of log(û²) on x_1, ..., x_k and obtain the fitted values, ĝ
4. Exponentiate the fitted values: ĥ = exp(ĝ)
5. Estimate your model by WLS, using weights 1/ĥ

However, note that FGLS is not unbiased; it is only consistent and asymptotically more efficient than OLS. (A compact sketch of these five steps on a built-in data set follows below; Example 8.7 on the next slide applies them.)
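
A minimal FGLS sketch on the built-in auto data (the data set and variable choice are assumptions for illustration, not from the slides):

sysuse auto, clear
quietly reg price mpg weight foreign       /* 1. original model, obtain residuals */
predict uhat, resid
gen lu2 = log(uhat^2)                      /* 2. log of squared residuals */
quietly reg lu2 mpg weight foreign         /* 3. regress log(u-hat^2) on the x's */
predict ghat, xb
gen hhat = exp(ghat)                       /* 4. exponentiate the fitted values */
reg price mpg weight foreign [aw = 1/hhat] /* 5. WLS with weights 1/h-hat */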

Feasible GLS estimation: Example 8.7

* 1.) Estimate the model by OLS and obtain the residuals:
qui reg cigs lincome lcigpric educ age agesq restaurn
predict u, resid

* 2.) Create the log of the squared residuals:
gen lu2 = log(u^2)

* 3.) Run the following regression and obtain the fitted values:
qui reg lu2 lincome lcigpric educ age agesq restaurn
predict fitted, xb

* 4.) Exponentiate the fitted values:
gen h = exp(fitted)

* 5.) Estimate the model by WLS, using weights 1/h:
reg cigs lincome lcigpric educ age agesq restaurn [aw = 1/h]
(sum of wgt is 1.9977e+01)

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lincome |   1.295241   .4370118     2.96   0.003     .4374154    2.153066
    lcigpric |   -2.94028   4.460142    -0.66   0.510    -11.69524    5.814684
        educ |  -.4634462   .1201586    -3.86   0.000    -.6993095   -.2275829
         age |   .4819474   .0968082     4.98   0.000     .2919194    .6719755
       agesq |  -.0056272   .0009395    -5.99   0.000    -.0074713   -.0037831
    restaurn |  -3.461066   .7955047    -4.35   0.000    -5.022589   -1.899543
       _cons |    5.63533   17.80313     0.32   0.752    -29.31103    40.58169
------------------------------------------------------------------------------

The linear probability model revisited

Problem: an LPM generally exhibits H.

Solution I: simply compute robust s.e.

Solution II: estimate the variance and use WLS,

  Var(y | x) = p(x)[1 - p(x)]

estimated by ĥ_i = ŷ_i (1 - ŷ_i).

The LPM revisited: Solution I (Example 8.8)

Usual standard errors:

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0034052   .0014485    -2.35   0.019    -.0062488   -.0005616
        educ |   .0379953    .007376     5.15   0.000      .023515    .0524756
       exper |   .0394924   .0056727     6.96   0.000     .0283561    .0506287
     expersq |  -.0005963   .0001848    -3.23   0.001    -.0009591   -.0002335
         age |  -.0160908   .0024847    -6.48   0.000    -.0209686     -.011213
     kidslt6 |  -.2618105   .0335058    -7.81   0.000    -.3275875   -.1960335
     kidsge6 |   .0130122    .013196     0.99   0.324    -.0128935    .0389179
       _cons |   .5855192    .154178     3.80   0.000     .2828442    .8881943
------------------------------------------------------------------------------

Robust standard errors:

------------------------------------------------------------------------------
             |               Robust
        inlf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0034052   .0015249    -2.23   0.026    -.0063988   -.0004115
        educ |   .0379953    .007266     5.23   0.000      .023731    .0522596
       exper |   .0394924     .00581     6.80   0.000     .0280864    .0508983
     expersq |  -.0005963     .00019    -3.14   0.002    -.0009693   -.0002233
         age |  -.0160908    .002399    -6.71   0.000    -.0208004   -.0113812
     kidslt6 |  -.2618105   .0317832    -8.24   0.000    -.3242058   -.1994152
     kidsge6 |   .0130122   .0135329     0.96   0.337     -.013555    .0395795
       _cons |   .5855192   .1522599     3.85   0.000     .2866098    .8844287
------------------------------------------------------------------------------

The LPM revisited: Solution II

Estimating the LPM by weighted least squares:

1.) Estimate the model by OLS and obtain the fitted values, ŷ_i
2.) Determine whether all ŷ_i are inside the interval [0,1]. If so, proceed to step 3.). If not, some adjustment is needed to bring all ŷ_i into the unit interval
3.) Construct the estimated variances ĥ_i = ŷ_i (1 - ŷ_i)
4.) Estimate the eq. y = β_0 + β_1 x_1 + ... + β_k x_k + u by WLS, using weights 1/ĥ

Estimating the LPM by WLS: Example 8.9 (part I)

* 1.) Estimate the model by OLS and obtain the fitted values:
qui reg PC hsgpa ACT parcoll
predict fitted, xb

* 2.) Determine whether all fitted values are inside the interval [0,1]:
sum fitted

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      fitted |        141    .3971631    .1000667   .1700624   .4974409

* 3.) Construct the estimated variances:
gen h = fitted*(1-fitted)

Estimating the LPM by WLS: Example 8.9 (part II)

* 4.) Estimate the following eq. by WLS, using weights 1/h:
reg PC hsgpa ACT parcoll [w=1/h]
(analytic weights assumed)
(sum of wgt is 6.2818e+02)

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  3,   137) =    2.22
       Model |  1.54663033     3  .515543445           Prob > F      =  0.0882
    Residual |  31.7573194   137  .231805251           R-squared     =  0.0464
-------------+------------------------------           Adj R-squared =  0.0256
       Total |  33.3039497   140  .237885355           Root MSE      =  .48146

------------------------------------------------------------------------------
          PC |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hsgpa |   .0327029   .1298817     0.25   0.802    -.2241292     .289535
         ACT |    .004272   .0154527     0.28   0.783    -.0262847    .0348286
     parcoll |   .2151862   .0862918     2.49   0.014       .04455    .3858224
       _cons |   .0262099   .4766498     0.05   0.956    -.9163323    .9687521
------------------------------------------------------------------------------

Estimating the LPM by WLS: Example 8.9 (part III)

Estimation by OLS:

------------------------------------------------------------------------------
          PC |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hsgpa |   .0653943   .1372576     0.48   0.635    -.2060231    .3368118
         ACT |   .0005645   .0154967     0.04   0.971    -.0300792    .0312082
     parcoll |   .2210541    .092957     2.38   0.019      .037238    .4048702
       _cons |  -.0004322   .4905358    -0.00   0.999     -.970433    .9695686
------------------------------------------------------------------------------

Estimation by WLS:

------------------------------------------------------------------------------
          PC |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hsgpa |   .0327029   .1298817     0.25   0.802    -.2241292     .289535
         ACT |    .004272   .0154527     0.28   0.783    -.0262847    .0348286
     parcoll |   .2151862   .0862918     2.49   0.014       .04455    .3858224
       _cons |   .0262099   .4766498     0.05   0.956    -.9163323    .9687521
------------------------------------------------------------------------------

There are no important differences. The only significant RHS var is parcoll, and in both cases the estimated probability of PC ownership is about 22 percentage points higher if at least one parent has attended college.