The F distribution. If: 1. $u_1, \ldots, u_n$ are normally distributed; and 2. $X_i$ is distributed independently of $u_i$ (so in particular $u_i$ is homoskedastic)


Transcription:

The F distribution: If (1) $u_1, \ldots, u_n$ are normally distributed; and (2) $X_i$ is distributed independently of $u_i$ (so in particular $u_i$ is homoskedastic), then the homoskedasticity-only F-statistic has the $F_{q,n-k-1}$ distribution, where q = the number of restrictions and k = the number of regressors under the alternative (the unrestricted model).

The $F_{q,n-k-1}$ distribution: The F distribution is tabulated in many places. When n gets large, the $F_{q,n-k-1}$ distribution asymptotes to the $\chi^2_q/q$ distribution: $F_{q,\infty}$ is another name for $\chi^2_q/q$. For q not too big and $n \geq 100$, the $F_{q,n-k-1}$ distribution and the $\chi^2_q/q$ distribution are essentially identical. Many regression packages compute p-values of F-statistics using the F distribution (which is OK if the sample size is $\geq 100$). You will encounter the F distribution in published empirical work.
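As a rough numerical check of the claim that $F_{q,n-k-1}$ approaches $\chi^2_q/q$, the sketch below uses the special case q = 2, where both tail probabilities have closed forms: $\chi^2_2/2$ is exponential with mean 1, and $\Pr(F_{2,d} > x) = (1 + 2x/d)^{-d/2}$. The function names and the choice k = 3 are illustrative assumptions, not part of the slides; for general q one would rely on tables or a statistics library.

```python
import math

def f_crit_q2(d, alpha=0.05):
    """Upper-alpha critical value of F(2, d).

    Solves (1 + 2x/d)^(-d/2) = alpha for x; valid only for q = 2.
    """
    return (d / 2.0) * (alpha ** (-2.0 / d) - 1.0)

def chi2_over_q_crit_q2(alpha=0.05):
    """Upper-alpha critical value of chi-squared(2)/2, i.e. of F(2, infinity)."""
    return -math.log(alpha)

if __name__ == "__main__":
    k = 3  # hypothetical number of regressors
    for n in (20, 50, 100, 1000):
        d = n - k - 1  # denominator degrees of freedom
        print(n, round(f_crit_q2(d), 3), round(chi2_over_q_crit_q2(), 3))
```

With n = 100 and k = 3, the 5% critical value is about 3.09, against about 3.00 in the large-sample limit; the gap is much wider at n = 20.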

Digression: A little history of statistics. The theory of the homoskedasticity-only F-statistic and the $F_{q,n-k-1}$ distributions rests on implausibly strong assumptions (are earnings normally distributed?). These statistics date to the early 20th century, when "computer" was a job description and observations numbered in the dozens. The F-statistic and $F_{q,n-k-1}$ distribution were major breakthroughs: an easily computed formula; a single set of tables that could be published once, then applied in many settings; and a precise, mathematically elegant justification.

A little history of statistics, ctd. The strong assumptions seemed a minor price for this breakthrough. But with modern computers and large samples we can use the heteroskedasticity-robust F-statistic and the $F_{q,\infty}$ distribution, which require only the four least squares assumptions. This historical legacy persists in modern software, in which homoskedasticity-only standard errors (and F-statistics) are the default, and in which p-values are computed using the $F_{q,n-k-1}$ distribution.

Summary: the homoskedasticity-only ("rule of thumb") F-statistic and the F distribution. These are justified only under very strong conditions, stronger than are realistic in practice. Yet they are widely used. You should use the heteroskedasticity-robust F-statistic, with $\chi^2_q/q$ (that is, $F_{q,\infty}$) critical values. For $n \geq 100$, the F distribution essentially is the $\chi^2_q/q$ distribution. For small n, the F distribution isn't necessarily a better approximation to the sampling distribution of the F-statistic; it is better only if the strong conditions are true.
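To illustrate how much the choice of reference distribution can matter in small samples, the sketch below compares p-values for a test of q = 2 restrictions, the one case with simple closed forms: $\Pr(F_{2,d} > f) = (1 + 2f/d)^{-d/2}$ and $\Pr(\chi^2_2/2 > f) = e^{-f}$. The F-statistic value 4.0 and the degrees of freedom are hypothetical, and the function names are made up for this example.

```python
import math

def p_value_f_q2(f_stat, d):
    """p-value from the F(2, d) distribution (q = 2 restrictions only)."""
    return (1.0 + 2.0 * f_stat / d) ** (-d / 2.0)

def p_value_f_inf_q2(f_stat):
    """p-value from chi-squared(2)/2, i.e. the F(2, infinity) distribution."""
    return math.exp(-f_stat)

if __name__ == "__main__":
    f_stat = 4.0  # hypothetical F-statistic
    for d in (20, 96, 500):  # d = n - k - 1
        print(d, round(p_value_f_q2(f_stat, d), 4), round(p_value_f_inf_q2(f_stat), 4))
```

With 20 denominator degrees of freedom the two p-values are roughly 0.035 and 0.018, while with 500 they nearly coincide. Which one is closer to the truth in a small sample depends on whether the strong conditions hold.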

The $R^2$, SER, and $\bar{R}^2$ for Multiple Regression (SW Section 5.10). Actual = predicted + residual: $Y_i = \hat{Y}_i + \hat{u}_i$. As in regression with a single regressor, the SER (and the RMSE) is a measure of the spread of the Y's around the regression line: $SER = \sqrt{\frac{1}{n-k-1} \sum_{i=1}^{n} \hat{u}_i^2}$.
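The SER formula can be sketched in a few lines of Python. The residuals below are made-up numbers for illustration, and k = 2 is an assumed number of regressors. The RMSE is taken here to be the same quantity without the degrees-of-freedom correction (dividing by n), which is an assumption about the intended definition.

```python
import math

def ser(residuals, k):
    """Standard error of the regression: sqrt(SSR / (n - k - 1))."""
    n = len(residuals)
    ssr = sum(u * u for u in residuals)  # sum of squared residuals
    return math.sqrt(ssr / (n - k - 1))

def rmse(residuals):
    """Root mean squared error: sqrt(SSR / n), no degrees-of-freedom correction."""
    n = len(residuals)
    return math.sqrt(sum(u * u for u in residuals) / n)

# Hypothetical OLS residuals from a regression with k = 2 regressors.
u_hat = [1.0, -2.0, 0.5, 1.5, -1.0, 0.0, 2.0, -2.0]

if __name__ == "__main__":
    print(ser(u_hat, k=2), rmse(u_hat))
```

Because the SER divides by n - k - 1 rather than n, it is always at least as large as the RMSE, and the two converge as n grows relative to k.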

The $R^2$ is the fraction of the variance explained: $R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$, where $ESS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{\hat{Y}})^2$, $SSR = \sum_{i=1}^{n} \hat{u}_i^2$, and $TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$, just as for regression with one regressor. The $R^2$ always increases when you add another regressor, which is a bit of a problem for a measure of fit. The $\bar{R}^2$ corrects this problem by penalizing you for including another regressor: $\bar{R}^2 = 1 - \frac{n-1}{n-k-1} \frac{SSR}{TSS}$, so $\bar{R}^2 < R^2$.
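A small worked example may make the two formulas concrete. The data are made up: five observations of Y, with fitted values taken from an OLS regression of Y on a single regressor X = 1, ..., 5 (so k = 1). The function name is hypothetical.

```python
def fit_measures(y, y_hat, k):
    """Return (R^2, adjusted R^2) given outcomes, OLS fitted values, and k regressors."""
    n = len(y)
    y_bar = sum(y) / n
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared residuals
    r2 = 1.0 - ssr / tss
    adj_r2 = 1.0 - (n - 1) / (n - k - 1) * ssr / tss
    return r2, adj_r2

# Made-up data; y_hat = 2.2 + 0.6 * x is the OLS fit of y on x = 1, ..., 5.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

r2, adj_r2 = fit_measures(y, y_hat, k=1)

if __name__ == "__main__":
    print(r2, adj_r2)
```

Here SSR = 2.4 and TSS = 6, so $R^2 = 0.6$, while the factor (n - 1)/(n - k - 1) = 4/3 pulls $\bar{R}^2$ down to about 0.47.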

How to interpret the $R^2$ and $\bar{R}^2$? A high $R^2$ (or $\bar{R}^2$) means that the regressors explain the variation in Y. A high $R^2$ (or $\bar{R}^2$) does not mean that you have eliminated omitted variable bias. A high $R^2$ (or $\bar{R}^2$) does not mean that you have an unbiased estimator of a causal effect ($\beta_1$). A high $R^2$ (or $\bar{R}^2$) does not mean that the included variables are statistically significant; this must be determined using hypothesis tests.

Example: A Closer Look at the Test Score Data (SW Sections 5.11, 5.12). A general approach to variable selection and model specification: Specify a "base" or "benchmark" model. Specify a range of plausible alternative models, which include additional candidate variables. Does a candidate variable change the coefficient of interest ($\beta_1$)? Is a candidate variable statistically significant? Use judgment, not a mechanical recipe.

Variables we would like to see in the California data set. School characteristics: student-teacher ratio; teacher quality; computers (non-teaching resources) per student; measures of curriculum design. Student characteristics: English proficiency; availability of extracurricular enrichment; home learning environment; parents' education level.

Variables actually in the California class size data set: student-teacher ratio (STR); percent English learners in the district (PctEL); percent eligible for subsidized/free lunch; percent on public income assistance; average district income.

Digression: presentation of regression results in a table. Listing regressions in equation form can be cumbersome with many regressors and many regressions. Tables of regression results can present the key information compactly. Information to include: variables in the regression (dependent and independent); estimated coefficients; standard errors; results of F-tests of pertinent joint hypotheses; some measure of fit; number of observations.


Summary: Multiple Regression. Multiple regression allows you to estimate the effect on Y of a change in $X_1$, holding $X_2$ constant. If you can measure a variable, you can avoid omitted variable bias from that variable by including it. There is no simple recipe for deciding which variables belong in a regression; you must exercise judgment. One approach is to specify a base model, relying on a priori reasoning, then explore the sensitivity of the key estimate(s) in alternative specifications.