

1 Joint hypotheses
The null and alternative hypotheses can usually be interpreted as a restricted model (under H0) and an unrestricted model (under HA). In our example: H0: β1 = 0 and β2 = 0, vs. HA: β1 ≠ 0 and/or β2 ≠ 0. Note that if the unrestricted model fits significantly better than the restricted model, we should reject the null. The difference in fit between the model under the null and the model under the alternative leads us to an intuitive formulation of the F-test statistic for testing joint hypotheses.

2 Recall that a measure of fit is the sum of squared residuals: SSR = Σᵢ ûᵢ², where ûᵢ = Yᵢ - Ŷᵢ. The F-test statistic may be written as:

F = [(SSR_restricted - SSR_unrestricted) / q] / [SSR_unrestricted / (n - k - 1)]
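The SSR form of the statistic is easy to compute directly. A minimal sketch in Python (the lecture's own code is R; the numbers below are purely illustrative, not from the example in these notes):

```python
# Sketch of the homoskedasticity-only F-statistic in its SSR form.
# The formula follows the notes; the inputs here are hypothetical.

def f_stat_ssr(ssr_restricted, ssr_unrestricted, q, n, k):
    """F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n - k - 1)]."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k - 1)
    return numerator / denominator

# Hypothetical example: q = 2 restrictions, n = 100 observations, k = 3 regressors
F = f_stat_ssr(ssr_restricted=550.0, ssr_unrestricted=500.0, q=2, n=100, k=3)
print(round(F, 3))  # 4.8
```

Imposing the restrictions can only raise the SSR, so the numerator is non-negative; the test asks whether the increase is large relative to the unrestricted model's fit.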

3 where q = the number of restrictions. Notice that if the restrictions are true (if the null is true), F will be small, and we'll fail to reject. Another statistic which uses the SSR is the R² (recall R² = 1 - SSR/TSS). This gives us yet another formula for the F-test statistic.

4 where:

F = [(R²_unrestricted - R²_restricted) / q] / [(1 - R²_unrestricted) / (n - k - 1)]

R²_restricted = the R² for the restricted regression
R²_unrestricted = the R² for the unrestricted regression
q = the number of restrictions under the null
k = the number of regressors in the unrestricted regression.

The bigger the difference between the restricted and unrestricted R²s (that is, the greater the improvement in fit from adding the variables in question), the larger the F-statistic.

Note: the textbook differentiates between homoskedasticity-only and heteroskedasticity-robust F-tests. We will ignore heteroskedasticity for simplicity. Example: are the coefficients on strat and exppup zero?
5 Unrestricted population regression (under HA):
score_i = β0 + β1 strat_i + β2 exppup_i + β3 eng_i + u_i
Restricted population regression (that is, under H0):
score_i = β0 + β3 eng_i + u_i (why?)
The number of restrictions under H0 is q = 2 (why?).
The fit will be better (R² will be higher) in the unrestricted regression (why?)
By how much must the R² increase for the coefficients on strat and exppup to be judged statistically significant?

6 Restricted regression:
score = 644.7 - 0.671 eng,   R²_restricted = 0.4149
        (1.0)  (0.03)
Unrestricted regression:
score = 649.6 - 0.29 strat + 3.87 exppup - 0.656 eng
       (15.5) (0.48)        (1.59)        (0.03)
R²_unrestricted = 0.4366, k = 3, q = 2
so F = [(R²_unrestricted - R²_restricted) / q] / [(1 - R²_unrestricted) / (n - k - 1)]
     = [(0.4366 - 0.4149) / 2] / [(1 - 0.4366) / (420 - 3 - 1)] = 8.01
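The arithmetic on this slide can be checked in a couple of lines (a quick sketch in Python; the inputs are exactly the figures reported above):

```python
# Verify the R^2 form of the F-statistic using the slide's numbers:
# R^2 (unrestricted) = 0.4366, R^2 (restricted) = 0.4149,
# q = 2 restrictions, n = 420 observations, k = 3 regressors.
r2_ur, r2_r = 0.4366, 0.4149
q, n, k = 2, 420, 3

F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))
print(round(F, 2))  # 8.01
```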

7 A commonly performed F-test is one which assesses whether the chosen model fits at all:
H0: β1 = β2 = ... = βk = 0, vs. HA: any one of the βs not equal to zero.
Why shouldn't the intercept be restricted? This test will be performed by most regression software, and reported as "F-statistic" in the regression output, usually along with a p-value.

8 R Code
teachdata = read.csv("http://home.cc.umanitoba.ca/~godwinrt/3180/data/str.csv")
attach(teachdata)
restricted = lm(score ~ eng)
unrestricted = lm(score ~ strat + exppup + eng)
Let's look at the full (unrestricted) model:
summary(unrestricted)

9 Output:
Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   649.5814   15.206583   42.717   < 2e-16 ***
strat        -0.286117    0.480548   -0.595   0.55190
exppup        0.003867    0.001412    2.738   0.00644 **
eng          -0.655976    0.039113  -16.771   < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 14.35 on 416 degrees of freedom
Multiple R-squared: 0.4365, Adjusted R-squared: 0.4324
F-statistic: 107.4 on 3 and 416 DF, p-value: < 2.2e-16
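The overall F-statistic that R reports here is the "does the model fit at all" test from the previous slide, and it can be reproduced from the reported R². A quick arithmetic check in Python (using the figures in the output above):

```python
# Overall F-test of H0: all slope coefficients are zero, computed from
# the summary output: R^2 = 0.4365, k = 3 regressors, n = 420 observations.
# This is the R^2 formula with the restricted model containing only an
# intercept, so R^2_restricted = 0 and q = k.
r2, k, n = 0.4365, 3, 420

F = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(F, 1))  # 107.4
```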

10 Now, to perform the F-test of whether school spending matters or not:
anova(unrestricted, restricted)
Output:
Analysis of Variance Table
Model 1: score ~ strat + exppup + eng
Model 2: score ~ eng
  Res.Df   RSS  Df  Sum of Sq      F    Pr(>F)
1    416 85716
2    418 89014  -2    -3298.2 8.0034 0.0003885 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
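The F-statistic in this table can be recomputed from the two residual sums of squares, which is exactly the SSR form of the test from earlier. A quick check in Python (using the rounded RSS values printed above, so the result matches to about two decimal places):

```python
# Recompute the anova() F-statistic from the RSS values in the table:
# unrestricted RSS = 85716 on 416 df, restricted RSS = 89014 on 418 df.
ssr_ur, df_ur = 85716, 416
ssr_r, df_r = 89014, 418

q = df_r - df_ur                               # 2 restrictions
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)  # SSR form of the F-statistic
print(round(F, 3))  # 8.003
```

This agrees with the F of 8.01 obtained from the R² formula; the two formulas are algebraically equivalent, and any small discrepancy comes from rounding the reported R² and RSS values.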

11 The F(q, n-k-1) distribution:
The F distribution is tabulated in many places.
As n → ∞, the F(q, n-k-1) distribution converges to the χ²(q)/q distribution: the F(q, ∞) and χ²(q)/q distributions are the same.
For q not too big and n ≥ 100, the F(q, n-k-1) distribution and the χ²(q)/q distribution are essentially identical.
Many regression packages (including STATA) compute p-values of F-statistics using the F distribution.
You will encounter the F distribution in published empirical work.

12 Summary

F = [(R²_unrestricted - R²_restricted) / q] / [(1 - R²_unrestricted) / (n - k - 1)]

The homoskedasticity-only F-statistic rejects when adding the two variables increases the R² by enough, that is, when adding the two variables improves the fit of the regression by enough.
If the errors are homoskedastic, then the homoskedasticity-only F-statistic has a large-sample distribution that is χ²(q)/q.
But if the errors are heteroskedastic, the large-sample distribution is a mess and is not χ²(q)/q.

13 The "one at a time" approach of rejecting if either of the t-statistics exceeds 1.96 rejects more than 5% of the time under the null (the size exceeds the desired significance level).
For n large, the F-statistic is distributed χ²(q)/q (= F(q, ∞)).
The homoskedasticity-only F-statistic is important historically (and thus in practice), and can help intuition, but isn't valid when there is heteroskedasticity.
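The size distortion of the "one at a time" approach is easy to quantify in the simplest case, where the two t-statistics are treated as independent (an illustrative calculation, not a result from these notes; with correlated t-statistics the size differs but still exceeds 5%):

```python
# If two t-statistics were independent, the rule "reject H0 if either
# |t| > 1.96" would reject a true null with probability 1 - 0.95^2,
# not the intended 5%.
p_accept_one = 0.95                  # P(|t| <= 1.96) for a single 5%-level test
size = 1 - p_accept_one ** 2         # P(at least one rejection under H0)
print(round(size, 4))  # 0.0975
```

So even in the friendliest case the procedure rejects nearly twice as often as intended, which is why a joint test such as the F-test is needed.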