REGRESSION ANALYSIS II - MULTICOLLINEARITY


QUESTION 1
Departments A and B of the Open University of Cyprus consist of n_A = 35 and n_B = 30 students respectively. The students of department A achieved an average test score x̄_A = 7.5, while the students of department B achieved an average test score x̄_B = 6.
(a) If the standard deviation of department A is known and equal to 2.5, test the null hypothesis that the average test score of the students of A equals 8.5 against the alternative hypothesis that it is less than 8.5.
(b) If the standard deviation of department B is unknown while its estimate equals 1.5, test the null hypothesis that the average test score of the students of B equals 5 against the alternative hypothesis that it is greater than 5.

Question 1
(a) The standard deviation is known.
We determine the null hypothesis H0 and the alternative hypothesis H1:
H0: μ_A = 8.5    H1: μ_A < 8.5
We choose the suitable test statistic and calculate its value. The suitable test statistic is:
Z = (x̄_A − 8.5) / (σ/√n_A) ~ N(0, 1).
We calculate its value:
Z = (x̄_A − 8.5) / (σ/√n_A) = (7.5 − 8.5) / (2.5/√35) = −1/0.423 = −2.366.
We determine the acceptance region C0 and the rejection region C1:
C0 = {Z : Z ≥ c} and C1 = {Z : Z < c}.

Question 1
We choose the significance level α, which determines the probability of committing a Type I error: P(Z < c; H0 is valid) = α. Based on this relationship, we find the critical value c from the standard normal distribution table:

Level of significance   Critical value
1%                      −2.33
5%                      −1.64
10%                     −1.28

Decision: we have found Z = −2.366.
At significance level 1% the critical value is c = −2.33. We have Z < c, so we reject H0.
At significance level 5% the critical value is c = −1.64. We have Z < c, so we reject H0.
At significance level 10% the critical value is c = −1.28. We have Z < c, so we reject H0.
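The z-test of part (a) can be checked numerically; a minimal sketch (the critical values come from `statistics.NormalDist`, which plays the role of the printed normal table):

```python
# Numerical check of the z-test in Question 1(a); values are taken from the text.
from math import sqrt
from statistics import NormalDist

n_A, xbar_A, sigma, mu_0 = 35, 7.5, 2.5, 8.5

z = (xbar_A - mu_0) / (sigma / sqrt(n_A))   # test statistic
print(round(z, 3))                          # ≈ -2.366

# Left-tail critical values at the three levels used above
for alpha in (0.01, 0.05, 0.10):
    c = NormalDist().inv_cdf(alpha)
    print(alpha, round(c, 2), "reject H0" if z < c else "accept H0")
```

At all three levels z falls in the rejection region, matching the decision above.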

Question 1
(b) The standard deviation σ is unknown.
We determine the null hypothesis H0 and the alternative hypothesis H1:
H0: μ_B = 5    H1: μ_B > 5
We choose the suitable test statistic and then calculate its value. The suitable test statistic is the t-statistic:
t = (x̄_B − 5) / (s_B/√n_B) ~ St(n_B − 1).
Then we calculate its value:
t = (x̄_B − 5) / (s_B/√n_B) = (6 − 5) / (1.5/√30) = 1/0.274 = 3.651.
We determine the acceptance region C0 and the rejection region C1:
C0 = {t : t ≤ c} and C1 = {t : t > c}.

Question 1
We select the significance level α, which is the probability of committing a Type I error: P(t > c; H0 is valid) = α. From this equation we find the critical value c from the Student's t distribution table with 30 − 1 = 29 degrees of freedom:

Level of significance   Critical value
1%                      2.462
5%                      1.699
10%                     1.311

Make a decision: we have found t = 3.651.
At significance level 1% the critical value is c = 2.462. We have t > c, so we reject H0.
At significance level 5% the critical value is c = 1.699. We have t > c, so we reject H0.
At significance level 10% the critical value is c = 1.311. We have t > c, so we reject H0.
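Recomputing the statistic of part (b) numerically (the critical values are the St(29) table entries quoted above):

```python
# Numerical check of the t-test in Question 1(b); values are taken from the text.
from math import sqrt

n_B, xbar_B, s_B, mu_0 = 30, 6.0, 1.5, 5.0

t_stat = (xbar_B - mu_0) / (s_B / sqrt(n_B))   # standard error is s_B / sqrt(n_B)
print(round(t_stat, 3))                        # ≈ 3.651

# Right-tail critical values of St(29) from the t table
critical = {0.01: 2.462, 0.05: 1.699, 0.10: 1.311}
for alpha, c in critical.items():
    print(alpha, "reject H0" if t_stat > c else "accept H0")
```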

Question 2
An economist evaluates the relationship between 5 economic variables. He wants to estimate the multiple regression:
y_t = β1 + β2 x_t + β3 z_t + β4 r_t + β5 m_t + u_t,  t = 1, 2, ..., 105.
The estimation output is:
ŷ_t = 0.172 + 0.264 x_t + 0.623 z_t − 0.195 r_t + 0.222 m_t
      (2.604) (0.205)     (0.190)    (0.097)    (0.131)
with R² = 0.219, s = 22.740, RSS = 51196.28, where standard errors are presented in parentheses.
(a) Examine the following hypotheses at the 5% significance level:
(i) H0: β2 = 0 against H1: β2 ≠ 0
(ii) H0: β4 = 0 against H1: β4 < 0
(iii) H0: β2 = β3 = β4 = β5 = 0 against H1: at least one of β2, β3, β4, β5 ≠ 0
(iv) H0: β1 = β2 = β4 = 0 against H1: at least one of β1, β2, β4 ≠ 0
We are also given the following two estimated models:
(1) ŷ_t = 0.184 z_t + 0.250 m_t,  R² = 0.189, s = 22.957, RSS = 53232.45
          (0.098)     (0.122)
(2) ŷ_t = 0.163 z_t + 0.210 x_t,  R² = 0.116, s = 25.834, RSS = 52144.12
          (0.075)     (0.114)

(i) We determine the null hypothesis H0: β2 = 0.
We choose the suitable test statistic and then compute its value. The suitable test statistic is the t-statistic:
t = (β̂2 − β2) / SE(β̂2) ~ St(n − k).
Here n is the number of observations, so n = 105, while k is the number of parameters in the regression (k = 5: β1, β2, β3, β4, β5). SE(β̂2) is the standard error of the estimate β̂2.
The value of the t-statistic is computed as:
t = (0.264 − 0) / 0.205 = 1.290.
We calculate the p-value as the degree of support of H0:
p-value = P(|t| ≥ t*; H0 is valid) = 2 P(t ≥ t*; H0 is valid) = 2 P(t ≥ 1.290; H0 is valid) ≈ 2 × 0.10 = 0.20.
The p-value = 0.20 is very large (larger than 10%), so there is strong support for the null hypothesis H0: β2 = 0. Thus the coefficient β2 is statistically insignificant (statistically equal to zero).

(ii) We determine the null hypothesis H0 and the alternative hypothesis H1:
H0: β4 = 0    H1: β4 < 0
We choose the suitable test statistic and then compute its value. The suitable test statistic is the t-statistic:
t = (β̂4 − β4) / SE(β̂4) ~ St(n − k),
where SE(β̂4) is the standard error of β̂4 and k is the number of parameters in the regression.
The test statistic is calculated as:
t = (−0.195 − 0) / 0.097 = −2.01.
We determine the acceptance region C0 and the rejection region C1:
C0 = {t : t ≥ c} and C1 = {t : t < c}.

We select the significance level α, which represents the probability of committing a Type I error:
P(t < c; H0 is valid) = α  =>  1 − P(t ≥ c; H0 is valid) = α  =>  P(t ≥ c; H0 is valid) = 1 − α.
Based on this equation we get the critical value c from the t distribution table, t(n − k) = t(105 − 5) = t(100):

Significance level   Critical value
5%                   −1.660

Make a decision: we accept H0 when the value of the t-statistic, t*, falls into the acceptance region C0, while we reject H0 and accept H1 when t* falls into the rejection region C1. At significance level 5% the critical value is c = −1.660, and we find t = −2.01 < c. So we reject H0 and accept H1.
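The one-sided test on β4 can be sketched in the same way (the critical value is the St(100) table entry used above):

```python
# Numerical check of the one-sided t-test on beta_4 (values from the text)
b4_hat, se_b4 = -0.195, 0.097

t_stat = (b4_hat - 0) / se_b4
print(round(t_stat, 2))                             # ≈ -2.01

c = -1.660                                          # 5% left-tail critical value of St(100)
print("reject H0" if t_stat < c else "accept H0")   # reject H0
```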

(iii) We want to examine whether all parameter coefficients, except the constant, are simultaneously equal to zero. We determine the null hypothesis H0 and the alternative hypothesis H1:
H0: β2 = β3 = β4 = β5 = 0    H1: at least one of β2, β3, β4, β5 ≠ 0
We choose the suitable test statistic and then compute its value. The suitable test statistic is the F-statistic:
F = [R²/(k − 1)] / [(1 − R²)/(n − k)] ~ F(k − 1, n − k).
The value of the F-statistic is calculated as:
F = [0.219/(5 − 1)] / [(1 − 0.219)/(105 − 5)] = 0.280 × 25 ≈ 7.01.
We determine the acceptance region C0 and the rejection region C1:
C0 = {F : F ≤ c} and C1 = {F : F > c}.

We select the significance level α, which represents the probability of committing a Type I error:
P(F > c; H0 is valid) = α.
From this equation we find the critical value c from the F distribution table, F(k − 1, n − k) = F(5 − 1, 105 − 5) = F(4, 100). (We used F(4, 120) because F(4, 100) does not exist in the tables.)

Level of significance   Critical value
5%                      2.4472

Make a decision: we accept H0 when the value of the F-statistic, F*, falls into the acceptance region C0, while we reject H0 and accept H1 when F* falls into the rejection region C1. For significance level 5% the critical value is c = 2.4472. We find that F ≈ 7.01 > c. So we reject H0 (accept H1).
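The overall-significance F statistic above is a pure function of R², n and k, so it is easy to verify:

```python
# Overall-significance F statistic computed from R^2 (Question 2, part iii)
R2, n, k = 0.219, 105, 5

F = (R2 / (k - 1)) / ((1 - R2) / (n - k))
print(round(F, 2))                            # ≈ 7.01

c = 2.4472                                    # 5% critical value, F(4, 120) from the tables
print("reject H0" if F > c else "accept H0")  # reject H0
```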

(iv) We want to examine whether the parameter coefficients β1, β2 and β4 are simultaneously equal to zero. If we impose these coefficient restrictions simultaneously, we get the following restricted version of our basic regression model:
y_t = β3 z_t + β5 m_t + u_t.
Thus, between models (1) and (2), we choose model (1), because it corresponds to the restricted model specification. Estimation of the restricted model yields the following results:
ŷ_t = 0.184 z_t + 0.250 m_t,  R² = 0.189, s = 22.957, RSS = 53232.45
      (0.098)     (0.122)
Since this coefficient of determination has been calculated for the model under the restrictions, we have R²_R = 0.189.

We determine the null hypothesis H0 and the alternative hypothesis H1:
H0: β1 = β2 = β4 = 0    H1: at least one of β1, β2, β4 ≠ 0
We choose the suitable test statistic and then calculate its value. The suitable test statistic is the F-statistic:
F = [(R²_U − R²_R)/m] / [(1 − R²_U)/(n − k)] ~ F(m, n − k).
The value of the F-statistic is calculated as:
F = [(0.219 − 0.189)/3] / [(1 − 0.219)/(105 − 5)] = 1.28.
We determine the acceptance region C0 and the rejection region C1:
C0 = {F : F ≤ c} and C1 = {F : F > c}.

We choose the significance level α, which represents the probability of committing a Type I error:
P(F > c; H0 is valid) = α.
Based on this equation we find the critical value c from the F distribution table, F(m, n − k) = F(3, 105 − 5).

Level of significance   Critical value
5%                      2.6802

Make a decision: we accept H0 when the value of the F-statistic, F*, falls into the acceptance region C0, while we reject H0 and accept the alternative hypothesis H1 when F* falls into the rejection region C1. For significance level 5%, the critical value is c = 2.6802. We find that F = 1.28 < c. So we accept H0 (and reject H1).
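The restricted-versus-unrestricted F statistic of part (iv) can be sketched the same way:

```python
# Restricted-vs-unrestricted F statistic for Question 2, part (iv)
R2_U, R2_R = 0.219, 0.189     # unrestricted and restricted R^2
n, k, m = 105, 5, 3           # m = number of restrictions

F = ((R2_U - R2_R) / m) / ((1 - R2_U) / (n - k))
print(round(F, 2))                            # ≈ 1.28

c = 2.6802                                    # 5% critical value used above
print("accept H0" if F <= c else "reject H0") # accept H0
```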

Question 3
A random sample of 1,562 persons was asked to respond on a scale from one (strongly disagree) to seven (strongly agree) to the question: "Will the new government economic policy lower unemployment?". The sample mean response was 4.27 and the population standard deviation was 1.32. Test whether the mean response is equal to 4 against the alternative that it is different from 4. Perform the hypothesis test at the 5% level.

We determine the null hypothesis H0 and the alternative hypothesis H1:
H0: μ = 4    H1: μ ≠ 4
We choose the suitable test statistic and calculate its value. The suitable test statistic is the z-statistic, because we have a large sample size:
Z = (x̄ − 4) / (σ/√n) ~ N(0, 1).
We calculate its value:
Z = (x̄ − 4) / (σ/√n) = (4.27 − 4) / (1.32/√1562) = 8.08.
We determine the acceptance region C0 and the rejection region C1:
C0 = {Z : |Z| < c} and C1 = {Z : |Z| ≥ c}.

We choose the significance level α, which determines the probability of committing a Type I error:
P(|Z| ≥ c; H0 is valid) = α  =>  P(Z ≥ c; H0 is valid) = α/2  =>  c = z_{α/2}.
Based on this relationship, we find the critical value c from the standard normal distribution table:

Level of significance   Critical value
5%                      1.96

Make a decision: we have found Z = 8.08. At significance level 5%, the critical value is c = 1.96. We have |Z| > c, so we reject H0.
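A quick numerical check of the two-sided z-test in Question 3:

```python
# Two-sided z-test of Question 3; values are taken from the text.
from math import sqrt

n, xbar, sigma, mu_0 = 1562, 4.27, 1.32, 4.0

z = (xbar - mu_0) / (sigma / sqrt(n))
print(round(z, 2))                                   # ≈ 8.08

c = 1.96                                             # two-sided 5% critical value
print("reject H0" if abs(z) >= c else "accept H0")   # reject H0
```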

Question 4
Much research in applied economics focuses on the pricing of goods/services. One common approach involves building a model in which the price of a good depends on specific characteristics of that good. A real estate agent in Canada is interested in building a pricing model for house prices. One approach is to estimate a multiple regression model, where the sales price of the house in Canadian dollars is the dependent variable Y, while various determinants of house prices are used as independent variables. The factors which affect house prices are the following:
X1 = the lot size of the property (in square feet)
X2 = the number of bedrooms
X3 = the number of bathrooms
X4 = the number of storeys (excluding the basement)
X5 = basement (1 if the house has a basement, 0 otherwise)
X6 = air conditioning system (1 if the house includes an air conditioner, 0 otherwise)
X7 = garage (number of places used for storage of vehicles)

Data on the housing market of Windsor, Canada

sale price  lot size  bedrooms  bath  storeys  basement  air cond  garage
42000       5850      3         1     2        1         0         1
38500       4000      2         1     1        0         0         0
49500       3060      3         1     1        0         0         0
60500       6650      3         1     2        0         0         0
61000       6360      2         1     1        0         0         0
66000       4160      3         1     1        1         1         0
66000       3880      3         2     2        1         0         2
69000       4160      3         1     3        0         0         0
83800       4800      3         1     1        1         0         0
88500       5500      3         2     4        0         1         1
90000       7200      3         2     1        1         1         3
30500       3000      2         1     1        0         0         0
27000       1700      3         1     2        0         0         0
36000       2880      3         1     1        0         0         0
37000       3600      2         1     1        0         0         0
37900       3185      2         1     1        0         1         0
40500       3300      3         1     2        0         0         1
40750       5200      4         1     3        0         0         0
45000       3450      1         1     1        0         0         0
45000       3986      2         2     1        1         0         1
48500       4785      3         1     2        1         1         1
65900       4510      4         2     2        1         0         0
37900       4000      3         1     2        0         1         0
38000       3934      2         1     1        0         0         0
42000       4960      2         1     1        0         0         0
42300       3000      2         1     2        0         0         0
43500       3800      2         1     1        0         0         0
44000       4960      2         1     1        1         1         0
44500       3000      3         1     1        0         1         0
44900       4500      3         1     2        0         1         0
45000       3500      2         1     1        1         0         0
48000       3500      4         1     2        0         1         2
49000       4000      2         1     1        0         0         0
51500       4500      2         1     1        0         0         0
61000       6360      2         1     2        0         0         0
61000       4500      2         1     1        0         1         2
61700       4032      2         1     1        1         0         0
67000       5170      3         1     4        0         1         0
82000       5400      4         2     2        0         1         2

Data taken from Gary Koop's book "Analysis of Economic Data".

Question 4
Fit the regression model:
y_i = β0 + β1 x1i + β2 x2i + β3 x3i + β4 x4i + β5 x5i + β6 x6i + β7 x7i + u_i,  i = 1, 2, ..., 39.
Write the fitted regression equation.
If we consider comparable houses, how much would an extra bathroom add to the value of the house?
Examine whether the coefficient β3 is statistically significant at the 5% level.
Test whether all the determinants of house prices are simultaneously equal to zero against the alternative hypothesis that at least one of them is different from zero (at the 5% level).
Test whether the variables X2, X6 and X7 are simultaneously equal to zero against the alternative hypothesis that at least one of them is different from zero (at the 5% level).

Question 4 — Write the fitted regression equation
We estimate the regression model using the Excel function Regression. The estimation output is given below:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.768355175
R Square             0.590369674
Adjusted R Square    0.497872504
Standard Error       11219.58763
Observations         39

ANOVA
             df   SS           MS         F         Significance F
Regression   7    5624027350   8.03E+08   6.38257   0.000108491
Residual     31   3902253547   1.26E+08
Total        38   9526280897

               Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept      861.0030405    11021.69743      0.078119   0.938236   -21617.89699   23339.90307
X Variable 1   6.153758317    1.727879467      3.561451   0.001215   2.629724925    9.677791709
X Variable 2   909.9757836    3308.240542      0.275063   0.785093   -5837.225259   7657.176826
X Variable 3   14787.69418    6819.279422      2.168513   0.037912   879.6821577    28695.7062
X Variable 4   1845.305791    2923.205061      0.631261   0.532497   -4116.610214   7807.221796
X Variable 5   3241.237021    4595.167686      0.705358   0.485854   -6130.669226   12613.14327
X Variable 6   3650.232294    4341.864092      0.840706   0.406951   -5205.05787    12505.52246
X Variable 7   432.3176428    3252.808054      0.132906   0.895127   -6201.828097   7066.463382

The fitted regression equation is:
ŷ = 861 + 6.15 x1 + 909.98 x2 + 14787.69 x3 + 1845.31 x4 + 3241.24 x5 + 3650.23 x6 + 432.32 x7
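The same estimates can be reproduced outside Excel; a minimal sketch using numpy least squares on the Windsor data listed above (numpy assumed available; column order as in the data table):

```python
# Reproducing the Excel regression output with numpy least squares.
# Rows: sale price, lot size, bedrooms, bath, storeys, basement, air cond, garage.
import numpy as np

data = np.array([
    [42000,5850,3,1,2,1,0,1],[38500,4000,2,1,1,0,0,0],[49500,3060,3,1,1,0,0,0],
    [60500,6650,3,1,2,0,0,0],[61000,6360,2,1,1,0,0,0],[66000,4160,3,1,1,1,1,0],
    [66000,3880,3,2,2,1,0,2],[69000,4160,3,1,3,0,0,0],[83800,4800,3,1,1,1,0,0],
    [88500,5500,3,2,4,0,1,1],[90000,7200,3,2,1,1,1,3],[30500,3000,2,1,1,0,0,0],
    [27000,1700,3,1,2,0,0,0],[36000,2880,3,1,1,0,0,0],[37000,3600,2,1,1,0,0,0],
    [37900,3185,2,1,1,0,1,0],[40500,3300,3,1,2,0,0,1],[40750,5200,4,1,3,0,0,0],
    [45000,3450,1,1,1,0,0,0],[45000,3986,2,2,1,1,0,1],[48500,4785,3,1,2,1,1,1],
    [65900,4510,4,2,2,1,0,0],[37900,4000,3,1,2,0,1,0],[38000,3934,2,1,1,0,0,0],
    [42000,4960,2,1,1,0,0,0],[42300,3000,2,1,2,0,0,0],[43500,3800,2,1,1,0,0,0],
    [44000,4960,2,1,1,1,1,0],[44500,3000,3,1,1,0,1,0],[44900,4500,3,1,2,0,1,0],
    [45000,3500,2,1,1,1,0,0],[48000,3500,4,1,2,0,1,2],[49000,4000,2,1,1,0,0,0],
    [51500,4500,2,1,1,0,0,0],[61000,6360,2,1,2,0,0,0],[61000,4500,2,1,1,0,1,2],
    [61700,4032,2,1,1,1,0,0],[67000,5170,3,1,4,0,1,0],[82000,5400,4,2,2,0,1,2],
], dtype=float)

y = data[:, 0]
X = np.column_stack([np.ones(len(data)), data[:, 1:]])   # constant + X1..X7

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
print(np.round(beta, 2))   # intercept first, then b1..b7
print(round(r2, 3))        # should be close to the 0.590 reported above
```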

Question 4 — If we consider comparable houses, how much would an extra bathroom add to the value of the house?
Houses with an extra bathroom will be worth b̂3 = 14787.69 Canadian dollars more than those without the extra bathroom, if we consider houses with the same lot size, number of bedrooms, storeys, basement, etc. The coefficient estimate b̂3 of variable X3 measures how much Y will change when X3 changes by one unit, given that all the other explanatory variables remain the same.
In the case of simple regression we can say that β measures the influence of X on Y; in multiple regression we say that βj measures the influence of Xj on Y, all other explanatory variables being equal.

Economic interpretation of the regression estimates
Some ways of verbally stating what the value of β1 means:
An extra square foot of lot size will tend to add another $6.15 onto the price of a house, ceteris paribus.
If we consider houses with the same number of bedrooms, bathrooms, storeys, etc., an extra square foot of lot size will tend to add another $6.15 onto the price of the house.
If we compare houses with the same number of bedrooms, bathrooms, storeys, etc., those with larger lots tend to be worth more. In particular, an extra square foot of lot size is associated with an increased price of $6.15.
We cannot simply say that houses with bigger lots are worth more, since this is not always the case (e.g. some nice houses on small lots will be worth more than poor houses on large lots). However, we can say that if we consider houses that vary in lot size but are comparable in other respects, those with larger lots tend to be worth more.

Examine whether the coefficient β3 is statistically significant at the 5% level.
The null hypothesis is H0: β3 = 0.
We choose the suitable test statistic and then compute its value. The suitable test statistic is the t-statistic:
t = (β̂3 − β3) / SE(β̂3) ~ St(n − k).
Here n is the number of observations, so n = 39, while k is the number of parameters in the regression (k = 8: β0, β1, β2, β3, β4, β5, β6, β7). SE(β̂3) is the standard error of β̂3.
The value of the t-statistic is computed as:
t = (14787.69 − 0) / 6819.28 = 2.169.
We calculate the p-value as the degree of support of H0:
p-value = P(|t| ≥ t*; H0 is valid) = 2 P(t ≥ t*; H0 is valid) = 2 P(t ≥ 2.169; H0 is valid) = 0.038.
The p-value = 0.038 is very small (smaller than 5%), so there is no support for the null hypothesis H0: β3 = 0. Thus the coefficient β3 is statistically significant (statistically not equal to zero).

Test whether all the determinants of house prices are simultaneously equal to zero against the alternative hypothesis that at least one of them is different from zero (at the 5% level).
We determine the null hypothesis H0 and the alternative hypothesis H1:
H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = 0    H1: at least one of β1, β2, ..., β7 ≠ 0
We choose the suitable test statistic and then compute its value. The suitable test statistic is the F-statistic:
F = [R²/(k − 1)] / [(1 − R²)/(n − k)] ~ F(k − 1, n − k).
The value of the F-statistic is calculated as:
F = [0.59/(8 − 1)] / [(1 − 0.59)/(39 − 8)] = 1.439 × 4.429 ≈ 6.38.
We determine the acceptance region C0 and the rejection region C1:
C0 = {F : F ≤ c} and C1 = {F : F > c}.

We select the significance level α, which represents the probability of committing a Type I error:
P(F > c; H0 is valid) = α.
From this equation we find the critical value c from the F distribution table, F(k − 1, n − k) = F(8 − 1, 39 − 8) = F(7, 31). (We used F(7, 30) because F(7, 31) does not exist in the tables.)

Level of significance   Critical value
5%                      2.3343

Make a decision: we accept H0 when the value of the F-statistic, F*, falls into the acceptance region C0, while we reject H0 and accept H1 when F* falls into the rejection region C1. For significance level 5% the critical value is c = 2.3343. We find that F = 6.38 > c. So we reject H0 (accept H1).

Test whether the variables X2, X6 and X7 are simultaneously equal to zero against the alternative hypothesis that at least one of them is different from zero (at the 5% level).
If we impose these coefficient restrictions simultaneously, we get the following restricted version of the regression model:
y_i = β0 + β1 x1i + β3 x3i + β4 x4i + β5 x5i + u_i,  i = 1, 2, ..., 39.
We estimate the new regression model and get the following results (Excel relabels the remaining regressors X1, X3, X4, X5 as "X Variable 1" to "X Variable 4"):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.758016749
R Square             0.574589392
Adjusted R Square    0.524541085
Standard Error       10917.5802
Observations         39

ANOVA
             df   SS           MS         F         Significance F
Regression   4    5473699944   1.37E+09   11.4807   5.27081E-06
Residual     34   4052580953   1.19E+08
Total        38   9526280897

               Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept      1172.889855    8032.478527      0.146018   0.884769   -15151.07044   17496.85015
X Variable 1   6.322892636    1.641701744      3.851426   0.000495   2.9865533      9.659231973
X Variable 2   15978.48478    5529.239705      2.889816   0.006667   4741.717813    27215.25175
X Variable 3   2646.148504    2379.351763      1.11213    0.273884   -2189.276025   7481.573032
X Variable 4   3542.64637     4419.478586      0.801598   0.428352   -5438.81467    12524.10741

The new coefficient of determination is the restricted coefficient of determination: R²_R = 0.5746.

We determine the null hypothesis H0 and the alternative hypothesis H1:
H0: β2 = β6 = β7 = 0    H1: at least one of β2, β6, β7 ≠ 0
We choose the suitable test statistic and then calculate its value. The suitable test statistic is the F-statistic:
F = [(R²_U − R²_R)/m] / [(1 − R²_U)/(n − k)] ~ F(m, n − k).
The value of the F-statistic is calculated as:
F = [(0.5904 − 0.5746)/3] / [(1 − 0.5904)/(39 − 8)] = 0.399.
We determine the acceptance region C0 and the rejection region C1:
C0 = {F : F ≤ c} and C1 = {F : F > c}.

We choose the significance level α, which represents the probability of committing a Type I error:
P(F > c; H0 is valid) = α.
Based on this equation we find the critical value c from the F distribution table, F(m, n − k) = F(3, 39 − 8). (We used F(3, 30) because F(3, 31) does not exist in the tables.)

Level of significance   Critical value
5%                      2.9223

Make a decision: we accept H0 when the value of the F-statistic, F*, falls into the acceptance region C0, while we reject H0 and accept the alternative hypothesis H1 when F* falls into the rejection region C1. For significance level 5%, the critical value is c = 2.9223. We find that F = 0.399 < c. So we accept H0 (and reject H1).

Pitfalls of using multiple regression analysis
In multiple regression analysis we usually face two types of problems:
The effect of including a variable that ought not to be included
Omitted-variables bias
The effect of including a variable that ought not to be included: if we include explanatory variables that should not be present in the regression, then the estimated coefficients on the variables will not be accurate.
In the previous example, will adding an extra bedroom to the house raise its price by $909.98, the estimated coefficient on bedrooms? Probably not! The reason is that there are many factors other than the number of bedrooms that potentially influence house prices (for example, bathrooms and lot size are more important determinants of house prices than bedrooms). Furthermore, these factors may be highly correlated (i.e. houses with more bathrooms tend to have more bedrooms). To investigate this possibility, let us examine the correlation matrix of all the variables in the model.

Correlation matrix of the variables
We calculate the correlation coefficient between each pair of variables and then present the results in a matrix. For example, if we have three variables X, Y and Z, then there are three possible correlations (i.e. ρXY, ρXZ and ρYZ). We put these correlations in a matrix:

     X     Y     Z
X    1
Y    ρXY   1
Z    ρXZ   ρYZ   1

You can use the Excel function Correlation in the Data Analysis Toolbox to compute the correlation matrix of the variables.

Correlation matrix of the variables
In the house pricing regression model, the correlation matrix of all variables is the following:

      Y          X1         X2         X3         X4         X5         X6         X7
Y     1
X1    0.624731   1
X2    0.325563   0.151457   1
X3    0.567308   0.282595   0.334755   1
X4    0.312956   0.245192   0.535618   0.216612   1
X5    0.308908   0.207082   0.099870   0.364447   -0.175490  1
X6    0.288224   0.177111   0.309839   0.150756   0.224507   0.040291   1
X7    0.454597   0.302247   0.336977   0.599424   0.105020   0.256325   0.397613   1

Almost all the elements of the correlation matrix are positive (the only negative entry is the small correlation between storeys and basement), so the variables are generally positively correlated with each other. The correlation between the number of bathrooms and the number of bedrooms is 0.335, indicating that houses with more bathrooms also tend to have more bedrooms. Also note that the correlation between the number of storeys and the number of bedrooms is 0.536, indicating that houses with more storeys also tend to have more bedrooms. Since these factors are highly correlated with the bedroom variable, while the bedroom coefficient is found to be insignificant, we may exclude it from the regression model.

Multicollinearity
When the explanatory variables are very highly correlated with each other (correlation coefficients very close to 1 or to −1), the problem of multicollinearity occurs.
Perfect multicollinearity: under perfect multicollinearity, the OLS estimators simply do not exist.
Imperfect multicollinearity: imperfect multicollinearity (or near multicollinearity) exists when the explanatory variables in an equation are correlated, but this correlation is less than perfect. In cases of imperfect multicollinearity the OLS estimators can be obtained; however, the OLS variances are often larger than those obtained in the absence of multicollinearity.

Detecting multicollinearity — auxiliary regressions
We can determine the relationship between any of the regressors and the other regressors by treating each regressor in turn as the dependent variable, computing the R² values for these regressions, and using a test of the relationship between each regressor and the set of other explanatory variables. For example, we can run the following auxiliary regression (for the variable X2):
x2i = α0 + α1 x1i + α2 x3i + α3 x4i + α4 x5i + α5 x6i + α6 x7i + u_i,  i = 1, 2, ..., 39.
We then set up an F test to determine whether there is a high level of multicollinearity. We test the null hypothesis
H0: α1 = α2 = α3 = α4 = α5 = α6 = 0
against the alternative hypothesis that at least one of these coefficients is different from zero.

Detecting multicollinearity — auxiliary regressions
The test statistic is an F statistic, which follows an F distribution with k − 2 and n − k + 1 degrees of freedom, where k is the number of explanatory variables including the intercept. If F is significant, the particular X is taken to be collinear with the other X's; if the F value is not significant, the X (as dependent variable) is not considered to be collinear with the other explanatory variables. If F is significant, you may wish to exclude that variable from the model, since the part of the dependent variable that it explains is already being explained by the other explanatory variables. You will have to decide whether it is wise to use a more parsimonious model or not.
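The auxiliary-regression idea can be sketched numerically. The helper `aux_r2` below is a hypothetical name, and the toy data are illustrative rather than the Windsor sample; the variance inflation factor 1/(1 − R²_aux) is a closely related diagnostic:

```python
# Sketch: auxiliary R^2 of one regressor on the others (assumed helper name aux_r2)
import numpy as np

def aux_r2(X, j):
    # R^2 from regressing column j of X on the other columns plus a constant
    y = X[:, j]
    Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

# Toy data: column 1 is almost an exact copy of column 0, column 2 is unrelated
rng = np.random.default_rng(0)
x0 = rng.normal(size=60)
X = np.column_stack([x0, x0 + 0.01 * rng.normal(size=60), rng.normal(size=60)])

r2 = aux_r2(X, 0)
vif = 1.0 / (1.0 - r2)   # variance inflation factor, a related diagnostic
print(round(r2, 3), round(vif, 1))   # auxiliary R^2 near 1, VIF very large
```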

Detecting multicollinearity — auxiliary regressions
Back to our example. We run an auxiliary regression for the variable X1: we use X1 as the dependent variable and the other regressors as independent variables, and estimate it in Excel.

Detecting multicollinearity — auxiliary regressions
The F statistic is not significant (its p-value is 0.3278, much larger than 0.05); therefore the variable X1 is not considered to be collinear with the other explanatory variables.


Detecting multicollinearity — auxiliary regressions
We estimated six additional auxiliary regressions with the remaining variables as dependent variables. The F statistic is found to be significant for the variables bedrooms, bathrooms, storeys and garage; therefore these variables are collinear with the other explanatory variables. We may wish to exclude these variables from the model.

Detecting multicollinearity Klein's Rule of Thumb suggests that multicollinearity may be a problem only if the R2 obtained from an auxiliary regression is greater than the overall R2 (from the regression with y as the dependent variable). In this example, the overall R2 is equal to 0.59, while the R2 values obtained from the auxiliary regressions range from 0.18 to 0.49. Therefore, according to Klein's rule of thumb, there is no problem of multicollinearity, since the auxiliary R2 values are smaller than the overall R2.
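Klein's rule of thumb is straightforward to check by computing the overall R2 and each auxiliary R2 side by side. A minimal numpy sketch (the function names are illustrative):

```python
import numpy as np

def r_squared(y, Z):
    """OLS R^2 of y on Z (an intercept column is added internally)."""
    Z = np.column_stack([np.ones(len(y)), Z])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def klein_flags(y, X):
    """Klein's rule: flag regressor j if its auxiliary R^2 exceeds the
    overall R^2 of the regression of y on all columns of X."""
    overall = r_squared(y, X)
    flags = []
    for j in range(X.shape[1]):
        aux = r_squared(X[:, j], np.delete(X, j, axis=1))
        flags.append(aux > overall)
    return overall, flags
```

In the house-price example above, every auxiliary R2 (0.18 to 0.49) falls below the overall R2 of 0.59, so every flag would be False.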

Detecting multicollinearity Eigenvalues and Condition Index We can use the eigenvalues and the condition index to estimate the level of collinearity in the explanatory variables. Most multivariate statistical approaches involve decomposing a correlation matrix into linear combinations of variables. The linear combinations are chosen so that the first combination has the largest possible variance (subject to some restrictions we won't discuss), the second combination has the next largest variance, subject to being uncorrelated with the first, the third has the next largest variance, subject to being uncorrelated with the first and second, and so forth. The variance of each of these linear combinations is called an eigenvalue.

Detecting multicollinearity Eigenvalues and Condition Index "Number" stands for a linear combination of the X variables. "Eigenval(ue)" stands for the variance of that combination. The condition index is a simple function of the eigenvalues, namely

CI = sqrt(λmax / λmin)

where λ is the symbol for an eigenvalue.
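The eigenvalues and condition indexes can be computed from the correlation matrix of the regressors. A sketch in numpy, assuming the regressors are the columns of a matrix X (this mirrors the idea above, not Gretl's exact routine, which works on the scaled data matrix):

```python
import numpy as np

def condition_indexes(X):
    """Condition indexes sqrt(lambda_max / lambda_i) from the eigenvalues
    of the correlation matrix of the regressors (columns of X)."""
    R = np.corrcoef(X, rowvar=False)     # correlation matrix of the columns
    eigvals = np.linalg.eigvalsh(R)      # symmetric matrix, so eigvalsh
    eigvals = np.sort(eigvals)[::-1]     # largest eigenvalue first
    return np.sqrt(eigvals.max() / eigvals)
```

The largest value returned is the overall condition index CI; values between 10 and 30 are commonly read as moderate-to-strong collinearity.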

Detecting multicollinearity Eigenvalues and Condition Index The fourth part of the matrix is the Variance proportions. This is the regression coefficient variance-decomposition matrix, which shows the proportion of variance for each regression coefficient (and its associated variable) attributable to each condition index.

Detecting multicollinearity Eigenvalues and Condition Index To use the table, you first look at the variance proportions. For X1, for example, most of the variance (about 75 percent) is associated with Number 3, which has an eigenvalue of .079 and a condition index of 6.90. Most of the rest of X1 is associated with Number 4. Variable X2 is associated with 3 different numbers (2, 3, & 4), and X3 is mostly associated with Number 2. Look for variance proportions of about .50 and larger. Collinearity is spotted by finding 2 or more variables that have large proportions of variance (.50 or more) corresponding to large condition indices (between 10 and 30). There is no evident problem with collinearity in the above example.

Detecting multicollinearity Eigenvalues and Condition Index Gretl: Click on Analysis on the estimated model and then select Collinearity. First, a threshold of 10 for the condition index selects three condition indexes (10.446, 12.085 and 18.667). The condition index of 10.446 is associated with only one large variance proportion (0.597, the lotsize variable); thus no collinearity is shown for this index. The condition index of 12.085 is associated with two large variance proportions (0.38 and 0.668, for bedroom and bath respectively); thus, there is collinearity between these two variables. The last condition index (18.667) is only weakly associated with the variables lotsize and bedroom (variance proportions 0.344 and 0.436); no collinearity exists.

Detecting multicollinearity Variance Inflation Factors (VIF) A variance inflation factor (VIF) quantifies how much the variance of an estimated coefficient is inflated. The standard errors, and hence the variances of the estimated coefficients, are inflated (i.e. increased) when multicollinearity exists. So the variance inflation factor for the estimated coefficient bk, denoted VIFk, is just the factor by which its variance is inflated. The VIF for the estimated coefficient bk is calculated as

VIFk = 1 / (1 - Rk2)

where Rk2 is the R2-value obtained by regressing the kth predictor on the remaining predictors. A VIF of 1 means that there is no correlation between the kth predictor and the remaining predictor variables, and hence the variance of bk is not inflated at all. The general rule of thumb is that VIFs exceeding 4 warrant further investigation, while VIFs exceeding 10 are signs of serious multicollinearity requiring correction.
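The VIF formula above can be computed directly from the auxiliary-regression R2 values. A minimal numpy sketch (the function name vif is illustrative, not a library call):

```python
import numpy as np

def vif(X):
    """VIF_k = 1 / (1 - R_k^2) for each column of X, where R_k^2 comes
    from regressing column k on the remaining columns (with intercept)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```

Values near 1 indicate no inflation; by the rule of thumb above, entries over 4 deserve a closer look and entries over 10 signal serious multicollinearity.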

Detecting multicollinearity Gretl: Click on Analysis on the estimated model and then select Collinearity: VIFs. In this example, there are no signs of serious multicollinearity because all VIFs are smaller than 10.

Resolving multicollinearity The easiest ways to cure the problems are: remove one of the collinear variables; transform the highly correlated variables into a ratio; go out and collect more data; or switch to a higher frequency. In order to reduce the multicollinearity that exists, it is not sufficient to simply go out and collect more data observations. The data have to be collected in such a way that the correlations among the violating predictors are actually reduced. That is, collecting more of the same kind of data won't help to reduce the multicollinearity.

Resolving multicollinearity Orthogonal auxiliary variables Suppose we have three explanatory variables, X1, X2, and X3, and we have found that X1 is collinear with X2 and X3. One way to resolve the problem is to drop X1 from the regression. If we wish to keep X1 in the model, together with X2 and X3, we have to transform X1 so that it is no longer collinear with these variables. One way is to make X1 orthogonal to X2 and X3. How do we do it? First, we run the following auxiliary regression by least squares:

X1i = α0 + α1X2i + α2X3i + ui

Then we keep the residuals û from this model and make the following transformation:

X̃1 = α̂0 + û

where X̃1 denotes the orthogonal X1. We are now able to use X̃1 in our basic model as a regressor.

Resolving multicollinearity Orthogonal auxiliary variables Previous example: we found that the variable garage is collinear with the other variables. If we want to keep the variable in our model, we can make it orthogonal to the other factors. We run the regression where garage is the dependent variable, while the remaining predictors are used as independent variables. Note that the constant of the model is equal to -1.566.

Resolving multicollinearity Orthogonal auxiliary variables Based on this model, we can generate the residuals. Remember that the residual is the actual value minus the fitted value:

û = X7 - X̂7 = X7 - (β̂0 + β̂1X1 + β̂2X2 + ... + β̂6X6)

We can calculate the residuals of the model in Excel:

Resolving multicollinearity Orthogonal auxiliary variables Based on this model, we can generate the residuals. Remember that

û = X7 - X̂7 = X7 - (β̂0 + β̂1X1 + β̂2X2 + ... + β̂6X6)

Alternatively, we can calculate the residuals of the model in Gretl. Select Save, and then Residuals. The series of the residuals will appear as a new variable (in this case it appears as uhat1).

Resolving multicollinearity Orthogonal auxiliary variables The last step is to calculate the orthogonal variable, using the formula:

X̃7 = β̂0 + û

Therefore, we add the estimated intercept of the model to each residual. Excel: use a new column where you add the number -1.567 to the column of the residuals. Gretl: select Add, then Define new variable, and type inside the box garage_orth = -1.567 + uhat1. The new variable will appear in the worksheet.
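The whole orthogonalization recipe (regress the collinear variable on the others, keep the residuals, add back the estimated intercept) can be sketched as one small function. This is an illustrative numpy version of the steps above, not Gretl's built-in procedure:

```python
import numpy as np

def orthogonalize(x, Z):
    """Replace x with beta0_hat + residuals from regressing x on Z,
    making the new variable uncorrelated with every column of Z."""
    W = np.column_stack([np.ones(len(x)), Z])      # intercept + other regressors
    beta, *_ = np.linalg.lstsq(W, x, rcond=None)   # auxiliary regression
    resid = x - W @ beta                           # residuals u-hat
    return beta[0] + resid                         # add back the intercept
```

Because least-squares residuals are orthogonal to every regressor in the auxiliary model, the transformed variable has (sample) correlation zero with each column of Z, which is exactly the property the garage_orth variable is meant to have.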

The Omitted variables bias If we omit explanatory variables that should be present in the regression, the estimated coefficients on the included variables will be inaccurate (biased). The intuition behind why the omission of variables causes bias is provided by the previous example: lot size is an important factor for house prices, and thus "wants" to enter into the regression. If we omit it from the regression, it will try to enter in the only way it can: through its positive correlation with the included explanatory variable, number of bedrooms. One practical consequence of omitted variables bias is that you should always try to include all those explanatory variables that could affect the dependent variable. Unfortunately, in practice, this is rarely possible. House prices, for instance, depend on many more explanatory variables than those found in the data set (e.g. the state of repair of the house, how pleasant the neighbors are, closet and storage space, whether the house has hardwood floors, the quality of the garden, etc.). Many of the omitted factors will be subjective (e.g. how do you measure the pleasantness of the neighbors?).