Assumptions of the error term, assumptions of the independent variables


Petra Petrovics, Renáta Géczi-Papp: Assumptions of the error term, assumptions of the independent variables (6th seminar)

Multiple linear regression model: a linear relationship between x_1, x_2, ..., x_p and y. Y depends on the p independent variables x_1, x_2, ..., x_p, the error term ε, and the regression coefficients β_0, β_1, ..., β_p:

Y = β_0 + β_1 x_1 + β_2 x_2 + ... + β_p x_p + ε

Assumptions of the error term:
1. The expected value of the error term equals 0: E(ε | X_1, X_2, ..., X_p) = 0.
2. Constant variance (homoscedasticity): Var(ε) = σ².
3. The error term is uncorrelated across observations.
4. The error term is normally distributed.

Assumptions of the independent variables: linear independence; fixed values, which do not change from sample to sample; there is no measurement (scale) error; the independent variables are uncorrelated with the error term.

Standard linear regression model: the model that applies when the abovementioned assumptions are met. If the sample data do not meet the assumptions, more complex models and estimation procedures are required.

SPSS example data (y - turnover, x_1 - property, x_2 - number of employees):

Obs    y    x_1   x_2
  1   35    54    98
  2   27    52   120
  3   42    50    95
  4   47    58   145
  5   53    82   184
  6   45    72   106
  7   61   120   240
  8   58   108   175
  9   65    92   165
 10   77   122   202
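The seminar fits this model through the SPSS menus; as a hedged illustration of what happens underneath, the following Python sketch fits y = b0 + b1*x1 + b2*x2 to the ten observations above by solving the normal equations. The variable names and the Gaussian-elimination helper are my own, not part of the seminar material.

```python
# Hedged sketch: OLS fit of y = b0 + b1*x1 + b2*x2 on the ten observations
# above, via the normal equations (X'X) beta = X'y. Illustrative only, not
# the seminar's SPSS workflow.

y  = [35, 27, 42, 47, 53, 45, 61, 58, 65, 77]          # turnover
x1 = [54, 52, 50, 58, 82, 72, 120, 108, 92, 122]       # property
x2 = [98, 120, 95, 145, 184, 106, 240, 175, 165, 202]  # employees

X = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with intercept

def solve(A, v):
    """Solve A*b = v by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        b[r] = (M[r][n] - sum(M[r][c] * b[c] for c in range(r + 1, n))) / M[r][r]
    return b

XtX = [[sum(row[r] * row[c] for row in X) for c in range(3)] for r in range(3)]
Xty = [sum(row[r] * yi for row, yi in zip(X, y)) for r in range(3)]
beta = solve(XtX, Xty)

fitted = [sum(b * xc for b, xc in zip(beta, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
print("coefficients:", beta)
print("sum of residuals:", sum(residuals))  # numerically ~0 with an intercept
```

Note that the residuals of an OLS fit with an intercept sum to (numerically) zero by construction, which is why assumption 1 cannot be checked from the residuals themselves.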


1. E(ε) = 0: the positive and negative errors offset each other. If the mean differs from 0, the reason may be that we omitted a significant explanatory variable. The condition is difficult to verify in practice; if the least squares method is used, it holds for the residuals by construction.


2. Homoscedasticity (Var(ε) = σ²): the variance of the error term is the same for all observations. Testing:
- plots of the residuals versus the independent variables (or the predicted value ŷ, or time);
- statistical tests, e.g. the Goldfeld-Quandt test (especially when the heteroscedasticity is related to one of the independent variables).
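The Goldfeld-Quandt idea mentioned above can be sketched as follows: order the observations by the suspect regressor, fit separate simple regressions to the low-x and high-x halves, and compare their residual sums of squares. This is a simplified Python illustration with made-up data; a full test also drops some middle observations and compares the ratio with an F critical value.

```python
# Hedged sketch of the Goldfeld-Quandt idea (illustrative data, my own helper).

def fit_rss(xs, ys):
    """Fit y = a + b*x by OLS and return the residual sum of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (yv - my) for x, yv in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return sum((yv - (intercept + slope * x)) ** 2 for x, yv in zip(xs, ys))

# Deterministic "noise" whose magnitude grows with x: a heteroscedastic pattern.
x = list(range(1, 21))
y = [2 * xi + ((-1) ** i) * 0.3 * xi for i, xi in enumerate(x)]

half = len(x) // 2
ratio = fit_rss(x[half:], y[half:]) / fit_rss(x[:half], y[:half])
print("RSS(high x) / RSS(low x):", ratio)  # well above 1 under heteroscedasticity
```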

Graphical tests for homoscedasticity: plot the residual e against x_i or against ŷ. [Figure: three panels of e versus x_i or ŷ; homoscedastic residuals form a horizontal band of constant width, heteroscedastic residuals fan out as x_i or ŷ grows.]

SPSS: Analyze / Regression / Linear - Plots. Available quantities: dependent variable, standardized predicted value, standardized residual, deleted residual, adjusted predicted value, studentized residual, studentized deleted residual. Plot the standardized residual (ZRESID) against the standardized predicted value (ZPRED) to check homoscedasticity.

Output: the variance of the residuals is roughly constant, so the homoscedasticity assumption holds.

If the residuals are heteroscedastic, take logarithms: Transform / Compute Variable.


3. The error term is uncorrelated across observations. For cross-sectional data the observations satisfy the assumption of simple random sampling, so we do not have to test this hypothesis. Before estimating from time-series data, however, we need to test the residuals for autocorrelation.

Causes of autocorrelation:
- we did not include every important explanatory variable in the model (we cannot recognise the effect, there are no data, or the time series is too short);
- the model specification is wrong, e.g. the relationship is not linear but we use linear regression;
- non-random measurement (scaling) errors.

Plots to detect autocorrelation: plot the residual e_t against the lagged residual e_{t-1} and against time t. [Figure: a systematic pattern in e_t versus e_{t-1} suggests an independent variable is missing from the equation; a curved pattern in e_t over t suggests we should use another type of function.]

The Durbin-Watson test. H0: ρ = 0 (no autocorrelation); H1: ρ ≠ 0 (autocorrelation).

d = Σ_{t=2..n} (e_t - e_{t-1})² / Σ_{t=1..n} e_t²,  with limits 0 ≤ d ≤ 4.

Positive autocorrelation: 0 ≤ d < 2; negative autocorrelation: 2 < d ≤ 4; d ≈ 2 indicates no problem. The d axis is divided at d_l, d_u, 2, 4-d_u and 4-d_l: below d_l the test signals positive autocorrelation, above 4-d_l negative autocorrelation, between d_u and 4-d_u there is no problem, and in the two remaining bands there is no decision (a weaker problem): use more variables or a larger database.
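The statistic itself is easy to compute from the residuals; a minimal Python sketch follows, on made-up residual series (not the seminar's SPSS output).

```python
# Hedged sketch: the Durbin-Watson statistic
#   d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2
# computed on illustrative residual series.

def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den

drifting    = [1, 1, 1, -1, -1, -1]   # slowly changing sign: positive autocorrelation
alternating = [1, -1, 1, -1, 1, -1]   # sign flips every step: negative autocorrelation

print(durbin_watson(drifting))     # below 2
print(durbin_watson(alternating))  # above 2
```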

Durbin-Watson test decision table (Source: Kerékgyártó-Mundruczó [1999]):

Decision                                     Region of d
Accept H0: ρ = 0                             d_u < d < 4-d_u
Reject H0, ρ > 0 (positive autocorrelation)  d < d_l
Reject H0, ρ < 0 (negative autocorrelation)  d > 4-d_l
No decision                                  d_l < d < d_u  or  4-d_u < d < 4-d_l

Durbin-Watson critical values at the 5% significance level, where m is the number of explanatory variables (Source: Statisztikai képletgyűjtemény):

  n    m=1          m=2          m=3          m=4          m=5
      d_L   d_U    d_L   d_U    d_L   d_U    d_L   d_U    d_L   d_U
 15   1.08  1.36   0.95  1.54   0.82  1.75   0.69  1.97   0.56  2.21
 16   1.10  1.37   0.98  1.54   0.86  1.73   0.74  1.93   0.62  2.15
 17   1.13  1.38   1.02  1.54   0.90  1.71   0.78  1.90   0.67  2.10
 18   1.16  1.39   1.05  1.53   0.93  1.69   0.82  1.87   0.71  2.06
 19   1.18  1.40   1.08  1.53   0.97  1.68   0.86  1.85   0.75  2.02
 20   1.20  1.41   1.10  1.54   1.00  1.68   0.90  1.83   0.79  1.99
 21   1.22  1.42   1.13  1.54   1.03  1.67   0.93  1.81   0.83  1.96
 22   1.24  1.43   1.15  1.54   1.05  1.66   0.96  1.80   0.86  1.94
 23   1.26  1.44   1.17  1.54   1.08  1.66   0.99  1.79   0.90  1.92
 24   1.27  1.45   1.19  1.55   1.10  1.66   1.01  1.78   0.93  1.90
 25   1.29  1.45   1.21  1.55   1.12  1.66   1.04  1.77   0.95  1.89
 26   1.30  1.46   1.22  1.55   1.14  1.65   1.06  1.76   0.98  1.88
 27   1.32  1.47   1.24  1.56   1.16  1.65   1.08  1.76   1.01  1.86
 28   1.33  1.48   1.26  1.56   1.18  1.65   1.10  1.75   1.03  1.85
 29   1.34  1.48   1.27  1.56   1.20  1.65   1.12  1.74   1.05  1.84
 30   1.35  1.49   1.28  1.57   1.21  1.65   1.14  1.74   1.07  1.83
 31   1.36  1.50   1.30  1.57   1.23  1.65   1.16  1.74   1.09  1.83
 32   1.37  1.50   1.31  1.57   1.24  1.65   1.18  1.73   1.11  1.82
 33   1.38  1.51   1.32  1.58   1.26  1.65   1.19  1.73   1.13  1.81
 34   1.39  1.51   1.33  1.58   1.27  1.65   1.21  1.73   1.15  1.81
 35   1.40  1.52   1.34  1.58   1.28  1.65   1.22  1.73   1.16  1.80
 36   1.41  1.52   1.35  1.59   1.29  1.65   1.24  1.73   1.18  1.80
 37   1.42  1.53   1.36  1.59   1.31  1.66   1.25  1.72   1.19  1.80
 38   1.43  1.54   1.37  1.59   1.32  1.66   1.26  1.72   1.21  1.79
 39   1.43  1.54   1.38  1.60   1.33  1.66   1.27  1.72   1.22  1.79
 40   1.44  1.54   1.39  1.60   1.34  1.66   1.29  1.72   1.23  1.79
 50   1.50  1.59   1.46  1.63   1.42  1.67   1.38  1.72   1.34  1.77
 60   1.55  1.62   1.51  1.65   1.48  1.69   1.44  1.73   1.41  1.77
 70   1.58  1.64   1.55  1.67   1.52  1.70   1.49  1.74   1.46  1.77
 80   1.61  1.66   1.59  1.69   1.56  1.72   1.53  1.74   1.51  1.77
 90   1.63  1.68   1.61  1.70   1.59  1.73   1.57  1.75   1.54  1.78
100   1.65  1.69   1.63  1.72   1.61  1.74   1.59  1.76   1.57  1.78

SPSS: Analyze / Regression / Linear - Statistics (request the Durbin-Watson statistic).

With d_l = 0.95 and d_u = 1.54 (so 4-d_u = 2.46 and 4-d_l = 3.05), the computed d = 1.381 falls in the d_l < d < d_u band: no decision. We need to include more variables, or to increase the number of observations.
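The decision rule applied here can be written out as a small function (a sketch with a name of my own; the bounds d_l and d_u still come from the critical-value table for the given n and m).

```python
# Hedged sketch of the Durbin-Watson decision rule.

def dw_decision(d, dl, du):
    if d < dl:
        return "reject H0: positive autocorrelation"
    if d > 4 - dl:
        return "reject H0: negative autocorrelation"
    if du < d < 4 - du:
        return "accept H0: no autocorrelation"
    return "no decision"

print(dw_decision(1.381, 0.95, 1.54))  # the slide's case: "no decision"
```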


4. Normally distributed error term. Testing: plots, and quantitative goodness-of-fit tests (the chi-square (χ²) test and the Kolmogorov-Smirnov test).

Graphical testing: a plot of the residuals (e) against normally distributed values (z), i.e. a normal probability plot. The assumption is not violated when the figure is nearly linear.
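The coordinates of such a normal probability plot can be computed directly; here is a sketch using Python's standard library. The plotting positions (i + 0.5)/n are one common convention, and the residual values are made up for illustration.

```python
# Hedged sketch: coordinates for a normal probability plot of the residuals.
# Sort the residuals and pair each with the standard normal quantile of its
# plotting position; the points lie near a straight line if the residuals
# are normally distributed.
from statistics import NormalDist

residuals = [0.8, -1.1, 0.2, -0.4, 1.5, -0.7, 0.1, -0.3]  # illustrative
e = sorted(residuals)
n = len(e)
z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
for zi, ei in zip(z, e):
    print(f"{zi:6.3f}  {ei:6.2f}")  # plot e against z and look for linearity
```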

Goodness-of-fit test. H0: P_r(ε_j) = P_j (the distribution is normal); H1: there exists a j with P_r(ε_j) ≠ P_j.

χ² = Σ_{i=1..r} (f_i - n·p_i)² / (n·p_i)

Accept H0 if χ² < χ²_{1-α}(r - 1 - b), where r is the number of classes and b the number of estimated parameters.
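Once the residuals are binned, the χ² statistic above is straightforward to compute; a sketch with illustrative frequencies follows (the class probabilities p_i would come from the hypothesised normal distribution, and the numbers here are not the seminar's data).

```python
# Hedged sketch: the chi-square goodness-of-fit statistic
#   chi2 = sum_i (f_i - n*p_i)^2 / (n*p_i)
# for observed class frequencies f_i and hypothesised class probabilities p_i.

def chi_square_stat(freqs, probs):
    n = sum(freqs)
    return sum((f - n * p) ** 2 / (n * p) for f, p in zip(freqs, probs))

observed   = [3, 12, 11, 4]        # residuals binned into four classes
hypothesis = [0.1, 0.4, 0.4, 0.1]  # p_i from the fitted normal curve
print(chi_square_stat(observed, hypothesis))  # compare with the chi2 critical value
```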

SPSS: Analyze / Regression / Linear - Plots: request a histogram of the standardized residuals.

Output: compare the histogram with the bell-shaped normal curve with mean 0 and standard deviation 1. Here the residuals are approximately normal (but not definitely so).

2nd solution: Analyze / Regression / Linear - Save.

Nonparametric test: Analyze / Nonparametric Tests / 1-Sample K-S. H0: the distribution is normal; H1: the distribution is not normal.

Output: if the significance level (p) is smaller than 5% (0.05), we reject the null hypothesis. Here it is higher than 0.05, so we do not reject the assumption of a normal distribution.

3rd solution: Graphs / Histogram - Display normal curve.


Testing for multicollinearity: auxiliary regression models X_j = f(X_1, X_2, ..., X_{j-1}, X_{j+1}, ..., X_p). Tools: the multiple coefficient of determination R_j², the F-test (F > F_crit), and the VIF indicator.

VIF measure (Variance Inflation Factor):

VIF_j = 1 / (1 - R_j²)

VIF = 1 if R_j² = 0 (the jth independent variable does not correlate with the others); VIF tends to infinity if R_j² = 1 (the jth independent variable is an exact linear combination of the other independent variables).

1 ≤ VIF < 2: weak multicollinearity; 2 ≤ VIF < 5: strong, disturbing multicollinearity; VIF ≥ 5: very strong, harmful multicollinearity.
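With only two regressors, R_j² reduces to the squared correlation between them, so the VIF for the slide's data can be sketched directly (a pure-Python illustration, not the SPSS collinearity output).

```python
# Hedged sketch: with two regressors, R_j^2 is the squared correlation between
# x1 and x2, so VIF = 1/(1 - R_j^2) follows directly from the slide's data.

x1 = [54, 52, 50, 58, 82, 72, 120, 108, 92, 122]       # property
x2 = [98, 120, 95, 145, 184, 106, 240, 175, 165, 202]  # employees

n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n
sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
sxx = sum((a - m1) ** 2 for a in x1)
syy = sum((b - m2) ** 2 for b in x2)

r2 = sxy ** 2 / (sxx * syy)  # R_j^2 for either regressor on the other
vif = 1 / (1 - r2)
print("R^2 =", r2, " VIF =", vif)
```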

Correction for multicollinearity: find the offending independent variables and exclude them, or combine the strongly correlated independent variables (e.g. by creating principal components), which differ from the original variables but retain their information content.

SPSS: Analyze / Regression / Linear - Statistics (collinearity diagnostics).

Thank You For Your Attention stgpren@uni-miskolc.hu