MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

FinQuiz Notes

Abbreviations:
MSR = Mean Regression Sum of Squares
MSE = Mean Squared Error
RSS = Regression Sum of Squares
SSE = Sum of Squared Errors/Residuals
α = Level of Significance
Fc = Critical F taken from the F-Distribution Table
H0 = Null Hypothesis
Ha = Alternative Hypothesis
X = Independent Variable
Y = Dependent Variable
F = F-Statistic (calculated)

1. INTRODUCTION

Multiple linear regression models are more sophisticated than simple regression models: they incorporate more than one independent variable.

2. MULTIPLE LINEAR REGRESSION

Allows determining the effects of more than one independent variable on a particular dependent variable:

Yi = b0 + b1 X1i + b2 X2i + ... + bk Xki + εi

b1 tells the impact on Y of changing X1 by 1 unit, keeping the other independent variables the same. Individual slope coefficients (e.g. b1) in multiple regressions are known as partial regression/slope coefficients.

2.1 Assumptions of the Multiple Linear Regression Model

- The relationship b/w Y and X1, X2, ..., Xk is linear.
- The independent variables are not random, and no exact linear relationship exists b/w 2 or more independent variables.
- The expected value of the error term is 0.
- The variance of the error term is the same for all observations.
- The error term is uncorrelated across observations.
- The error term is normally distributed.

2.2 Predicting the Dependent Variable in a Multiple Regression Model

- Obtain estimates b̂0, b̂1, ..., b̂k of the regression parameters b0, b1, ..., bk.
- Determine the assumed values of X1i, X2i, ..., Xki.
- Compute the predicted value Ŷi using Ŷi = b̂0 + b̂1 X1i + b̂2 X2i + ... + b̂k Xki.

To predict the dependent variable:
- Be confident that the assumptions of the regression are met.
- Predictions regarding X must be within the reliable range of the data used to estimate the model.

2.3 Testing Whether All Population Regression Coefficients Equal Zero

H0: all slope coefficients are simultaneously = 0, i.e. none of the X variables helps explain Y. To test this an F-test is used; a t-test cannot be used.

F = MSR / MSE = (RSS / k) / (SSE / (n − (k + 1)))

where MSR = RSS / k, MSE = SSE / (n − (k + 1)), n = no. of observations, and k = no. of slope coefficients.

Decision rule: reject H0 if F > Fc (for the given α). It is a one-tailed test, with df numerator = k and df denominator = n − (k + 1). For given k and n, the test statistic for H0 (all slope coefficients equal to 0) is F(k, n − (k + 1)). In the F-distribution table, k represents the column and n − (k + 1) represents the row.

The significance of F in the ANOVA table represents the p-value: the larger the F-statistic, the smaller the p-value and the lower the chance of a Type I error when rejecting H0.

2.4 Adjusted R²

R² increases with the addition of independent variables (X) to the regression; adjusted R² corrects for the number of independent variables:

Adjusted R² = 1 − [(n − 1) / (n − k − 1)] × (1 − R²)

When k ≥ 1, R² > adjusted R²; adjusted R² can be −ve, but R² is always +ve. If adjusted R² is used for comparing regression models:
- The sample size must be the same.
- The dependent variable must be defined in the same way.

A high adjusted R² does not necessarily indicate that the regression is well specified.

3. USING DUMMY VARIABLES IN REGRESSION

A dummy variable takes the value 1 if a particular condition is true & 0 when it is false. Diligence is required in choosing the no. of dummy variables: usually n − 1 dummy variables are used, where n = no. of categories.
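The workflow of sections 2.2–3 can be illustrated with a minimal sketch. The data, variable names, and the use of statsmodels are assumptions for illustration only, not part of the notes:

```python
# Illustrative sketch (hypothetical data): fit a multiple regression that includes a dummy
# variable, read off the F-statistic and adjusted R-squared, and predict Y for assumed X values.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "small_cap": rng.integers(0, 2, size=n),   # dummy: 1 if the condition is true, else 0
})
# (pd.get_dummies(..., drop_first=True) builds the n-1 dummies from a categorical column.)
df["y"] = 1.0 + 0.5 * df["x1"] - 0.3 * df["x2"] + 0.8 * df["small_cap"] + rng.normal(scale=0.5, size=n)

X = sm.add_constant(df[["x1", "x2", "small_cap"]])   # adds the intercept column b0
model = sm.OLS(df["y"], X).fit()

print(model.params)                   # b0 and the partial slope coefficients
print(model.fvalue, model.f_pvalue)   # F-test that all slope coefficients = 0
print(model.rsquared, model.rsquared_adj)

# Predicted Y for assumed X values (kept within the range of the estimation data)
x_new = pd.DataFrame({"const": [1.0], "x1": [0.2], "x2": [-0.1], "small_cap": [1]})
print(model.predict(x_new))
```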

4. VIOLATIONS OF REGRESSION ASSUMPTIONS

4.1 Heteroskedasticity

- Variance of the errors differs across observations → heteroskedastic.
- Variance of the errors is similar across observations → homoskedastic.
- Usually no systematic relationship exists b/w X & the regression residuals; if a systematic relationship is present, heteroskedasticity can exist.

4.1.1 The Consequences of Heteroskedasticity

- It can lead to mistakes in inference.
- It does not affect consistency.
- The F-test becomes unreliable.
- Due to biased estimators of the standard errors, the t-test also becomes unreliable.
- The most likely result of heteroskedasticity is that the estimated standard errors will be underestimated and the t-statistics will be inflated.
- Ignoring heteroskedasticity leads to finding a significant relationship that does not actually exist.
- It becomes more serious when developing an investment strategy using regression analysis.

Unconditional heteroskedasticity: the heteroskedasticity of the error variance is not correlated with the independent variables in the multiple regression. It creates no major problems for statistical inference.

Conditional heteroskedasticity: the heteroskedasticity of the error variance is correlated with the independent variables. It causes the most problems, but it can be tested & corrected easily through many statistical software packages.

4.1.2 Testing for Heteroskedasticity

The Breusch-Pagan test is widely used: regress the squared residuals of the regression on the independent variables. If the independent variables explain much of the variation of the squared residuals → conditional heteroskedasticity exists.

H0 = no conditional heteroskedasticity exists.
Ha = conditional heteroskedasticity exists.

The Breusch-Pagan test statistic = nR², where R² is from the regression of the squared residuals on the X variables. The critical value is taken from the χ² distribution with df = no. of independent variables. Reject H0 if the test statistic > the critical value.
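A minimal sketch of the Breusch-Pagan test described above, on simulated data (the data and the use of statsmodels/scipy are illustrative assumptions):

```python
# Breusch-Pagan sketch: manual n*R^2 statistic plus the statsmodels helper for comparison.
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
X = sm.add_constant(x)
# Simulate conditional heteroskedasticity: error variance grows with |x|
y = 2.0 + 1.5 * x + rng.normal(scale=0.5 + np.abs(x), size=n)

fit = sm.OLS(y, X).fit()

# Manual version: regress squared residuals on the independent variables, test stat = n * R^2
aux = sm.OLS(fit.resid ** 2, X).fit()
bp_stat = n * aux.rsquared
crit = stats.chi2.ppf(0.95, df=1)        # df = number of independent variables
print(bp_stat, crit, bp_stat > crit)     # reject H0 if the test statistic > critical value

# Library version for comparison
lm_stat, lm_pvalue, _, _ = het_breuschpagan(fit.resid, X)
print(lm_stat, lm_pvalue)
```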

4.1.3 Correcting for Heteroskedasticity

Robust standard errors: correct the standard errors of the estimated coefficients. Also known as heteroskedasticity-consistent standard errors or White-corrected standard errors.

Generalized least squares: modifies the original equation. Requires economic expertise to implement correctly on financial data.

4.2 Serial Correlation

Regression errors are correlated across observations. Usually arises in time-series regressions.

4.2.1 The Consequences of Serial Correlation

- Incorrect estimates of the regression coefficient standard errors.
- Parameter estimates become inconsistent & invalid when a lagged value of Y is included among the X variables under serial correlation.
- Positive serial correlation: positive (negative) errors increase the chance of the next error being positive (negative).
- Negative serial correlation: positive (negative) errors increase the chance of the next error being negative (positive).
- It leads to wrong inferences.

If positive serial correlation:
- Standard errors are underestimated.
- t-statistics & F-statistics are inflated.
- The chance of a Type I error increases.

If negative serial correlation:
- Standard errors are overestimated.
- t-statistics & F-statistics are understated.
- The chance of a Type II error increases.

4.2.2 Testing for Serial Correlation

There is a variety of tests; the most common is the Durbin-Watson test:

DW = [Σ from t=2 to T of (ε̂t − ε̂t−1)²] / [Σ from t=1 to T of ε̂t²]

where ε̂t = regression residual for period t. For large sample sizes the Durbin-Watson statistic is approximately DW ≈ 2(1 − r), where r = sample correlation b/w the regression residuals of t and t−1.

Values of DW can range from 0 to 4:
- DW = 2 → r = 0 → no serial correlation.
- DW = 0 → r = 1 → perfectly positively serially correlated.
- DW = 4 → r = −1 → perfectly negatively serially correlated.

For positive serial correlation:
H0: no positive serial correlation; Ha: positive serial correlation.
- DW < dl → reject H0.
- DW > du → do not reject H0.
- dl ≤ DW ≤ du → inconclusive.
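A minimal sketch tying the two corrections and the Durbin-Watson statistic together (simulated data; statsmodels assumed; HAC errors, which are robust to both heteroskedasticity and serial correlation, are shown only as an extra comparison and are not named in the notes):

```python
# White-corrected standard errors and the Durbin-Watson statistic on a regression with AR(1) errors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)
T = 250
x = rng.normal(size=T)

# Build AR(1) errors to create positive serial correlation
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.6 * e[t - 1] + rng.normal(scale=0.5)
y = 1.0 + 0.8 * x + e

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                    # ordinary standard errors
white = sm.OLS(y, X).fit(cov_type="HC1")    # heteroskedasticity-consistent (White-corrected)
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})  # also robust to serial correlation

print(ols.bse, white.bse, hac.bse)          # compare coefficient standard errors

dw = durbin_watson(ols.resid)               # DW ≈ 2(1 - r); values well below 2 suggest positive serial correlation
print(dw)
```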

For negative serial correlation:
H0: no negative serial correlation; Ha: negative serial correlation.
- DW > 4 − dl → reject H0.
- DW < 4 − du → do not reject H0.
- 4 − du ≤ DW ≤ 4 − dl → inconclusive.

4.2.3 Correcting for Serial Correlation

- Adjust the coefficient standard errors (the recommended method); Hansen's method is the most prevalent one.
- Modify the regression equation. Extreme care is required, as this may lead to inconsistent parameter estimates.

4.3 Multicollinearity

Occurs when two or more independent variables (X) are highly correlated with each other. The regression can be estimated, but the results become problematic. It is a serious practical concern because approximate linear relationships among financial variables are commonly found.

4.3.1 The Consequences of Multicollinearity

- Difficulty in detecting significant relationships.
- Estimates become extremely imprecise & unreliable, though consistency is unaffected.
- The F-statistic is unaffected.
- Standard errors of the regression coefficients can increase, causing insignificant t-tests, wide confidence intervals, and Type II errors.

4.3.2 Detecting Multicollinearity

Multicollinearity is a matter of degree rather than presence/absence. The magnitude of pairwise correlations b/w independent variables does not by itself confirm the presence of multicollinearity, nor does the absence of high pairwise correlations rule it out. With only 2 independent variables, their correlation is a useful indicator. The classic symptom of multicollinearity: R² is significant and the F-statistic is significant, but the t-statistics on the slope coefficients are insignificant (see the sketch below).

4.3.3 Correcting Multicollinearity

Exclude one or more regression variables. In many cases, experimentation is done to determine the variable(s) causing multicollinearity.
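A minimal detection sketch (simulated data; statsmodels assumed). The classic symptom is reproduced, and variance inflation factors (VIFs) are added as a common supplementary diagnostic that the notes do not spell out:

```python
# Multicollinearity: significant overall F-test with insignificant individual t-tests, plus VIFs.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly collinear with x1
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(scale=1.0, size=n)

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2}))
fit = sm.OLS(y, X).fit()

print(fit.f_pvalue)     # overall F-test: typically highly significant
print(fit.pvalues)      # individual t-tests on x1 and x2: often insignificant

# VIF above roughly 10 is a common rule of thumb for problematic collinearity
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))
```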

5. MODEL SPECIFICATION AND ERRORS IN SPECIFICATION

Model specification = the set of variables included in the regression. Incorrect specification leads to biased & inconsistent parameters.

5.1 Principles of Model Specification

- The model should be grounded in economic reasoning.
- The functional form of the variables should be compatible with the nature of the variables.
- The model should be parsimonious: each included variable should play an essential role.
- The model should be examined for violations of the regression assumptions.
- The model should be tested for validity & usefulness on out-of-sample data.

5.2 Misspecified Functional Form

- One or more variables are omitted. If an omitted variable is correlated with the remaining variables, the error term will also be correlated with them, and: the results can be biased & inconsistent; the estimated standard errors of the coefficients will be inconsistent. (A small simulation of this bias follows below.)
- One or more variables may require transformation.
- Pooling of data from different samples that should not be pooled. Can lead to spurious results.

5.3 Time-Series Misspecification (Independent Variables Correlated with Errors)

- Including lagged dependent variables as independent variables when the errors are serially correlated.
- Including a function of the dependent variable as an independent variable.
- Independent variables measured with error.

5.4 Other Types of Time-Series Misspecification

Nonstationarity: the properties of a variable, e.g. its mean, are not constant through time. In practice, nonstationarity is a serious problem.
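A minimal simulation of the omitted-variable point in section 5.2 (hypothetical data; not from the notes): when x2 is correlated with x1 and is wrongly left out, the estimated coefficient on x1 is biased.

```python
# Omitted-variable bias: compare the full model with a model that drops a correlated regressor.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # x2 correlated with x1
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
omitted = sm.OLS(y, sm.add_constant(x1)).fit()   # x2 wrongly left out

print(full.params[1])     # close to the true value 1.0
print(omitted.params[1])  # biased upward (roughly 1.0 + 0.8 = 1.8 in expectation)
```
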

Page 7 6. MODELS WITH QUALITATIVE DEPENDENT VARIABLES Qualitative dependent variables dummy variables used as dependent instead of independent. Probit model based on normal distribution estimates the probability: of discrete outcome, given values of independent variables used to explain that outcome. that Y=1, implying a condition is met. Logit model: Identical to Probit model. Based on logistic distribution. Both Logit and Probit models must be estimated using maximum likelihood methods. Discriminate analysis can be used to create an overall score that is used for classification. Qualitative dependent variable models can be used for portfolio management and business management.