Steps in Regression Analysis

MGMG 522 : Session #2
Learning to Use Regression Analysis & The Classical Model (Ch. 3 & 4)

Steps in Regression Analysis
1. Review the literature and develop the theoretical model
2. Specify the model: choose the X's and the functional form
3. Hypothesize the expected signs of the coefficients
4. Gather the data
5. Run the regression and evaluate the regression equation
6. Document the results

Step 1: Review the literature and develop the theoretical model
Do not search desperately for the regression model that best fits your data. Rather, you should be able to justify your regression model with some theory, which means you need to know the relevant theory by reviewing the literature. If no theory bears directly on your study, try a similar or closely related theory. If there is still no closely related theory, consult experts in the field or develop your own theory and the story behind it.

Steps 2-3: Specify the model and hypothesize the signs of the coefficients
The topic you are studying will give you some clues about which dependent variable, Y, to use. You must decide which X's to use, and you must decide the functional form: how the X's relate to Y. The relationships between the X's and Y must be based on some theory, not on the data. The theory that backs up your model should give you some hints about the expected signs of the coefficients on the independent variables.

Steps 4-5: Collect the data, estimate the coefficients, and evaluate
The model should be specified first, before you go out and start collecting data. Verify the reliability of your data and look out for errors. Generally, the more data, the better. [More degrees of freedom] The unit of measurement does not matter to the regression analysis, except for the interpretation of the coefficients. Use the data you collected and run the OLS regression to obtain the estimates of the coefficients. You might try other estimators and compare their estimates with those from OLS. Then evaluate the estimates: check whether the signs are what you expect and whether the coefficients are statistically significant, and examine the overall goodness of fit.

Step 6: Document the results
When documenting the results, the standard format typically is

Ŷ_i = 100 + 15 X_i
            (0.023)
n = 100, Adj. R² = 0.65

where the number in parentheses below a coefficient is typically its standard error. At a minimum, you should include this information; some researchers include more. You should also describe your data and procedures, and explain your results in words.
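To make steps 5-6 concrete, here is a minimal sketch of running an OLS regression and pulling out the quantities in the standard reporting format above. It assumes Python with numpy and statsmodels and uses simulated data; the variable names and values are illustrative, not part of the lecture.

```python
# Estimate an OLS regression on simulated data and report coefficients,
# standard errors, n, and adjusted R-squared (the standard format above).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, size=n)
y = 100 + 15 * x + rng.normal(scale=5, size=n)  # assumed "true" model plus noise

X = sm.add_constant(x)              # include an intercept term
results = sm.OLS(y, X).fit()

print(results.params)               # estimated intercept and slope
print(results.bse)                  # standard errors (the numbers in parentheses)
print(results.nobs, results.rsquared_adj)
```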

Things to Remember
Do not try to fit a model to your data without some reasonable theory behind the model. You do not prove the relationships or the theory; rather, you try to find evidence that some relationship might exist. Model specification is very important: only relevant independent variables should be included in the model.

The Classical Assumptions
1. The regression model is linear in the coefficients, is correctly specified, and has an additive error term.
2. The expected value of the error term is zero: E(ε_i) = 0.
3. All X_i's are uncorrelated with ε_i.
4. The error terms are uncorrelated with each other. [No serial correlation]
5. The error term has a constant variance. [No heteroskedasticity]
6. No independent variable can be written as a linear function of the other independent variables. [No perfect multicollinearity]
7. The error term is normally distributed. [Optional, but it is nice to have this assumption.]

Assumption 1: Linear in the coefficients, correctly specified, and an additive error term
The model must include all relevant independent variables, have a correct functional form, and have an additive error term. The model must be linear in the coefficients, but it does not have to be linear in the variables.

Assumptions 2-3: E(ε_i) = 0, and all X_i's are uncorrelated with ε_i
The mean of the error term is zero. The distribution of ε_i does not have to be normal, as long as its mean is zero. An OLS regression (with an intercept term) guarantees that the residuals have mean zero, because the intercept adjusts to absorb any nonzero mean in the errors. [This is one reason we pay little attention to the intercept term.] Assumption 3 states that Cov(X, ε) = 0.
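As a small illustration of how the intercept absorbs a nonzero error mean, here is a sketch (assuming Python with numpy; the simulated data and names are illustrative):

```python
# With an intercept included, OLS residuals have mean zero by construction,
# even when the underlying errors have a nonzero mean: the intercept
# estimate absorbs that mean.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 + 0.5 * x + rng.normal(loc=3.0, size=200)   # errors with mean 3, not 0

X = np.column_stack([np.ones_like(x), x])           # intercept column of ones
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

print(beta_hat[0])          # ~5.0: the intercept picks up 2.0 + 3.0
print(residuals.mean())     # ~0 up to floating-point error
```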

Assumptions 4-5: The error terms are uncorrelated with each other and have a constant variance
The error terms are uncorrelated with each other: no error term should contain information that helps predict the other error terms. This is very important, especially in time-series analysis. If serial correlation exists, it may mean that you forgot to include some relevant variables in the model. To get precise estimates of the coefficients, we also need the error term to have a constant variance.

Assumption 6: No X can be written as a linear function of the other X's
There must be no perfect multicollinearity, such as
X_1 = 2 X_2
X_1 = X_2 + X_3
X_1 = 5 (a constant)
If there is perfect multicollinearity, you will not be able to obtain estimates of the coefficients at all. The solution is to drop one variable from the model. In practice, perfect multicollinearity is not the real problem, since it is easy to detect and fix; the existence of imperfect multicollinearity is the problem.
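The breakdown caused by perfect multicollinearity is easy to see numerically. Below is a sketch (assuming Python with numpy; the data are simulated for illustration):

```python
# Perfect multicollinearity (here X1 = 2*X2 exactly) makes X'X rank
# deficient, so the OLS normal equations have no unique solution and
# coefficient estimates cannot be obtained. Dropping a variable fixes it.
import numpy as np

rng = np.random.default_rng(2)
n = 50
x2 = rng.normal(size=n)
x1 = 2.0 * x2                                   # exact linear function of X2
X = np.column_stack([np.ones(n), x1, x2])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))               # 2 < 3 columns: singular
print(np.linalg.cond(XtX))                      # astronomically large condition number

X_fixed = np.column_stack([np.ones(n), x2])     # drop one collinear variable
print(np.linalg.matrix_rank(X_fixed.T @ X_fixed))   # 2 = full rank again
```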

Assumption 7: The error term is normally distributed
Normality is not required for estimation, but it is important for hypothesis testing. The error term can be thought of as the sum of a number of minor influences or errors; as the number of these minor influences gets large, the distribution of the error term approaches the normal distribution [Central Limit Theorem]. Intentionally omitting variables to achieve normality in the error term should never be considered.

Sampling Distribution of Estimators
Just as the error term has a distribution, the coefficient estimates β̂_k have their own distributions too. If the error term is normally distributed, the coefficient estimates will also be normally distributed, because any linear function of a normally distributed variable is itself normally distributed.
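A quick way to see a sampling distribution is to simulate it. The following sketch (assuming Python with numpy; the true slope of 1.5 is illustrative) re-draws normal errors many times and collects the resulting slope estimates:

```python
# Simulate the sampling distribution of the OLS slope estimator. With
# normal errors, the slope estimates are normally distributed around
# the true value (beta = 1.5 here).
import numpy as np

rng = np.random.default_rng(3)
n, reps, beta = 100, 5000, 1.5
x = rng.uniform(0, 10, size=n)                  # regressor held fixed across draws
X = np.column_stack([np.ones(n), x])

slopes = np.empty(reps)
for r in range(reps):
    y = 0.5 + beta * x + rng.normal(size=n)     # fresh normal errors each draw
    slopes[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(slopes.mean())    # ~1.5: centered on the true beta
print(slopes.std())     # the estimator's sampling standard deviation
```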

Desirable Properties of the Estimators
We prefer an unbiased estimator: one that produces estimates β̂_k with E(β̂_k) = β_k. Among all the unbiased estimators to choose from, we also prefer an efficient unbiased estimator: the one that produces the smallest variance of the coefficient estimates, Var(β̂_k).

Gauss-Markov Theorem & OLS
If Assumptions 1-6 are met, the OLS estimator is the minimum-variance estimator of all linear unbiased estimators of β_k, k = 0, 1, 2, ..., K. OLS is BLUE. [Best Linear Unbiased Estimator] "Best" means that each estimator of β_k has the smallest variance possible.
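A small Monte Carlo makes "best" concrete. The sketch below (assuming Python with numpy and simulated data) compares full-sample OLS with another linear unbiased estimator of my own choosing, OLS on only half the sample:

```python
# Gauss-Markov illustration: two linear unbiased estimators of the slope.
# Both are centered on the true beta, but OLS using all the data has
# the smaller variance (it is more efficient).
import numpy as np

rng = np.random.default_rng(4)
n, reps, beta = 100, 5000, 2.0
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])

full, half = np.empty(reps), np.empty(reps)
for r in range(reps):
    y = 1.0 + beta * x + rng.normal(size=n)
    full[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    half[r] = np.linalg.lstsq(X[:n // 2], y[:n // 2], rcond=None)[0][1]

print(full.mean(), half.mean())   # both ~2.0: both estimators are unbiased
print(full.var() < half.var())    # True: full-sample OLS has smaller variance
```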

If Assumptions 1-7 are met, the OLS estimator is the minimum-variance estimator of all unbiased estimators of β_k, k = 0, 1, 2, ..., K. OLS is BUE. [Best Unbiased Estimator] The OLS estimator is also equivalent to the Maximum Likelihood (ML) estimator: the coefficients estimated by OLS are identical to those estimated by ML. However, ML gives a biased estimate of the variance of the error term, while OLS does not.

When All Seven Assumptions Are Met
The OLS estimator
1. is unbiased: E(β̂_k) = β_k,
2. has the minimum variance among all unbiased estimators,
3. is consistent, and
4. is normally distributed: β̂_k ~ N(β_k, σ²_β̂_k), where σ²_β̂_k = Var(β̂_k).
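The OLS/ML variance-bias point can also be checked by simulation. The sketch below (assuming Python with numpy; the true error variance of 4 is illustrative) compares the two variance estimators:

```python
# OLS and ML give identical coefficient estimates, but ML estimates the
# error variance by dividing the sum of squared residuals by n (biased
# downward), while OLS divides by n - k (unbiased). True sigma^2 = 4.
import numpy as np

rng = np.random.default_rng(5)
n, reps, sigma2 = 30, 5000, 4.0
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
k = X.shape[1]

s2_ols, s2_ml = np.empty(reps), np.empty(reps)
for r in range(reps):
    y = 1.0 + 2.0 * x + rng.normal(scale=np.sqrt(sigma2), size=n)
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    s2_ols[r] = resid @ resid / (n - k)         # OLS: unbiased
    s2_ml[r] = resid @ resid / n                # ML: biased downward

print(s2_ols.mean())    # ~4.0
print(s2_ml.mean())     # noticeably below 4.0
```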