Econometrics I, WS 2012/13 — Sylvia Frühwirth-Schnatter (lecture slides)

Homoskedasticity

How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity:

$\mathrm{Var}(u \mid X) = \sigma^2.$ (23)

This means that the variance of the error term $u$ is the same, regardless of the predictor variable $X$. If assumption (23) is violated, e.g. if $\mathrm{Var}(u \mid X) = \sigma^2 h(X)$, then we say the error term is heteroskedastic.

Homoskedasticity

Assumption (23) certainly holds if $u$ and $X$ are assumed to be independent. However, (23) is a weaker assumption. Assumption (23) implies that $\sigma^2$ is also the unconditional variance of $u$, referred to as the error variance:

$\mathrm{Var}(u) = \mathrm{E}(u^2) - (\mathrm{E}(u))^2 = \sigma^2.$

Its square root $\sigma$ is the standard deviation of the error. It follows that $\mathrm{Var}(Y \mid X) = \sigma^2$.

Variance of the OLS estimator

How large is the variation of the OLS estimator around the true parameter? The difference $\hat{\beta}_1 - \beta_1$ is 0 on average. We measure the variation of the OLS estimator around the true parameter through the expected squared difference, i.e. the variance:

$\mathrm{Var}(\hat{\beta}_1) = \mathrm{E}((\hat{\beta}_1 - \beta_1)^2)$ (24)

Similarly for $\hat{\beta}_0$: $\mathrm{Var}(\hat{\beta}_0) = \mathrm{E}((\hat{\beta}_0 - \beta_0)^2)$.

Variance of the OLS estimator

The variance of the slope estimator $\hat{\beta}_1$ follows from (22):

$\mathrm{Var}(\hat{\beta}_1) = \frac{1}{N^2 (s_x^2)^2} \sum_{i=1}^{N} (x_i - \bar{x})^2 \, \mathrm{Var}(u_i) = \frac{\sigma^2}{N^2 (s_x^2)^2} \sum_{i=1}^{N} (x_i - \bar{x})^2 = \frac{\sigma^2}{N s_x^2}.$ (25)

The smaller the number of observations $N$, the larger the variance of the slope estimator (and vice versa). Increasing $N$ by a factor of 4 reduces the variance to a quarter of its value.
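
A minimal Monte Carlo sketch in Python (assuming NumPy is available; the design and parameter values are illustrative, not from the slides) that checks formula (25) by simulation:

# Check Var(beta1_hat) = sigma^2 / (N * s_x^2) by Monte Carlo.
import numpy as np

rng = np.random.default_rng(0)
N, sigma2, beta0, beta1 = 50, 0.1, 0.2, -1.8     # illustrative values
x = rng.uniform(0.0, 1.0, N)                     # fixed design across replications
sx2 = np.mean((x - x.mean()) ** 2)               # s_x^2 with the 1/N convention used above

slopes = []
for _ in range(20000):
    u = rng.normal(0.0, np.sqrt(sigma2), N)
    y = beta0 + beta1 * x + u
    # OLS slope: sample covariance of (x, y) divided by sample variance of x
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    slopes.append(b1)

print("Monte Carlo Var(beta1_hat):", np.var(slopes))
print("Formula sigma^2/(N s_x^2): ", sigma2 / (N * sx2))

The two printed numbers should agree closely; rerunning with a larger N shows the variance shrinking at rate 1/N.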

Variance of the OLS estimator

Dependence on the error variance $\sigma^2$: the larger the error variance $\sigma^2$, the larger the variance of the slope estimator. Dependence on the design, i.e. the predictor variable $X$: the smaller the variation in $X$, measured by $s_x^2$, the larger the variance of the slope estimator.

Variance of the OLS estimator

The variance is in general different for the two parameters of the simple regression model. $\mathrm{Var}(\hat{\beta}_0)$ is given by (without proof):

$\mathrm{Var}(\hat{\beta}_0) = \frac{\sigma^2}{N^2 s_x^2} \sum_{i=1}^{N} x_i^2.$ (26)

The standard deviations $\mathrm{sd}(\hat{\beta}_0)$ and $\mathrm{sd}(\hat{\beta}_1)$ of the OLS estimators are defined as:

$\mathrm{sd}(\hat{\beta}_0) = \sqrt{\mathrm{Var}(\hat{\beta}_0)}, \qquad \mathrm{sd}(\hat{\beta}_1) = \sqrt{\mathrm{Var}(\hat{\beta}_1)}.$

Milestone II: The Multiple Regression Model

Step 1: Model Definition
Step 2: OLS Estimation
Step 3: Econometric Inference
Step 4: OLS Residuals
Step 5: Testing Hypotheses

Milestone II

Step 6: Model Evaluation and Model Comparison
Step 7: Residual Diagnostics

Cross-sectional data

We are interested in a dependent (left-hand side, explained, response) variable $Y$, which is supposed to depend on $K$ explanatory (right-hand side, independent, control, predictor) variables $X_1, \ldots, X_K$. Example: wage is a response, while education, gender, and experience are predictor variables. We observe these variables for $N$ subjects drawn randomly from a population (e.g. for various supermarkets, for various individuals): $(y_i, x_{1,i}, \ldots, x_{K,i})$, $i = 1, \ldots, N$.

II.1 Model formulation

The multiple regression model describes the relation between the response variable $Y$ and the predictor variables $X_1, \ldots, X_K$ as:

$Y = \beta_0 + \beta_1 X_1 + \ldots + \beta_K X_K + u,$ (27)

where $\beta_0, \beta_1, \ldots, \beta_K$ are unknown parameters. Key assumption:

$\mathrm{E}(u \mid X_1, \ldots, X_K) = \mathrm{E}(u) = 0.$ (28)

Model formulation

Assumption (28) implies:

$\mathrm{E}(Y \mid X_1, \ldots, X_K) = \beta_0 + \beta_1 X_1 + \ldots + \beta_K X_K.$ (29)

$\mathrm{E}(Y \mid X_1, \ldots, X_K)$ is a linear function in the parameters $\beta_0, \beta_1, \ldots, \beta_K$ (important for "easy" OLS estimation) and in the predictor variables $X_1, \ldots, X_K$ (important for the correct interpretation of the parameters).

Understanding the parameters

The parameter $\beta_k$ is the expected absolute change of the response variable $Y$ if the predictor variable $X_k$ is increased by $\Delta X_k$ and all other predictor variables remain the same (ceteris paribus):

$\mathrm{E}(\Delta Y \mid \Delta X_k) = \mathrm{E}(Y \mid X_k = x + \Delta X_k) - \mathrm{E}(Y \mid X_k = x)$
$= \beta_0 + \beta_1 X_1 + \ldots + \beta_k (x + \Delta X_k) + \ldots + \beta_K X_K - (\beta_0 + \beta_1 X_1 + \ldots + \beta_k x + \ldots + \beta_K X_K) = \beta_k \Delta X_k.$

In particular, a one-unit increase in $X_k$ changes the expected response by $\beta_k$.

Understanding the parameters

The sign shows the direction of the expected change: If $\beta_k > 0$, then $X_k$ and $Y$ change in the same direction. If $\beta_k < 0$, then $X_k$ and $Y$ change in opposite directions. If $\beta_k = 0$, then a change in $X_k$ has no influence on $Y$.

The multiple log-linear model

The multiple log-linear model reads:

$Y = e^{\beta_0} X_1^{\beta_1} \cdots X_K^{\beta_K} e^u.$ (30)

The log transformation yields a model that is linear in the parameters $\beta_0, \beta_1, \ldots, \beta_K$,

$\log Y = \beta_0 + \beta_1 \log X_1 + \ldots + \beta_K \log X_K + u,$ (31)

but nonlinear in the predictor variables $X_1, \ldots, X_K$. This is important for the correct interpretation of the parameters.

The multiple log-linear model

The coefficient $\beta_k$ is the elasticity of the response variable $Y$ with respect to the variable $X_k$, i.e. the expected relative change of $Y$ if the predictor variable $X_k$ is increased by 1% and all other predictor variables remain the same (ceteris paribus). If $X_k$ is increased by $p\%$, then (ceteris paribus) the expected relative change of $Y$ is equal to $\beta_k \cdot p\%$: on average, $Y$ increases by $|\beta_k| \cdot p\%$ if $\beta_k > 0$, and decreases by $|\beta_k| \cdot p\%$ if $\beta_k < 0$. If $X_k$ is decreased by $p\%$, then (ceteris paribus) the expected relative change of $Y$ is equal to $-\beta_k \cdot p\%$: on average, $Y$ decreases by $|\beta_k| \cdot p\%$ if $\beta_k > 0$, and increases by $|\beta_k| \cdot p\%$ if $\beta_k < 0$.
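
A minimal Python/NumPy sketch of the elasticity interpretation, with illustrative parameter values: data are generated from model (30) and the elasticity is recovered by OLS on the logs, as in (31):

# Generate data from the log-linear model and recover the elasticity.
import numpy as np

rng = np.random.default_rng(1)
N, beta0, beta1 = 500, 1.0, -0.8              # beta1 is the elasticity (illustrative)
X1 = rng.uniform(1.0, 10.0, N)
u = rng.normal(0.0, 0.1, N)
Y = np.exp(beta0) * X1 ** beta1 * np.exp(u)   # model (30) with K = 1

# OLS on the log-transformed model (31): log Y = beta0 + beta1 log X1 + u
Z = np.column_stack([np.ones(N), np.log(X1)])
coef, *_ = np.linalg.lstsq(Z, np.log(Y), rcond=None)
print("estimated elasticity:", coef[1])       # close to -0.8

With $\hat{\beta}_1 \approx -0.8$, a 1% increase in $X_1$ lowers $Y$ by about 0.8% on average.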

EViews Exercise II.1.2

Show in EViews how to define a multiple regression model and discuss the meaning of the estimated parameters: Case Study Chicken, work file chicken; Case Study Marketing, work file marketing; Case Study Profit, work file profit.

II.2 OLS Estimation

Let $(y_i, x_{1,i}, \ldots, x_{K,i})$, $i = 1, \ldots, N$, denote a random sample of size $N$ from the population. Hence, for each $i$:

$y_i = \beta_0 + \beta_1 x_{1,i} + \ldots + \beta_k x_{k,i} + \ldots + \beta_K x_{K,i} + u_i.$ (32)

The population parameters $\beta_0, \beta_1, \ldots, \beta_K$ are estimated from a sample. The parameter estimates (coefficients) are typically denoted by $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_K$. We will use the following vector notation:

$\beta = (\beta_0, \ldots, \beta_K), \qquad \hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_K).$ (33)

II.2 OLS Estimation

The commonly used method to estimate the parameters in a multiple regression model is, again, OLS estimation. For each observation $y_i$, the prediction $\hat{y}_i(\beta)$ of $y_i$ depends on $\beta = (\beta_0, \ldots, \beta_K)$. For each $y_i$, define the regression residual (prediction error) $u_i(\beta)$ as:

$u_i(\beta) = y_i - \hat{y}_i(\beta) = y_i - (\beta_0 + \beta_1 x_{1,i} + \ldots + \beta_K x_{K,i}).$ (34)

OLS Estimation for the Multiple Regression Model

For each parameter value $\beta$, an overall measure of fit is obtained by aggregating these prediction errors into the sum of squared residuals (SSR):

$\mathrm{SSR} = \sum_{i=1}^{N} u_i(\beta)^2 = \sum_{i=1}^{N} (y_i - \beta_0 - \beta_1 x_{1,i} - \ldots - \beta_K x_{K,i})^2.$ (35)

The OLS estimator $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_K)$ is the parameter value that minimizes the sum of squared residuals.

How to compute the OLS Estimator?

For a multiple regression model, the estimation problem is solved by software packages like EViews. Some mathematical details: take the first partial derivative of (35) with respect to each parameter $\beta_k$, $k = 0, \ldots, K$. This yields a system of $K+1$ linear equations in $\beta_0, \ldots, \beta_K$, which has a unique solution under certain conditions on the matrix $X$, having $N$ rows and $K+1$ columns and containing in each row $i$ the predictor values $(1\ x_{1,i}\ \ldots\ x_{K,i})$.

Matrix notation of the multiple regression model

Matrix notation for the observed data:

$X = \begin{pmatrix} 1 & x_{1,1} & \cdots & x_{K,1} \\ 1 & x_{1,2} & \cdots & x_{K,2} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1,N} & \cdots & x_{K,N} \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}.$

$X$ is an $N \times (K+1)$ matrix, $y$ is an $N \times 1$ vector. $X'X$ is a square matrix with $K+1$ rows and columns; $(X'X)^{-1}$ is the inverse of $X'X$.

Matrix notation of the multiple regression model

In matrix notation, the $N$ equations given in (32) for $i = 1, \ldots, N$ may be written as:

$y = X\beta + u, \qquad \text{where } u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_K \end{pmatrix}.$

The OLS Estimator

The OLS estimator $\hat{\beta}$ has an explicit form, depending on $X$ and the vector $y$ containing all observed values $y_1, \ldots, y_N$. The OLS estimator is given by:

$\hat{\beta} = (X'X)^{-1}X'y.$ (36)

The matrix $X'X$ has to be invertible in order to obtain a unique estimator $\hat{\beta}$.
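
A minimal Python/NumPy sketch of formula (36), using simulated data with illustrative true parameters:

# OLS via the normal equations (36): beta_hat = (X'X)^{-1} X'y.
import numpy as np

rng = np.random.default_rng(2)
N, K = 200, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, K))])  # N x (K+1) design matrix
beta_true = np.array([0.5, 1.0, -2.0])                      # illustrative values
y = X @ beta_true + rng.normal(0.0, 0.3, N)

# np.linalg.solve(X'X, X'y) avoids forming the explicit inverse,
# which is numerically preferable but algebraically identical to (36).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # close to [0.5, 1.0, -2.0]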

The OLS Estimator

Necessary conditions for $X'X$ being invertible: we have to observe sample variation for each predictor $X_k$, i.e. the sample variance of $x_{k,1}, \ldots, x_{k,N}$ is positive for all $k = 1, \ldots, K$. Furthermore, no exact linear relation between any predictors $X_k$ and $X_l$ may be present, i.e. the empirical correlation coefficient of all pairwise data sets $(x_{k,i}, x_{l,i})$, $i = 1, \ldots, N$, is different from 1 and -1. EViews produces an error if $X'X$ is not invertible.

Perfect Multicollinearity

A sufficient assumption about the predictors $X_1, \ldots, X_K$ in a multiple regression model is the following: the predictors $X_1, \ldots, X_K$ are not linearly dependent, i.e. no predictor $X_j$ may be expressed as a linear function of the remaining predictors $X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_K$. If this assumption is violated, then the OLS estimator does not exist, as the matrix $X'X$ is not invertible. There are infinitely many parameter values $\beta$ having the same minimal sum of squared residuals, defined in (35); the parameters of the regression model are not identified.

Case Study Yields

Demonstration in EViews, work file yieldus:

$y_i = \beta_1 + \beta_2 x_{2,i} + \beta_3 x_{3,i} + \beta_4 x_{4,i} + u_i,$

where $y_i$ is the yield with maturity 3 months, $x_{2,i}$ the yield with maturity 1 month, $x_{3,i}$ the yield with maturity 60 months, and $x_{4,i} = x_{3,i} - x_{2,i}$ the spread between these yields. $x_{4,i}$ is a linear combination of $x_{2,i}$ and $x_{3,i}$.

Case Study Yields

Let $\beta = (\beta_1, \beta_2, \beta_3, \beta_4)$ be a certain parameter value. Any parameter $\beta^* = (\beta_1, \beta_2^*, \beta_3^*, \beta_4^*)$, where $\beta_4^*$ may be arbitrarily chosen and

$\beta_3^* = \beta_3 + \beta_4 - \beta_4^*, \qquad \beta_2^* = \beta_2 - \beta_4 + \beta_4^*,$

will lead to the same sum of squared residuals as $\beta$. The OLS estimator is not unique.
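
A minimal Python/NumPy sketch of this non-identification, using simulated yields (the data values are illustrative, not the yieldus work file):

# With x4 = x3 - x2, X'X is singular and different betas give identical fits.
import numpy as np

rng = np.random.default_rng(3)
N = 100
x2 = rng.uniform(0.0, 5.0, N)
x3 = rng.uniform(0.0, 5.0, N)
x4 = x3 - x2                                  # exact linear combination
X = np.column_stack([np.ones(N), x2, x3, x4])

print("rank of X'X:", np.linalg.matrix_rank(X.T @ X))   # 3 < 4: not invertible

beta = np.array([0.1, 0.5, 0.3, 0.2])
b4s = 1.0                                      # arbitrary beta_4^*
beta_star = np.array([0.1, 0.5 - 0.2 + b4s, 0.3 + 0.2 - b4s, b4s])
print(np.allclose(X @ beta, X @ beta_star))    # True: identical fitted values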

II.3 Understanding Econometric Inference

Econometric inference: learning from the data about the unknown parameter $\beta$ in the regression model. We use the OLS estimator $\hat{\beta}$ to learn about the regression parameter. Is this estimator equal to the true value? How large is the difference between the OLS estimator and the true parameter? Is there a better estimator than the OLS estimator?

Unbiasedness

Under assumption (28), the OLS estimator (if it exists) is unbiased, i.e. the estimated values are on average equal to the true values: $\mathrm{E}(\hat{\beta}_j) = \beta_j$, $j = 0, \ldots, K$. In matrix notation:

$\mathrm{E}(\hat{\beta}) = \beta, \qquad \mathrm{E}(\hat{\beta} - \beta) = 0.$ (37)

Unbiasedness of the OLS estimator

If the data are generated by the model $y = X\beta + u$, then the OLS estimator may be expressed as:

$\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u.$

Therefore the estimation error may be expressed as:

$\hat{\beta} - \beta = (X'X)^{-1}X'u.$ (38)

Result (37) follows immediately:

$\mathrm{E}(\hat{\beta} - \beta) = (X'X)^{-1}X'\mathrm{E}(u) = 0.$
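
A minimal Monte Carlo sketch (Python/NumPy, illustrative design and parameter values) of unbiasedness: the average of the OLS estimates over many replications is close to the true parameter:

# Average OLS estimate over many replications is approximately the true beta.
import numpy as np

rng = np.random.default_rng(4)
N = 50
X = np.column_stack([np.ones(N), rng.uniform(0.0, 1.0, N)])   # fixed design
beta = np.array([0.2, -1.8])                                  # illustrative values

est, R = np.zeros(2), 10000
for _ in range(R):
    y = X @ beta + rng.normal(0.0, np.sqrt(0.1), N)
    est += np.linalg.solve(X.T @ X, X.T @ y)
print("average estimate over replications:", est / R)   # close to [0.2, -1.8]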

Covariance Matrix of the OLS Estimator

Due to unbiasedness, the expected value $\mathrm{E}(\hat{\beta}_j)$ of the OLS estimator is equal to $\beta_j$ for $j = 0, \ldots, K$. Hence, the variance $\mathrm{Var}(\hat{\beta}_j)$ measures the variation of the OLS estimator $\hat{\beta}_j$ around the true value $\beta_j$:

$\mathrm{Var}(\hat{\beta}_j) = \mathrm{E}\big((\hat{\beta}_j - \mathrm{E}(\hat{\beta}_j))^2\big) = \mathrm{E}\big((\hat{\beta}_j - \beta_j)^2\big).$

Are the deviations of the estimator from the true value correlated for different coefficients of the OLS estimator?

Covariance Matrix of the OLS Estimator

[Figure: scatter plots of the OLS estimates $\hat{\beta}_1$ (constant) versus $\hat{\beta}_2$ (price) across simulated samples, for Design 1: $x_i \sim 0.5 + \mathrm{Uniform}[0,1]$ (left) versus Design 2: $x_i \sim 1 + \mathrm{Uniform}[0,1]$ (right); $N = 50$, $\sigma^2 = 0.1$. MATLAB Code: regestall.m]

Covariance Matrix of the OLS Estimator

The covariance $\mathrm{Cov}(\hat{\beta}_j, \hat{\beta}_k)$ of different coefficients of the OLS estimator measures whether deviations between the estimator and the true value are correlated:

$\mathrm{Cov}(\hat{\beta}_j, \hat{\beta}_k) = \mathrm{E}\big((\hat{\beta}_j - \beta_j)(\hat{\beta}_k - \beta_k)\big).$

This information is summarized for all possible pairs of coefficients in the covariance matrix of the OLS estimator. Note that $\mathrm{Cov}(\hat{\beta}) = \mathrm{E}((\hat{\beta} - \beta)(\hat{\beta} - \beta)')$.

Covariance Matrix of the OLS Estimator

The covariance matrix of a random vector is a square matrix containing the variances of the elements of the random vector on the diagonal and the covariances in the off-diagonal elements:

$\mathrm{Cov}(\hat{\beta}) = \begin{pmatrix} \mathrm{Var}(\hat{\beta}_0) & \mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) & \cdots & \mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_K) \\ \mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) & \mathrm{Var}(\hat{\beta}_1) & \cdots & \mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_K) \\ \vdots & & \ddots & \vdots \\ \mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_K) & \cdots & \mathrm{Cov}(\hat{\beta}_{K-1}, \hat{\beta}_K) & \mathrm{Var}(\hat{\beta}_K) \end{pmatrix}.$

Homoskedasticity

To derive $\mathrm{Cov}(\hat{\beta})$, we make an additional assumption, namely homoskedasticity:

$\mathrm{Var}(u \mid X_1, \ldots, X_K) = \sigma^2.$ (39)

This means that the variance of the error term $u$ is the same regardless of the predictor variables $X_1, \ldots, X_K$. It follows that $\mathrm{Var}(Y \mid X_1, \ldots, X_K) = \sigma^2$.

Error Covariance Matrix

Because the observations are a random sample from the population, any two observations $y_i$ and $y_l$ are uncorrelated. Hence the errors $u_i$ and $u_l$ are also uncorrelated. Together with (39), we obtain the following covariance matrix of the error vector $u$:

$\mathrm{Cov}(u) = \sigma^2 I,$

with $I$ being the identity matrix.

Covariance Matrix of the OLS Estimator

Under assumptions (28) and (39), the covariance matrix of the OLS estimator $\hat{\beta}$ is given by:

$\mathrm{Cov}(\hat{\beta}) = \sigma^2 (X'X)^{-1}.$ (40)

Covariance Matrix of the OLS Estimator

Proof. Using (38), we obtain:

$\hat{\beta} - \beta = Au, \qquad A = (X'X)^{-1}X'.$

The following holds:

$\mathrm{E}((\hat{\beta} - \beta)(\hat{\beta} - \beta)') = \mathrm{E}(Auu'A') = A\,\mathrm{E}(uu')\,A' = A\,\mathrm{Cov}(u)\,A'.$

Therefore:

$\mathrm{Cov}(\hat{\beta}) = \sigma^2 AA' = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}.$

Covariance Matrix of the OLS Estimator

The diagonal elements of the matrix $\sigma^2 (X'X)^{-1}$ define the variance $\mathrm{Var}(\hat{\beta}_j)$ of the OLS estimator for each component. The standard deviation $\mathrm{sd}(\hat{\beta}_j)$ of each OLS estimator is defined as:

$\mathrm{sd}(\hat{\beta}_j) = \sqrt{\mathrm{Var}(\hat{\beta}_j)} = \sigma \sqrt{\big[(X'X)^{-1}\big]_{j+1,j+1}}.$ (41)

It measures the estimation error in the same units as $\beta_j$. Evidently, the larger the variance of the error, the larger the standard deviation. What other factors influence the standard deviation?
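
A minimal Python/NumPy sketch comparing formula (40) with a Monte Carlo estimate of the covariance matrix (design and parameter values are illustrative):

# Theoretical Cov(beta_hat) = sigma^2 (X'X)^{-1} versus simulated covariance.
import numpy as np

rng = np.random.default_rng(5)
N, sigma2 = 50, 0.1
X = np.column_stack([np.ones(N), 0.5 + rng.uniform(0.0, 1.0, N)])  # fixed design
beta = np.array([0.2, -1.8])                                       # illustrative values

theory = sigma2 * np.linalg.inv(X.T @ X)   # formula (40)

draws = np.array([
    np.linalg.solve(X.T @ X, X.T @ (X @ beta + rng.normal(0, np.sqrt(sigma2), N)))
    for _ in range(20000)
])
print("theoretical Cov(beta_hat):\n", theory)
print("Monte Carlo Cov(beta_hat):\n", np.cov(draws.T))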

Multicollinearity

In practical regression analysis, high (but not perfect) multicollinearity is very often present. How well may $X_j$ be explained by the other regressors? Consider $X_j$ as the left-hand variable in the following regression model, where all the remaining predictors stay on the right-hand side:

$X_j = \tilde{\beta}_0 + \tilde{\beta}_1 X_1 + \ldots + \tilde{\beta}_{j-1} X_{j-1} + \tilde{\beta}_{j+1} X_{j+1} + \ldots + \tilde{\beta}_K X_K + \tilde{u}.$

Use OLS estimation to estimate the parameters and let $\hat{x}_{j,i}$ be the values predicted from this (OLS) regression.

Multicollinearity

Define $R_j$ as the correlation between the observed values $x_{j,i}$ and the predicted values $\hat{x}_{j,i}$ in this regression. If $R_j^2$ is close to 0, then $X_j$ cannot be predicted from the other regressors; $X_j$ contains additional, independent information. The closer $R_j^2$ is to 1, the better $X_j$ is predicted from the other regressors and the more multicollinearity is present; $X_j$ does not contain much "independent" information.

The variance of the OLS estimator

Using $R_j$, the variance $\mathrm{Var}(\hat{\beta}_j)$ of the OLS estimator of the coefficient $\beta_j$ corresponding to $X_j$ may be expressed in the following way for $j = 1, \ldots, K$:

$\mathrm{Var}(\hat{\beta}_j) = \frac{\sigma^2}{N s_{x_j}^2 (1 - R_j^2)}.$

Hence, the variance $\mathrm{Var}(\hat{\beta}_j)$ of the estimate $\hat{\beta}_j$ is large if the regressor $X_j$ is highly redundant given the other regressors ($R_j^2$ close to 1, multicollinearity).

The variance of the OLS estimator

All other factors are the same as for the simple regression model, i.e. the variance $\mathrm{Var}(\hat{\beta}_j)$ of the estimate $\hat{\beta}_j$ is large if: the variance $\sigma^2$ of the error term $u$ is large; the sampling variation in the regressor $X_j$, i.e. the variance $s_{x_j}^2$, is small; the sample size $N$ is small. A numerical cross-check of the variance formula appears in the sketch below.
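
A minimal Python/NumPy sketch (illustrative data) computing $R_j^2$ via the auxiliary regression and cross-checking the variance formula against the diagonal of $\sigma^2(X'X)^{-1}$:

# Variance inflation under multicollinearity: formula vs. sigma^2 (X'X)^{-1}.
import numpy as np

rng = np.random.default_rng(6)
N = 200
x1 = rng.normal(size=N)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=N)    # x2 highly collinear with x1

# Auxiliary regression of x2 on a constant and x1 yields R_j^2 for j = 2
Z = np.column_stack([np.ones(N), x1])
x2_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ x2)
R2j = np.corrcoef(x2, x2_hat)[0, 1] ** 2

sigma2 = 0.1                                 # assumed error variance
s2 = np.mean((x2 - x2.mean()) ** 2)          # s_{x_j}^2 with the 1/N convention
print("R_j^2:", R2j)
print("Var(beta_j_hat) by formula:   ", sigma2 / (N * s2 * (1 - R2j)))

X = np.column_stack([np.ones(N), x1, x2])
print("diagonal of sigma^2 (X'X)^-1: ", sigma2 * np.linalg.inv(X.T @ X)[2, 2])

The two variance numbers coincide, since the formula is an algebraic identity for the diagonal elements of $\sigma^2(X'X)^{-1}$.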

II.4 OLS Residuals

Consider the estimated regression model under OLS estimation:

$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1,i} + \ldots + \hat{\beta}_K x_{K,i} + \hat{u}_i = \hat{y}_i + \hat{u}_i,$

where $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1,i} + \ldots + \hat{\beta}_K x_{K,i}$ is called the fitted value and $\hat{u}_i$ is called the OLS residual. OLS residuals are useful: to estimate the variance $\sigma^2$ of the error term; to quantify the quality of the fitted regression model; for residual diagnostics.

EViews Exercise II.4.1

Discuss in EViews how to obtain the OLS residuals and the fitted regression: Case Study Profit, work file profit; Case Study Chicken, work file chicken; Case Study Marketing, work file marketing.

OLS residuals as proxies for the error

Compare the underlying regression model

$Y = \beta_0 + \beta_1 X_1 + \ldots + \beta_K X_K + u$ (42)

with the estimated model for $i = 1, \ldots, N$:

$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1,i} + \ldots + \hat{\beta}_K x_{K,i} + \hat{u}_i.$

The OLS residuals $\hat{u}_1, \ldots, \hat{u}_N$ may be considered as a sample of the unobservable error $u$. Use the OLS residuals $\hat{u}_1, \ldots, \hat{u}_N$ to estimate $\sigma^2 = \mathrm{Var}(u)$.

Algebraic properties of the OLS residuals

The OLS residuals $\hat{u}_1, \ldots, \hat{u}_N$ obey $K+1$ linear equations and have the following algebraic properties. The sum (average) of the OLS residuals $\hat{u}_i$ is equal to zero:

$\frac{1}{N} \sum_{i=1}^{N} \hat{u}_i = 0.$ (43)

The sample covariance between $x_{k,i}$ and $\hat{u}_i$ is zero:

$\frac{1}{N} \sum_{i=1}^{N} x_{k,i} \hat{u}_i = 0, \qquad k = 1, \ldots, K.$ (44)
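
A minimal Python/NumPy sketch (illustrative data) verifying properties (43) and (44) numerically:

# OLS residuals sum to zero and are orthogonal to every regressor.
import numpy as np

rng = np.random.default_rng(7)
N = 100
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(0.0, 0.2, N)   # illustrative values

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

print(np.mean(resid))            # (43): numerically zero
print(X[:, 1:].T @ resid / N)    # (44): numerically zero for each regressor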

Estimating $\sigma^2$

A naive estimator of $\sigma^2$ would be the sample variance of the OLS residuals $\hat{u}_1, \ldots, \hat{u}_N$:

$\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} \hat{u}_i^2 - \left( \frac{1}{N} \sum_{i=1}^{N} \hat{u}_i \right)^{\!2} = \frac{1}{N} \sum_{i=1}^{N} \hat{u}_i^2 = \frac{\mathrm{SSR}}{N},$

where we used (43), and $\mathrm{SSR} = \sum_{i=1}^{N} \hat{u}_i^2$ is the sum of squared OLS residuals. However, due to the linear dependence between the OLS residuals, $\hat{u}_1, \ldots, \hat{u}_N$ is not an independent sample. Hence, $\hat{\sigma}^2$ is a biased estimator of $\sigma^2$.

Estimating $\sigma^2$

Due to the linear dependence between the OLS residuals, only $df = N - K - 1$ residuals can be chosen independently; $df$ is also called the degrees of freedom. An unbiased estimator of the error variance $\sigma^2$ in a homoskedastic multiple regression model is given by:

$\hat{\sigma}^2 = \frac{\mathrm{SSR}}{df},$ (45)

where $df = N - K - 1$, $N$ is the number of observations, and $K$ is the number of predictors $X_1, \ldots, X_K$.

The standard errors of the OLS estimator

The standard deviation $\mathrm{sd}(\hat{\beta}_j)$ of the OLS estimator given in (41) depends on $\sigma = \sqrt{\sigma^2}$. To evaluate the estimation error for a given data set in practical regression analysis, $\sigma^2$ is substituted by the estimator (45). This yields the so-called standard error $\mathrm{se}(\hat{\beta}_j)$ of the OLS estimator:

$\mathrm{se}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 \big[(X'X)^{-1}\big]_{j+1,j+1}}.$ (46)
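
A minimal Python/NumPy sketch (illustrative data) combining (45) and (46) to compute the standard errors reported by regression packages:

# Unbiased error-variance estimate (45) and standard errors (46).
import numpy as np

rng = np.random.default_rng(8)
N, K = 100, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, K))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(0.0, 0.3, N)   # illustrative values

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (N - K - 1)                       # formula (45)
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))     # formula (46)
print("sigma^2 hat:    ", sigma2_hat)
print("standard errors:", se)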

EViews Exercise II.4.1

EViews (and other packages) report for each predictor the OLS estimate together with its standard error: Case Study Profit, work file profit; Case Study Chicken, work file chicken; Case Study Marketing, work file marketing. Note: the standard errors computed by EViews (and other packages) are valid only under the assumptions made above, in particular homoskedasticity.

Quantifying the model fit

How well does the multiple regression model (42) explain the variation in $Y$? Compare it with the following simple model without any predictors:

$Y = \beta_0 + \tilde{u}.$ (47)

The OLS estimator of $\beta_0$ minimizes the sum of squared residuals

$\sum_{i=1}^{N} (y_i - \beta_0)^2$

and is given by $\hat{\beta}_0 = \bar{y}$.

Coefficient of Determination

The minimal sum is equal to the total variation

$\mathrm{SST} = \sum_{i=1}^{N} (y_i - \bar{y})^2.$

Is it possible to reduce the sum of squared residuals SST of the simple model (47) by including the predictor variables $X_1, \ldots, X_K$ as in (42)?

Coefficient of Determination

The minimal sum of squared residuals SSR of the multiple regression model (42) is never larger than the minimal sum of squared residuals SST of the simple model (47):

$\mathrm{SSR} \leq \mathrm{SST}.$ (48)

The coefficient of determination $R^2$ of the multiple regression model (42) is defined as:

$R^2 = \frac{\mathrm{SST} - \mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSR}}{\mathrm{SST}}.$ (49)

Coefficient of Determination

Proof of (48). The following variance decomposition holds:

$\mathrm{SST} = \sum_{i=1}^{N} (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 = \sum_{i=1}^{N} \hat{u}_i^2 + 2 \sum_{i=1}^{N} \hat{u}_i (\hat{y}_i - \bar{y}) + \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2.$

Using the algebraic properties (43) and (44) of the OLS residuals, we obtain:

$\sum_{i=1}^{N} \hat{u}_i (\hat{y}_i - \bar{y}) = \hat{\beta}_0 \sum_{i=1}^{N} \hat{u}_i + \hat{\beta}_1 \sum_{i=1}^{N} \hat{u}_i x_{1,i} + \ldots + \hat{\beta}_K \sum_{i=1}^{N} \hat{u}_i x_{K,i} - \bar{y} \sum_{i=1}^{N} \hat{u}_i = 0.$

Therefore:

$\mathrm{SST} = \mathrm{SSR} + \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2 \geq \mathrm{SSR}.$

Coefficient of Determination

The coefficient of determination $R^2$ is a measure of goodness-of-fit. If $\mathrm{SSR} \approx \mathrm{SST}$, then little is gained by including the predictors; $R^2$ is close to 0, and the multiple regression model explains the variation in $Y$ hardly better than the simple model (47). If $\mathrm{SSR} \ll \mathrm{SST}$, then much is gained by including the predictors; $R^2$ is close to 1, and the multiple regression model explains the variation in $Y$ much better than the simple model (47). Program packages like EViews report SSR and $R^2$.
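
A minimal Python/NumPy sketch (illustrative data) computing $R^2$ directly from definition (49):

# R^2 = 1 - SSR/SST for a simulated simple regression.
import numpy as np

rng = np.random.default_rng(9)
N = 100
x = rng.uniform(-0.5, 0.5, N)
X = np.column_stack([np.ones(N), x])
y = 0.2 - 1.8 * x + rng.normal(0.0, 0.3, N)    # illustrative values

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
SSR = resid @ resid
SST = np.sum((y - y.mean()) ** 2)
print("R^2:", 1 - SSR / SST)                   # formula (49)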

Coefficient of Determination

[Figure: two simulated data sets with fitted lines ("price as predictor" versus "no predictor"). Left panel: SSR = 9.5299, SST = 120.0481, $R^2$ = 0.92062. Right panel: SSR = 8.3649, SST = 8.6639, $R^2$ = 0.034512. MATLAB Code: reg-est-r2.m]

The Gauss-Markov Theorem

The Gauss-Markov Theorem: under assumptions (28) and (39), the OLS estimator is BLUE, i.e. the Best Linear Unbiased Estimator. Here "best" means that any other linear unbiased estimator has larger standard errors than the OLS estimator.

II.5 Testing Hypotheses

Multiple regression model:

$Y = \beta_0 + \beta_1 X_1 + \ldots + \beta_j X_j + \ldots + \beta_K X_K + u.$ (50)

Does the predictor variable $X_j$ exercise an influence on the expected mean $\mathrm{E}(Y)$ of the response variable $Y$ if we control for all other variables $X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_K$? Formally: is $\beta_j = 0$?

Understanding the testing problem

Simulate data from a multiple regression model with $\beta_0 = 0.2$, $\beta_1 = -1.8$, and $\beta_2 = 0$:

$Y = 0.2 - 1.8 X_1 + 0 \cdot X_2 + u, \qquad u \sim \mathrm{Normal}(0, \sigma^2).$

Run OLS estimation for a model where $\beta_2$ is unknown:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u, \qquad u \sim \mathrm{Normal}(0, \sigma^2),$

to obtain $(\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2)$. Is $\hat{\beta}_2$ different from 0? MATLAB Code: regtest.m
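
The referenced regtest.m is not reproduced here; a minimal Python/NumPy sketch of the same simulation (illustrative design, uniform regressors assumed):

# Redundant-regressor simulation: beta2_hat scatters around 0 across samples.
import numpy as np

rng = np.random.default_rng(10)
N, sigma2 = 50, 0.1
X1 = rng.uniform(0.0, 1.0, N)
X2 = rng.uniform(0.0, 1.0, N)          # redundant predictor, true coefficient 0
X = np.column_stack([np.ones(N), X1, X2])

b2_draws = []
for _ in range(5000):
    y = 0.2 - 1.8 * X1 + 0.0 * X2 + rng.normal(0.0, np.sqrt(sigma2), N)
    b2_draws.append(np.linalg.solve(X.T @ X, X.T @ y)[2])
print("mean of beta2_hat:", np.mean(b2_draws))   # close to 0 on average
print("sd of beta2_hat:  ", np.std(b2_draws))    # but nonzero in any single sample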

Understanding the testing problem

[Figure: scatter plot of $\hat{\beta}_2$ (price) versus $\hat{\beta}_3$ (redundant variable) across simulated samples; $N = 50$, $\sigma^2 = 0.1$, Design 3.] The OLS estimator $\hat{\beta}_2$ of $\beta_2 = 0$ differs from 0 for a single data set, but is 0 on average.

Understanding the testing problem

OLS estimation for the true model in comparison to estimating a model with a redundant predictor variable: including the redundant predictor $X_2$ increases the estimation error for the other parameters $\beta_0$ and $\beta_1$. [Figure: scatter plots of $\hat{\beta}_1$ (constant) versus $\hat{\beta}_2$ (price) for Design 1 (left) versus Design 3 with the redundant predictor (right); $N = 50$, $\sigma^2 = 0.1$.]

Testing of hypotheses

What may we learn from the data about hypotheses concerning the unknown parameters in the regression model, especially about the hypothesis that $\beta_j = 0$? May we reject the hypothesis $\beta_j = 0$ given the data? Testing whether $\beta_j = 0$ is not only of importance for the substantive scientist, but also from an econometric point of view, to increase the efficiency of estimation of the non-zero parameters. It is possible to answer these questions if we make additional assumptions about the error term $u$ in a multiple regression model.