Chapter 11 Specification Error Analysis


The specification of a linear regression model consists of the formulation of the regression relationship and of statements or assumptions concerning the explanatory variables and disturbances. If any of these is violated, e.g., through an incorrect functional form or an incorrect introduction of the disturbance term into the model, then a specification error occurs. In a narrower sense, specification error refers to the choice of explanatory variables.

The complete regression analysis depends on the explanatory variables present in the model. It is understood in regression analysis that only correct and important explanatory variables appear in the model. In practice, after ensuring the correct functional form of the model, the analyst usually has a pool of explanatory variables that possibly influence the process or experiment. Generally, not all such candidate variables are used in the regression modelling; rather, a subset of explanatory variables is chosen from this pool. While choosing a subset of explanatory variables, there are two possible options:

1. In order to make the model as realistic as possible, the analyst may include as many explanatory variables as possible.
2. In order to make the model as simple as possible, one may include only a small number of explanatory variables.

In such selections, there can be two types of incorrect model specification:

1. Omission/exclusion of relevant variables.
2. Inclusion of irrelevant variables.

Now we discuss the statistical consequences arising from both situations.

Exclusion of relevant variables:
In order to keep the model simple, the analyst may delete some explanatory variables which may be of importance from the point of view of theoretical considerations. There can be several reasons behind such a decision; e.g., it may be hard to quantify variables like taste, intelligence, etc. Sometimes it may be difficult to take correct observations on variables like income, etc.

Econometrics | Specification Error Analysis | Shalabh, IIT Kanpur

Let there be $k$ candidate explanatory variables, out of which suppose $r$ variables are included and $(k-r)$ variables are deleted from the model. So partition $X$ and $\beta$ as

$$X = \begin{pmatrix} X_1 & X_2 \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix},$$

where $X_1$ is $n \times r$, $X_2$ is $n \times (k-r)$, and $\beta_1$ and $\beta_2$ are the corresponding $r \times 1$ and $(k-r) \times 1$ coefficient vectors.

The model $y = X\beta + \varepsilon$, $E(\varepsilon) = 0$, $V(\varepsilon) = \sigma^2 I$, can then be expressed as

$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon,$$

which is called the full model or true model.

After dropping the $(k-r)$ explanatory variables from the model, the new model is

$$y = X_1\beta_1 + \delta,$$

which is called the misspecified model or false model.

Applying OLS to the false model, the OLSE of $\beta_1$ is

$$b_1 = (X_1'X_1)^{-1}X_1'y.$$

The estimation error is obtained as follows:

$$b_1 = (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + \varepsilon) = \beta_1 + \theta + (X_1'X_1)^{-1}X_1'\varepsilon,$$

where $\theta = (X_1'X_1)^{-1}X_1'X_2\beta_2$, so that

$$E(b_1 - \beta_1) = \theta + (X_1'X_1)^{-1}X_1'E(\varepsilon) = \theta,$$

which is a linear function of $\beta_2$, i.e., of the coefficients of the excluded variables. So $b_1$ is biased in general. The bias vanishes if $X_1'X_2 = 0$, i.e., if $X_1$ and $X_2$ are orthogonal (uncorrelated).

The mean squared error matrix of $b_1$ is

$$\begin{aligned}
MSE(b_1) &= E\left[(b_1 - \beta_1)(b_1 - \beta_1)'\right] \\
&= E\left[\theta\theta' + \theta\varepsilon'X_1(X_1'X_1)^{-1} + (X_1'X_1)^{-1}X_1'\varepsilon\theta' + (X_1'X_1)^{-1}X_1'\varepsilon\varepsilon'X_1(X_1'X_1)^{-1}\right] \\
&= \theta\theta' + 0 + 0 + \sigma^2(X_1'X_1)^{-1}X_1'IX_1(X_1'X_1)^{-1} \\
&= \theta\theta' + \sigma^2(X_1'X_1)^{-1}.
\end{aligned}$$
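The bias formula $E(b_1) = \beta_1 + \theta$ is easy to verify by simulation. The following sketch (the design matrices, coefficient values and seed are illustrative assumptions, not taken from the notes) repeatedly fits the misspecified model by OLS and compares the average bias of $b_1$ with $\theta = (X_1'X_1)^{-1}X_1'X_2\beta_2$:

```python
import numpy as np

# Monte Carlo check of the omitted-variable bias E(b1) - beta1 = theta,
# with theta = (X1'X1)^{-1} X1'X2 beta2.  All numbers are illustrative.
rng = np.random.default_rng(0)
n, r = 200, 2                        # n observations, r retained regressors
beta1 = np.array([1.0, 2.0])         # coefficients of retained variables X1
beta2 = np.array([3.0])              # coefficient of the omitted variable X2
sigma = 1.0

X1 = rng.normal(size=(n, r))
X2 = (X1 @ np.array([[0.5], [0.5]])  # X2 correlated with X1 ...
      + rng.normal(size=(n, 1)))     # ... plus independent noise

# Theoretical bias for this fixed design
theta = np.linalg.solve(X1.T @ X1, X1.T @ X2 @ beta2)

b1_draws = np.zeros((2000, r))
for s in range(2000):
    eps = sigma * rng.normal(size=n)
    y = X1 @ beta1 + X2 @ beta2 + eps                    # data from the true model
    b1_draws[s] = np.linalg.lstsq(X1, y, rcond=None)[0]  # OLS on misspecified model

print("average bias of b1:   ", b1_draws.mean(axis=0) - beta1)
print("theoretical bias theta:", theta)
```

Because $X_2$ is deliberately built to be correlated with $X_1$, the average bias is clearly nonzero and matches $\theta$; making the columns of $X_1$ and $X_2$ orthogonal drives $\theta$, and hence the bias, to zero.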

So efficiency generally declines; note that the second term, $\sigma^2(X_1'X_1)^{-1}$, is the conventional form of the covariance matrix, so the excess term $\theta\theta'$ is the cost of the omission.

The residual sum of squares gives the estimator

$$s^2 = \frac{SS_{res}}{n-r} = \frac{e'e}{n-r},$$

where $e = y - X_1 b_1 = H_1 y$ and $H_1 = I - X_1(X_1'X_1)^{-1}X_1'$. Since $H_1 X_1 = 0$,

$$H_1 y = H_1(X_1\beta_1 + X_2\beta_2 + \varepsilon) = 0 + H_1(X_2\beta_2 + \varepsilon) = H_1(X_2\beta_2 + \varepsilon)$$

and

$$y'H_1 y = (X_2\beta_2 + \varepsilon)'H_1(X_2\beta_2 + \varepsilon) = \beta_2'X_2'H_1X_2\beta_2 + \beta_2'X_2'H_1\varepsilon + \varepsilon'H_1X_2\beta_2 + \varepsilon'H_1\varepsilon.$$

Taking expectations, and using $E(\varepsilon'H_1\varepsilon) = \sigma^2\,\mathrm{tr}(H_1) = \sigma^2(n-r)$,

$$E(s^2) = \frac{1}{n-r}\left[\beta_2'X_2'H_1X_2\beta_2 + 0 + 0 + (n-r)\sigma^2\right] = \sigma^2 + \frac{\beta_2'X_2'H_1X_2\beta_2}{n-r}.$$

So $s^2$ is a biased estimator of $\sigma^2$, and $s^2$ provides an overestimate of $\sigma^2$. Note that even if $X_1'X_2 = 0$, $s^2$ still gives an overestimate of $\sigma^2$, because $H_1 X_2 \neq 0$ in general. So the statistical inferences based on this estimator will be faulty: the $t$-test and confidence regions will be invalid in this case.

If the response is to be predicted at $x' = (x_1', x_2')$, then using the full model the predicted value is

$$\hat{y} = x'b = x'(X'X)^{-1}X'y$$

with

$$E(\hat{y}) = x'\beta, \qquad Var(\hat{y}) = \sigma^2\left[1 + x'(X'X)^{-1}x\right].$$

When the subset model is used, the predictor is $\hat{y}_1 = x_1'b_1$, and then
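The over-estimation of $\sigma^2$ can likewise be checked numerically. This sketch (design, coefficients and seed are again arbitrary assumptions) compares the Monte Carlo mean of $s^2$ with $\sigma^2 + \beta_2'X_2'H_1X_2\beta_2/(n-r)$:

```python
import numpy as np

# Check that s^2 = SS_res/(n - r) from the misspecified model overestimates
# sigma^2 by exactly beta2' X2' H1 X2 beta2 / (n - r).  Illustrative values.
rng = np.random.default_rng(1)
n, r = 100, 2
beta1, beta2 = np.array([1.0, -1.0]), np.array([2.0])
sigma2 = 4.0

X1 = rng.normal(size=(n, r))
X2 = rng.normal(size=(n, 1))

H1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)   # residual-maker of X1
inflation = float(beta2 @ X2.T @ H1 @ X2 @ beta2) / (n - r)

s2_draws = []
for s in range(3000):
    y = X1 @ beta1 + X2 @ beta2 + np.sqrt(sigma2) * rng.normal(size=n)
    e = H1 @ y                                           # subset-model residuals
    s2_draws.append(e @ e / (n - r))

print("mean of s^2:         ", np.mean(s2_draws))
print("sigma^2 + inflation: ", sigma2 + inflation)
```

Here $X_2$ is generated independently of $X_1$, so $X_1'X_2 \approx 0$, yet the inflation term is still large; this illustrates the remark that orthogonality does not rescue $s^2$.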

$$E(\hat{y}_1) = x_1'(X_1'X_1)^{-1}X_1'E(y) = x_1'(X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2) = x_1'\beta_1 + x_1'\theta.$$

Thus $\hat{y}_1$ is a biased predictor of $y$. It is unbiased when $X_1'X_2 = 0$. The MSE of the predictor is

$$MSE(\hat{y}_1) = \sigma^2\left[1 + x_1'(X_1'X_1)^{-1}x_1\right] + (x_1'\theta - x_2'\beta_2)^2.$$

Also,

$$MSE(\hat{y}_1) \le Var(\hat{y}),$$

provided $V(\hat{\beta}_2) - \beta_2\beta_2'$ is positive semidefinite.

Inclusion of irrelevant variables:
Sometimes, out of enthusiasm and a wish to make the model more realistic, the analyst may include explanatory variables that are not very relevant to the model. Such variables may contribute very little to its explanatory power. This tends to reduce the degrees of freedom $(n-k)$, and consequently the validity of the inferences drawn may be questionable. For example, the value of the coefficient of determination will increase, indicating that the model is getting better, which may not really be true.

Let the true model be

$$y = X\beta + \varepsilon, \qquad E(\varepsilon) = 0, \qquad V(\varepsilon) = \sigma^2 I,$$

which comprises $k$ explanatory variables. Suppose now that $r$ additional explanatory variables are added, so that the model becomes

$$y = X\beta + Z\gamma + \delta,$$

where $Z$ is an $n \times r$ matrix of $n$ observations on each of the $r$ added explanatory variables, $\gamma$ is the $r \times 1$ vector of regression coefficients associated with $Z$, and $\delta$ is the disturbance term. This model is termed the false model.

Applying OLS to the false model, we get

$$\begin{pmatrix} b \\ c \end{pmatrix} = \begin{pmatrix} X'X & X'Z \\ Z'X & Z'Z \end{pmatrix}^{-1}\begin{pmatrix} X'y \\ Z'y \end{pmatrix},$$

or equivalently the normal equations

$$X'Xb + X'Zc = X'y \qquad (1)$$
$$Z'Xb + Z'Zc = Z'y \qquad (2)$$

where $b$ and $c$ are the OLSEs of $\beta$ and $\gamma$ respectively.

Premultiplying equation (2) by $X'Z(Z'Z)^{-1}$ gives

$$X'Z(Z'Z)^{-1}Z'Xb + X'Zc = X'Z(Z'Z)^{-1}Z'y. \qquad (3)$$

Subtracting equation (3) from (1) gives

$$X'\left[I - Z(Z'Z)^{-1}Z'\right]Xb = X'\left[I - Z(Z'Z)^{-1}Z'\right]y$$
$$\Rightarrow\quad b = (X'H_Z X)^{-1}X'H_Z y, \qquad \text{where } H_Z = I - Z(Z'Z)^{-1}Z'.$$

The estimation error of $b$ is

$$b - \beta = (X'H_Z X)^{-1}X'H_Z y - \beta = (X'H_Z X)^{-1}X'H_Z(X\beta + \varepsilon) - \beta = (X'H_Z X)^{-1}X'H_Z\varepsilon,$$

so

$$E(b - \beta) = (X'H_Z X)^{-1}X'H_Z E(\varepsilon) = 0,$$

and $b$ is unbiased even when some irrelevant variables are added to the model.

The covariance matrix is

$$\begin{aligned}
V(b) &= E\left[(b-\beta)(b-\beta)'\right] \\
&= E\left[(X'H_Z X)^{-1}X'H_Z\,\varepsilon\varepsilon'\,H_Z X(X'H_Z X)^{-1}\right] \\
&= \sigma^2(X'H_Z X)^{-1}X'H_Z I H_Z X(X'H_Z X)^{-1} \\
&= \sigma^2(X'H_Z X)^{-1}.
\end{aligned}$$
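Both claims above can be checked in a few lines: the partitioned formula $b = (X'H_Z X)^{-1}X'H_Z y$ reproduces the $X$-block of the joint OLS fit, and $b$ stays unbiased when the irrelevant regressors $Z$ (true $\gamma = 0$) are included. The data below are illustrative assumptions:

```python
import numpy as np

# Verify (i) b = (X' H_Z X)^{-1} X' H_Z y equals the X-block of full OLS on
# [X Z], and (ii) b is unbiased even though the irrelevant Z is included.
rng = np.random.default_rng(2)
n, k, r = 150, 3, 2
beta = np.array([1.0, 0.5, -2.0])
X = rng.normal(size=(n, k))
Z = rng.normal(size=(n, r))          # irrelevant: absent from the true model

HZ = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)

b_draws = []
for s in range(1500):
    y = X @ beta + rng.normal(size=n)                        # true model (no Z)
    b = np.linalg.solve(X.T @ HZ @ X, X.T @ HZ @ y)          # partitioned formula
    b_full = np.linalg.lstsq(np.hstack([X, Z]), y, rcond=None)[0][:k]
    assert np.allclose(b, b_full)                            # identical estimates
    b_draws.append(b)

print("mean of b over simulations:", np.mean(b_draws, axis=0))
print("true beta:                 ", beta)
```

The equality of `b` and `b_full` is the Frisch-Waugh device implicit in the derivation: sweep $Z$ out of both $X$ and $y$, then regress.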

If OLS is applied to the true model, then

$$b_T = (X'X)^{-1}X'y$$

with $E(b_T) = \beta$ and $V(b_T) = \sigma^2(X'X)^{-1}$.

To compare $b$ and $b_T$, we use the following result.

Result: If $A$ and $B$ are two positive definite matrices, then $A^{-1} - B^{-1}$ is at least positive semidefinite if $B - A$ is also at least positive semidefinite.

Let

$$A = X'H_Z X, \qquad B = X'X.$$

Then

$$B - A = X'X - X'H_Z X = X'\left[I - H_Z\right]X = X'Z(Z'Z)^{-1}Z'X,$$

which is at least positive semidefinite. This implies that the efficiency of $b$ declines unless $X'Z = 0$. If $X'Z = 0$, i.e., $X$ and $Z$ are orthogonal, then both estimators are equally efficient.

The residual sum of squares under the false model is

$$SS_{res} = e'e,$$

where $e = y - Xb - Zc$ with

$$b = (X'H_Z X)^{-1}X'H_Z y, \qquad c = (Z'Z)^{-1}Z'(y - Xb) = (Z'Z)^{-1}Z'\left[I - X(X'H_Z X)^{-1}X'H_Z\right]y.$$

Hence

$$e = (y - Xb) - Z(Z'Z)^{-1}Z'(y - Xb) = H_Z(y - Xb) = H_Z y - H_Z X(X'H_Z X)^{-1}X'H_Z y,$$

using $H_Z' = H_Z$ and $H_Z^2 = H_Z$ (idempotent).
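The efficiency comparison is a matrix-ordering statement, so a quick numerical sanity check is to confirm that both $B - A = X'Z(Z'Z)^{-1}Z'X$ and $A^{-1} - B^{-1}$ have no negative eigenvalues for a random design (the matrices below are arbitrary illustrative draws):

```python
import numpy as np

# With A = X'H_Z X and B = X'X, check that B - A = X'Z(Z'Z)^{-1}Z'X is
# positive semidefinite, and hence that V(b) - V(b_T), proportional to
# A^{-1} - B^{-1}, is positive semidefinite as well.
rng = np.random.default_rng(3)
n, k, r = 80, 4, 3
X = rng.normal(size=(n, k))
Z = rng.normal(size=(n, r))

HZ = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
A = X.T @ HZ @ X
B = X.T @ X

diff = np.linalg.inv(A) - np.linalg.inv(B)   # proportional to V(b) - V(b_T)
eigs = np.linalg.eigvalsh(diff)
print("smallest eigenvalue of A^{-1} - B^{-1}:", eigs.min())
```

The smallest eigenvalue is nonnegative up to floating-point rounding; since $X'Z(Z'Z)^{-1}Z'X$ has rank at most $r$, some eigenvalues of the difference sit essentially at zero.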

So

$$e = \left[I - H_Z X(X'H_Z X)^{-1}X'H_Z\right]H_Z y = H^{*}y, \qquad H^{*} = H_Z - H_Z X(X'H_Z X)^{-1}X'H_Z,$$

where $H^{*}$ is symmetric and idempotent and satisfies

$$H^{*}X = H_Z X - H_Z X(X'H_Z X)^{-1}(X'H_Z X) = H_Z X - H_Z X = 0.$$

Therefore

$$SS_{res} = e'e = y'H^{*}y,$$

and since $H^{*}X = 0$,

$$E(SS_{res}) = E\left[(X\beta + \varepsilon)'H^{*}(X\beta + \varepsilon)\right] = E(\varepsilon'H^{*}\varepsilon) = \sigma^2\,\mathrm{tr}(H^{*}) = \sigma^2(n - k - r),$$

using $\mathrm{tr}(H^{*}) = \mathrm{tr}(H_Z) - \mathrm{tr}\left[(X'H_Z X)^{-1}X'H_Z X\right] = (n-r) - k$. Hence

$$E\left(\frac{SS_{res}}{n-k-r}\right) = \sigma^2,$$

so $SS_{res}/(n-k-r)$ is an unbiased estimator of $\sigma^2$.

A comparison of the exclusion and inclusion of variables is as follows:

                                          Exclusion type                  Inclusion type
  Estimation of coefficients              Biased                          Unbiased
  Efficiency                              Generally declines              Declines
  Estimation of disturbance variance      Over-estimate                   Unbiased
  Conventional tests of hypothesis        Invalid and faulty inferences   Valid though erroneous
  and confidence regions
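The one favourable entry in the inclusion column, the unbiasedness of $SS_{res}/(n-k-r)$, can be confirmed the same way as the earlier results; a minimal sketch with arbitrary illustrative values:

```python
import numpy as np

# Monte Carlo check that SS_res/(n - k - r) from the over-specified model
# (true k regressors X plus r irrelevant regressors Z) is unbiased for sigma^2.
rng = np.random.default_rng(4)
n, k, r = 60, 3, 2
beta = np.array([2.0, -1.0, 0.5])
sigma2 = 2.25
X = rng.normal(size=(n, k))
Z = rng.normal(size=(n, r))
W = np.hstack([X, Z])                 # regressor matrix of the false model

est = []
for s in range(4000):
    y = X @ beta + np.sqrt(sigma2) * rng.normal(size=n)
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    e = y - W @ coef                  # residuals of the over-specified fit
    est.append(e @ e / (n - k - r))

print("mean of SS_res/(n-k-r):", np.mean(est))
print("sigma^2:               ", sigma2)
```

Note the divisor: including the $r$ irrelevant variables costs $r$ degrees of freedom, which is exactly why the tests remain valid but lose power.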