Specification Error Analysis

The specification of a linear regression model consists of a formulation of the regression relationship and of statements or assumptions concerning the explanatory variables and disturbances. If any of these is violated, e.g., through an incorrect functional form or an incorrect introduction of the disturbance term into the model, then a specification error occurs. In a narrower sense, specification error refers to the choice of explanatory variables.

The complete regression analysis depends on the explanatory variables present in the model. It is understood in regression analysis that only correct and important explanatory variables appear in the model. In practice, after ensuring the correct functional form of the model, the analyst usually has a pool of explanatory variables which possibly influence the process or experiment. Generally, not all such candidate variables are used in the regression modelling; instead, a subset of explanatory variables is chosen from this pool. While choosing a subset of explanatory variables, there are two possible options:
1. In order to make the model as realistic as possible, the analyst may include as many explanatory variables as possible.
2. In order to make the model as simple as possible, one may include only a small number of explanatory variables.

In such selections, there can be two types of incorrect model specification:
1. Omission/exclusion of relevant variables.
2. Inclusion of irrelevant variables.

Now we discuss the statistical consequences arising from both situations.

Exclusion of relevant variables:
In order to keep the model simple, the analyst may delete some of the explanatory variables which may be of importance from the point of view of theoretical considerations. There can be several reasons behind such a decision; e.g., it may be hard to quantify variables like taste, intelligence, etc., and sometimes it may be difficult to obtain correct observations on variables like income.

Econometrics | Specification Error Analysis | Shalabh, IIT Kanpur
Let there be $k$ candidate explanatory variables, out of which suppose $r$ variables are included and $(k-r)$ variables are to be deleted from the model. So partition $X$ and $\beta$ as

$$X = \begin{pmatrix} X_1 & X_2 \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix},$$

where $X_1$ is $n \times r$, $X_2$ is $n \times (k-r)$, $\beta_1$ is $r \times 1$ and $\beta_2$ is $(k-r) \times 1$.

The model $y = X\beta + \varepsilon$, $E(\varepsilon) = 0$, $V(\varepsilon) = \sigma^2 I$, can then be expressed as

$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon,$$

which is called the full model or true model.

After dropping the $(k-r)$ explanatory variables in $X_2$, the new model is

$$y = X_1\beta_1 + \delta,$$

which is called the misspecified model or false model.

Applying OLS to the false model, the OLSE of $\beta_1$ is

$$b_1 = (X_1'X_1)^{-1}X_1'y.$$

The estimation error is obtained as follows:

$$b_1 = (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + \varepsilon) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'\varepsilon,$$

so that

$$b_1 - \beta_1 = \theta + (X_1'X_1)^{-1}X_1'\varepsilon, \qquad \theta = (X_1'X_1)^{-1}X_1'X_2\beta_2.$$

Taking expectations,

$$E(b_1 - \beta_1) = \theta + (X_1'X_1)^{-1}X_1'E(\varepsilon) = \theta,$$

which is a linear function of $\beta_2$, i.e., of the coefficients of the excluded variables. So $b_1$ is biased, in general. The bias vanishes if $X_1'X_2 = 0$, i.e., if $X_1$ and $X_2$ are orthogonal (uncorrelated).

The mean squared error matrix of $b_1$ is

$$\operatorname{MSE}(b_1) = E(b_1 - \beta_1)(b_1 - \beta_1)'$$
$$= E\left[\theta\theta' + \theta\varepsilon'X_1(X_1'X_1)^{-1} + (X_1'X_1)^{-1}X_1'\varepsilon\theta' + (X_1'X_1)^{-1}X_1'\varepsilon\varepsilon'X_1(X_1'X_1)^{-1}\right]$$
$$= \theta\theta' + 0 + 0 + \sigma^2(X_1'X_1)^{-1}X_1'IX_1(X_1'X_1)^{-1}$$
$$= \theta\theta' + \sigma^2(X_1'X_1)^{-1}.$$
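As a quick numerical check of the derivation above, the deterministic part of the estimation error of $b_1$ should equal $\theta = (X_1'X_1)^{-1}X_1'X_2\beta_2$. The NumPy sketch below (all dimensions, seeds and coefficient values are arbitrary illustrative choices, not taken from the text) sets $\varepsilon = 0$ so that the identity can be checked exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, k = 50, 2, 4                        # k candidate regressors, r retained
X1 = rng.standard_normal((n, r))          # included explanatory variables
X2 = rng.standard_normal((n, k - r))      # omitted (relevant) variables
beta1 = np.array([1.0, -2.0])
beta2 = np.array([0.5, 1.5])

# Noiseless response: with eps = 0 the estimation error is purely the bias.
y = X1 @ beta1 + X2 @ beta2

b1 = np.linalg.solve(X1.T @ X1, X1.T @ y)                # OLSE of false model
theta = np.linalg.solve(X1.T @ X1, X1.T @ (X2 @ beta2))  # omitted-variable bias

print(np.allclose(b1 - beta1, theta))   # → True
```

With noise added, $b_1 - \beta_1$ equals $\theta$ only on average over repeated samples, in line with $E(b_1 - \beta_1) = \theta$; if the columns of $X_1$ and $X_2$ were orthogonal, $\theta$ would vanish.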
So efficiency generally declines. Note that the second term, $\sigma^2(X_1'X_1)^{-1}$, is the conventional form of the MSE (the covariance matrix of the OLSE under correct specification).

The residual sum of squares gives

$$s^2 = \frac{SS_{res}}{n-r} = \frac{e'e}{n-r},$$

where $e = y - X_1b_1 = H_1y$ and $H_1 = I - X_1(X_1'X_1)^{-1}X_1'$. Since $H_1X_1 = 0$,

$$H_1y = H_1(X_1\beta_1 + X_2\beta_2 + \varepsilon) = 0 + H_1(X_2\beta_2 + \varepsilon) = H_1(X_2\beta_2 + \varepsilon),$$

and therefore

$$y'H_1y = (X_1\beta_1 + X_2\beta_2 + \varepsilon)'H_1(X_2\beta_2 + \varepsilon) = \beta_2'X_2'H_1X_2\beta_2 + \beta_2'X_2'H_1\varepsilon + \varepsilon'H_1X_2\beta_2 + \varepsilon'H_1\varepsilon.$$

Taking expectations, and using $E(\varepsilon'H_1\varepsilon) = \sigma^2\operatorname{tr}(H_1) = (n-r)\sigma^2$,

$$E(s^2) = \frac{1}{n-r}\left[\beta_2'X_2'H_1X_2\beta_2 + 0 + 0 + (n-r)\sigma^2\right] = \sigma^2 + \frac{\beta_2'X_2'H_1X_2\beta_2}{n-r}.$$

Thus $s^2$ is a biased estimator of $\sigma^2$, and $s^2$ provides an overestimate of $\sigma^2$. Note that even if $X_1'X_2 = 0$, $s^2$ still gives an overestimate of $\sigma^2$. So the statistical inferences based on $s^2$ will be faulty: the $t$-test and confidence regions will be invalid in this case.

If the response is to be predicted at $x' = (x_1', x_2')$, then using the full model, the predicted value is

$$\hat{y} = x'b = x'(X'X)^{-1}X'y$$

with $E(\hat{y}) = x'\beta$ and

$$\operatorname{Var}(\hat{y}) = \sigma^2\left[1 + x'(X'X)^{-1}x\right].$$

When the subset model is used, the predictor is $\hat{y}_1 = x_1'b_1$, and then
$$E(\hat{y}_1) = x_1'(X_1'X_1)^{-1}X_1'E(y) = x_1'(X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2) = x_1'\beta_1 + x_1'\theta.$$

Thus $\hat{y}_1$ is a biased predictor of $y$. It is unbiased when $X_1'X_2 = 0$. The MSE of the predictor is

$$\operatorname{MSE}(\hat{y}_1) = \sigma^2\left[1 + x_1'(X_1'X_1)^{-1}x_1\right] + (x_1'\theta - x_2'\beta_2)^2.$$

Also,

$$\operatorname{Var}(\hat{y}) \geq \operatorname{MSE}(\hat{y}_1)$$

provided $V(\hat{\beta}_2) - \beta_2\beta_2'$ is positive semidefinite.

Inclusion of irrelevant variables:
Sometimes, due to enthusiasm and a wish to make the model more realistic, the analyst may include some explanatory variables that are not very relevant to the model. Such variables may contribute very little to the explanatory power of the model. This tends to reduce the degrees of freedom $(n-k)$, and consequently the validity of the inferences drawn may be questionable. For example, the value of the coefficient of determination will increase, indicating that the model is getting better, which may not really be true.

Let the true model be

$$y = X\beta + \varepsilon, \qquad E(\varepsilon) = 0, \qquad V(\varepsilon) = \sigma^2 I,$$

which comprises $k$ explanatory variables. Suppose now that $r$ additional explanatory variables are added to the model, and the resulting model becomes

$$y = X\beta + Z\gamma + \delta,$$

where $Z$ is an $n \times r$ matrix of $n$ observations on each of the $r$ additional explanatory variables, $\gamma$ is the $r \times 1$ vector of regression coefficients associated with $Z$, and $\delta$ is the disturbance term. This model is termed the false model.

Applying OLS to the false model, we get
$$\begin{pmatrix} b \\ c \end{pmatrix} = \begin{pmatrix} X'X & X'Z \\ Z'X & Z'Z \end{pmatrix}^{-1} \begin{pmatrix} X'y \\ Z'y \end{pmatrix},$$

i.e., the normal equations are

$$X'Xb + X'Zc = X'y \qquad (1)$$
$$Z'Xb + Z'Zc = Z'y \qquad (2)$$

where $b$ and $c$ are the OLSEs of $\beta$ and $\gamma$ respectively.

Premultiplying equation (2) by $X'Z(Z'Z)^{-1}$, we get

$$X'Z(Z'Z)^{-1}Z'Xb + X'Zc = X'Z(Z'Z)^{-1}Z'y. \qquad (3)$$

Subtracting equation (3) from equation (1), we get

$$\left[X'X - X'Z(Z'Z)^{-1}Z'X\right]b = X'y - X'Z(Z'Z)^{-1}Z'y$$
$$X'\left[I - Z(Z'Z)^{-1}Z'\right]Xb = X'\left[I - Z(Z'Z)^{-1}Z'\right]y$$
$$b = (X'H_ZX)^{-1}X'H_Zy,$$

where $H_Z = I - Z(Z'Z)^{-1}Z'$.

The estimation error of $b$ is

$$b - \beta = (X'H_ZX)^{-1}X'H_Zy - \beta = (X'H_ZX)^{-1}X'H_Z(X\beta + \varepsilon) - \beta = (X'H_ZX)^{-1}X'H_Z\varepsilon,$$

so

$$E(b - \beta) = (X'H_ZX)^{-1}X'H_ZE(\varepsilon) = 0,$$

and $b$ is unbiased even when some irrelevant variables are added to the model.

The covariance matrix is

$$V(b) = E(b - \beta)(b - \beta)' = E\left[(X'H_ZX)^{-1}X'H_Z\varepsilon\varepsilon'H_ZX(X'H_ZX)^{-1}\right]$$
$$= \sigma^2(X'H_ZX)^{-1}X'H_ZIH_ZX(X'H_ZX)^{-1} = \sigma^2(X'H_ZX)^{-1},$$

using the idempotency of $H_Z$.
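The closed form $b = (X'H_ZX)^{-1}X'H_Zy$ obtained by eliminating $c$ from the normal equations can be checked numerically: it must coincide with the $X$-part of the OLSE computed directly from the stacked design $[X \; Z]$. A small NumPy sketch with simulated data (dimensions, seed and coefficients are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, r = 40, 3, 2
X = rng.standard_normal((n, k))          # relevant regressors (true model)
Z = rng.standard_normal((n, r))          # irrelevant added regressors
beta = np.array([1.0, 0.5, -1.0])
y = X @ beta + rng.standard_normal(n)

# OLS on the false model [X  Z] via the stacked normal equations.
W = np.hstack([X, Z])
coef = np.linalg.solve(W.T @ W, W.T @ y)
b_stacked, c = coef[:k], coef[k:]        # c estimates the (zero) gamma

# Partialled-out form derived above: b = (X' H_Z X)^{-1} X' H_Z y.
HZ = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
b_partial = np.linalg.solve(X.T @ HZ @ X, X.T @ HZ @ y)

print(np.allclose(b_stacked, b_partial))   # → True
```

Averaging $b$ over repeated noise draws would also show $E(b) = \beta$, in line with the unbiasedness result above.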
If OLS is applied to the true model, then

$$b_T = (X'X)^{-1}X'y$$

with $E(b_T) = \beta$ and $V(b_T) = \sigma^2(X'X)^{-1}$.

To compare $b$ and $b_T$, we use the following result.

Result: If $A$ and $B$ are two positive definite matrices, then $A^{-1} - B^{-1}$ is at least positive semidefinite if $B - A$ is also at least positive semidefinite.

Let

$$A = X'H_ZX, \qquad B = X'X,$$
$$B - A = X'X - X'H_ZX = X'\left[I - \left(I - Z(Z'Z)^{-1}Z'\right)\right]X = X'Z(Z'Z)^{-1}Z'X,$$

which is at least a positive semidefinite matrix. This implies that the efficiency declines unless $X'Z = 0$. If $X'Z = 0$, i.e., $X$ and $Z$ are orthogonal, then both estimators are equally efficient.

The residual sum of squares under the false model is

$$SS_{res} = e'e,$$

where

$$e = y - Xb - Zc,$$
$$b = (X'H_ZX)^{-1}X'H_Zy,$$
$$c = (Z'Z)^{-1}Z'(y - Xb) = (Z'Z)^{-1}Z'\left[I - X(X'H_ZX)^{-1}X'H_Z\right]y.$$

Hence

$$e = y - Xb - Z(Z'Z)^{-1}Z'(y - Xb) = H_Z(y - Xb) = \left[H_Z - H_ZX(X'H_ZX)^{-1}X'H_Z\right]y,$$

using $H_Z^2 = H_Z$ ($H_Z$ is idempotent).
So we may write

$$e = H^{*}y, \qquad H^{*} = H_Z - H_ZX(X'H_ZX)^{-1}X'H_Z,$$

where $H^{*}$ is symmetric and idempotent. Moreover,

$$H^{*}X = H_ZX - H_ZX(X'H_ZX)^{-1}(X'H_ZX) = H_ZX - H_ZX = 0,$$

so

$$e = H^{*}(X\beta + \varepsilon) = H^{*}\varepsilon$$

and $SS_{res} = e'e = \varepsilon'H^{*}\varepsilon$. Therefore

$$E(SS_{res}) = \sigma^2\operatorname{tr}(H^{*}) = \sigma^2\left[\operatorname{tr}(H_Z) - \operatorname{tr}\left((X'H_ZX)^{-1}X'H_ZH_ZX\right)\right] = \sigma^2\left[(n-r) - k\right] = \sigma^2(n-k-r),$$

so that

$$E\left(\frac{SS_{res}}{n-k-r}\right) = \sigma^2,$$

i.e., $\frac{SS_{res}}{n-k-r}$ is an unbiased estimator of $\sigma^2$.

A comparison of the exclusion and inclusion of variables is as follows:

|                                                         | Exclusion type                | Inclusion type         |
|---------------------------------------------------------|-------------------------------|------------------------|
| Estimation of coefficients                              | Biased                        | Unbiased               |
| Efficiency                                              | Generally declines            | Declines               |
| Estimation of disturbance term                          | Over-estimate                 | Unbiased               |
| Conventional tests of hypothesis and confidence regions | Invalid and faulty inferences | Valid though erroneous |
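The two key quantitative claims for the inclusion case — that $V(b) - V(b_T)$ is positive semidefinite (efficiency loss) and that $\operatorname{tr}(H^{*}) = n - k - r$ (degrees of freedom of $SS_{res}$) — can both be verified numerically. A short NumPy sketch with arbitrary illustrative dimensions and seed:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, r = 40, 3, 2
X = rng.standard_normal((n, k))          # relevant regressors
Z = rng.standard_normal((n, r))          # irrelevant added regressors

HZ = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)

# Efficiency: (X' H_Z X)^{-1} - (X' X)^{-1} should be positive semidefinite
# (covariance factors with sigma^2 dropped).
diff = np.linalg.inv(X.T @ HZ @ X) - np.linalg.inv(X.T @ X)
print(np.all(np.linalg.eigvalsh(diff) >= -1e-8))   # → True

# Degrees of freedom: tr(H*) = n - k - r, so SS_res/(n - k - r) is unbiased.
Hstar = HZ - HZ @ X @ np.linalg.solve(X.T @ HZ @ X, X.T @ HZ)
print(round(float(np.trace(Hstar))))               # → 35
```

The trace check also confirms $H^{*}$ is a projector: $H^{*}H^{*} = H^{*}$, so $SS_{res} = \varepsilon'H^{*}\varepsilon$ behaves like a $\sigma^2\chi^2$ variable with $n-k-r$ degrees of freedom under normal errors.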