The Efficiency of Taking First Differences in Regression Analysis: A Note

J. A. TILLMAN

IN a recent article, Geary [1972] discussed the merit of taking first differences to deal with the problems that trends in data present in regression analysis. Geary gave examples of situations where this procedure leads to highly inefficient estimates of the regression coefficients. The first difference transformation has also been suggested as an appropriate way of dealing with multicollinearity among the independent variables, for example Kane [1968, p. 28]. This note generalises Geary's results and shows that transforming the data by taking first differences for such a purpose cannot improve the efficiency of the regression estimates but will in general cause a reduction in efficiency. Although the usual least squares formulae are not appropriate for calculating the variances of regression estimates obtained from the transformed data, they are often used for this purpose. Their relation to the variances of the regression estimates obtained from the untransformed data is examined. For completeness this note concludes with a brief discussion of first differencing to deal with serially correlated disturbances in the regression model.

Trend and Multicollinearity

Let the regression model be

    y = Xβ + u                                                    (1)

where y is an n×1 vector, X an n×k matrix of rank k, and β a vector of k parameters to be estimated; u is an unobserved n×1 vector of disturbances such that E(u) = 0, E(uu′) = σ²I. For such a regression model the best linear unbiased estimator is the familiar least squares estimator

    b = (X′X)⁻¹X′y
with covariance

    Cov(b) = σ²(X′X)⁻¹ = σ²Σ_b

However, trends in the data may give rise to implausibly large R² values. Another problem frequently met with is that of a high degree of multicollinearity between the independent variables. This may result in the estimated coefficients having insignificant t ratios. Taking first differences is sometimes employed as a way out of these difficulties. We will show that this cannot lead to more efficient estimates if the underlying regression is given by (1). Premultiplying (1) by the (n−1)×n matrix T transforms the regression to first differences, thus

    Ty = TXβ + Tu                                                 (2)

where

        ⎡ −1   1              ⎤
        ⎢     −1   1          ⎥
    T = ⎢          ⋱   ⋱     ⎥
        ⎣             −1   1  ⎦

If an intercept term is included in the regression, first differencing reduces it to zero. To avoid this, both the dependent and independent variables are expressed as deviations about their means. This eliminates the need for an explicit intercept term in (1). Let y* = Ty, X* = TX and u* = Tu. Hence y*, X* and u* have n−1 rows and E(u*u*′) = σ²TT′. Therefore (2) may be re-written as

    y* = X*β + u*                                                 (3)

where β is now to be estimated from (3). But, to quote Geary, "Invariably, however, when the regression problem is deltaised the assumption is made that the error term 'u*' is regular, which assumption amounts to a wrong specification if the basic model is (1)." The least squares estimator using the deltaised data y*, X* is

    b* = (X*′X*)⁻¹X*′y*

with covariance

    Cov(b*) = σ²(X*′X*)⁻¹X*′TT′X*(X*′X*)⁻¹ = σ²Σ_b*

whereas the best linear unbiased estimator is now

    β* = (X*′(TT′)⁻¹X*)⁻¹X*′(TT′)⁻¹y*
with covariance

    Cov(β*) = σ²(X*′(TT′)⁻¹X*)⁻¹ = σ²Σ_β*

The question to be resolved is that of the relative efficiency among the three estimators b, b* and β*. Successively applying the Generalised Gauss-Markov theorem yields the inequality¹

    Cov(b) ≤ Cov(β*) ≤ Cov(b*)

Thus b* is the least efficient of the three estimates. This loss of efficiency is due both to incorrectly assuming that E(u*u*′) = σ²I and to "losing" one observation in taking first differences. If T were an n×n matrix of full rank, β* would be identical to b, but b* would still, in general, differ from b. If, after taking first differences to remove trend or multicollinearity, the u* are regarded as being regular, then not only will b* be used instead of β*, but the covariance of b* is estimated (incorrectly) by

    Ĉov(b*) = σ²(X*′X*)⁻¹

First differencing would be judged as having successfully dealt with multicollinearity if Ĉov(b*) is smaller than Cov(b). It is therefore of some interest to investigate the extent to which such a reduction is possible. We shall obtain bounds for the ratio of the generalised covariances, |Ĉov(b*)| / |Cov(b)|. The matrix of regression vectors X may be written

    X = PK

where P′P = I and K is non-singular. Hence

    |X′X| = |K′K| = |K|²

and

    |X*′X*| = |X′T′TX| = |K′P′T′TPK| = |P′T′TP| |K|² = |X′X| ∏ᵢ₌₁ᵏ Chᵢ(P′T′TP)

1. This is discussed further in Appendix 1.
where we have made use of the fact that the determinant of a symmetric matrix is equal to the product of its characteristic roots. The i-th largest characteristic root of a symmetric matrix A may be denoted by Chᵢ(A). Although the characteristic roots of P′T′TP depend on X, Cauchy's inequality enables them to be bounded by the characteristic roots of T′T.

Cauchy's Inequality. If A is an n×n symmetric matrix and R an n×k matrix such that R′R = I,

    Ch_{i+n−k}(A) ≤ Chᵢ(R′AR) ≤ Chᵢ(A),   i = 1, …, k

The next step is to evaluate the characteristic roots of T′T. But

          ⎡  1  −1                 ⎤
          ⎢ −1   2  −1             ⎥
    T′T = ⎢     −1   2   ⋱         ⎥
          ⎢          ⋱   ⋱   −1   ⎥
          ⎣              −1    1   ⎦

It is known, for example see Anderson [1971], that

    Chᵢ(T′T) = 2(1 − cos((n−i)π/n)),   i = 1, …, n

Therefore

    4 > Ch₁(T′T) > … > Chₙ(T′T) = 0

The characteristic vector of T′T corresponding to Chₙ(T′T) is a vector of identical elements, orthogonal to the columns of X. Therefore Cauchy's inequality contracts to give

    Ch_{i+n−k−1}(T′T) ≤ Chᵢ(P′T′TP) ≤ Chᵢ(T′T),   i = 1, …, k        (4)

The upper bound is attained if the columns of P correspond to the characteristic vectors of T′T giving the k largest roots. These vectors may be written as P₂. The lower bound is attained if the columns of P correspond to vectors that yield the k smallest roots excluding Chₙ(T′T). These may be written as P₁. After some simplification we obtain bounds for the ratio of the generalised covariances,

    ∏ᵢ₌₁ᵏ [Chᵢ(T′T)]⁻¹ ≤ |Ĉov(b*)| / |Cov(b)| ≤ ∏ᵢ₌₁ᵏ [Ch_{i+n−k−1}(T′T)]⁻¹
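These roots and bounds can be verified numerically. The sketch below (with invented n, k and a random X in deviation form, not taken from the note) checks the closed-form roots of T′T, the contracted Cauchy inequality (4), and the resulting bounds on the covariance ratio.

```python
import numpy as np

def diff_roots(n):
    """Characteristic roots of T'T in descending order:
    Ch_i(T'T) = 2(1 - cos((n - i) * pi / n)), i = 1, ..., n."""
    i = np.arange(1, n + 1)
    return 2.0 * (1.0 - np.cos((n - i) * np.pi / n))

n, k = 10, 2                                   # illustrative values
ch = diff_roots(n)

# check against an explicit T'T (tridiagonal: diagonal 1, 2, ..., 2, 1)
T = np.diff(np.eye(n), axis=0)                 # (n-1) x n first-difference matrix
M = T.T @ T
assert np.allclose(np.sort(np.linalg.eigvalsh(M))[::-1], ch)

# contracted Cauchy bounds (4) for a random X expressed in deviation form
rng = np.random.default_rng(0)
X = rng.standard_normal((n, k))
X -= X.mean(axis=0)                            # columns orthogonal to the constant vector
P, _ = np.linalg.qr(X)                         # X = PK with P'P = I
inner = np.sort(np.linalg.eigvalsh(P.T @ M @ P))[::-1]
for i in range(k):
    assert ch[i + n - k - 1] - 1e-9 <= inner[i] <= ch[i] + 1e-9

# bounds on the ratio |Cov-hat(b*)| / |Cov(b)|
upper = 1.0 / np.prod(ch[n - k - 1:n - 1])     # reciprocal product of k smallest nonzero roots
lower = 1.0 / np.prod(ch[:k])                  # reciprocal product of k largest roots
```

For n = 10, k = 2 this gives upper ≈ 26·7 and lower ≈ ·071, the corresponding entries of Table 1.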
The upper bound is attained if X = P₁K, the lower if X = P₂K. The bounds are tabulated for various values of k and n in Table 1. The lower bound approaches 1/4ᵏ as n increases, but the upper bound has no limit as n increases.

The condition for the ratio of the generalised covariances to attain its maximum value has an interesting economic interpretation. In many economic applications, particularly those using time series data, the regression vectors are slowly changing, that is, the change from one period to the next is small in relation to the total change over n periods. In such a case the matrix of regression vectors X is approximately equal to P₁K. It might be conjectured that for small divergences from X = P₁K the relation of Ĉov(b*) to Cov(b) is given approximately by the upper bound in Table 1. Therefore in many economic applications the incorrect estimate Ĉov(b*) will be larger than the covariance of the original b. Thus even if the original estimator does not suffer from multicollinearity, the estimate b* will appear to do so.

TABLE 1: Bounds for |Ĉov(b*)| / |Cov(b)|

                  n = 10                      n = 40
    k    Upper bound   Lower bound   Upper bound   Lower bound
    1       10·2          ·26          1·6×10²        ·25
    2       26·7          ·071         6·6×10³        ·063
    3       32·4          ·022         1·2×10⁵        ·016
    4       23·5          ·0085        1·2×10⁶        ·0041

k = number of independent variables; k does not include the constant term. For n = 40, the figures for the upper bound have been rounded.

First Order Serial Correlation

For completeness we conclude by briefly describing another use for which the first difference transformation has been proposed. In many applications of the regression model, particularly those using time series data, it is suspected that the disturbance term u follows a first order autoregressive process. That is

    u_t = ρu_{t−1} + ε_t

where the ε_t are independent identically distributed error terms.
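As a numerical aside (not in the original note; ρ and the dimensions below are invented for illustration), the stationary AR(1) process implies E(u_i u_j) = σ_ε² ρ^{|i−j|}/(1−ρ²), which a short simulation confirms:

```python
import numpy as np

rho, n = 0.8, 6                      # illustrative values
idx = np.arange(n)

# stationary AR(1) covariance: E(u_i u_j) = rho^{|i-j|} / (1 - rho^2), sigma_eps^2 = 1
Sigma_u = rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho**2)

# simulate u_t = rho * u_{t-1} + eps_t and compare the sample covariance
rng = np.random.default_rng(0)
reps, burn = 50_000, 50              # burn-in lets the process reach stationarity
eps = rng.standard_normal((reps, burn + n))
u = np.zeros((reps, burn + n))
for t in range(1, burn + n):
    u[:, t] = rho * u[:, t - 1] + eps[:, t]
sample_cov = np.cov(u[:, burn:], rowvar=False)
```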
Premultiplying equation (1) by the (n−1)×n matrix Q corresponds to the generalised first difference transformation and regularises the error term. Thus

    Qy = QXβ + Qu                                                 (6)

where

        ⎡ −ρ   1              ⎤
        ⎢     −ρ   1          ⎥
    Q = ⎢          ⋱   ⋱     ⎥
        ⎣             −ρ   1  ⎦

It may easily be shown that E(Quu′Q′) = σ_ε²I_{n−1}, where σ_ε² is the variance of the ε_t. Unlike the problem of trend or multicollinearity, the purpose of the transformation is now to deal with the irregular error term.

Kadiyala [1968] discussed the efficiency of the least squares estimator obtained using the transformed data Qy and QX relative to the ordinary least squares estimator. For the special case of X being a column of ones, the estimator obtained from (6) was always less efficient than the least squares estimator, and as ρ approached one the efficiency dropped to zero. It is for values of ρ close to one that the first difference transformation has been recommended for dealing with first order serial correlation. A better procedure is to use a transformation Q*, where

    Q* = ⎡ q′ ⎤     and   q = (√(1−ρ²), 0, …, 0)′.
         ⎣ Q  ⎦

If ρ is unknown it can be estimated from the ordinary least squares residuals, for example Johnston [1972].

Conclusion

The first difference transformation is not an appropriate way of dealing with trend or multicollinearity in regression analysis. Transforming the data in this way cannot increase the efficiency of the regression estimates and will in general reduce the efficiency. For first order serial correlation the transformation Q* is superior to the generalised first difference transformation Q.

I would like to thank T. Muench for his helpful comments on an earlier draft of this paper.

University of Massachusetts, Boston.
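The claim that Q* regularises the error term can be checked directly. The sketch below (assuming a known ρ; the values are invented) builds Q and the augmented Q*, and verifies that under the stationary AR(1) disturbance E(Q*uu′Q*′) = σ_ε²I_n, whereas Q alone yields σ_ε²I_{n−1} on one fewer observation.

```python
import numpy as np

rho, n = 0.8, 8                                # illustrative values

# generalised first-difference matrix Q: rows (..., -rho, 1, ...)
Q = np.zeros((n - 1, n))
i = np.arange(n - 1)
Q[i, i], Q[i, i + 1] = -rho, 1.0

# Q* prepends the row q' = (sqrt(1 - rho^2), 0, ..., 0)
q = np.zeros(n)
q[0] = np.sqrt(1.0 - rho**2)
Qstar = np.vstack([q, Q])

# stationary AR(1) covariance of u (taking sigma_eps^2 = 1)
idx = np.arange(n)
Sigma_u = rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho**2)

cov_Q = Q @ Sigma_u @ Q.T                      # should equal I_{n-1}
cov_Qstar = Qstar @ Sigma_u @ Qstar.T          # should equal I_n
```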
REFERENCES

[1] Anderson, T. W. (1971), The Statistical Analysis of Time Series, J. Wiley, New York.
[2] Geary, R. C. (1972), "Two Exercises in Simple Regression", Economic and Social Review, Vol. 3, No. 4, July, 1972.
[3] Johnston, J. (1972), Econometric Methods, 2nd edition, McGraw-Hill, New York.
[4] Kadiyala, K. R. (1968), "A Transformation Used to Circumvent the Problem of Autocorrelation", Econometrica, 36, 93-96.
[5] Kane, E. J. (1968), Economic Statistics and Econometrics, Harper and Row.

APPENDIX 1

The k×k covariance matrices of b, β* and b* are given by σ²Σ_b, σ²Σ_β* and σ²Σ_b* respectively. The inequality between the covariances is to be understood in the following way. If A and B are symmetric positive definite matrices of the same order then A ≥ B if and only if A − B is positive semi-definite. The Generalised Gauss-Markov theorem states that for the regression model (1), b has minimum variance among all unbiased linear estimators of β. Since β* is another unbiased linear estimator of β it follows that Cov(b) ≤ Cov(β*). Similarly for model (3), β* has minimum variance among all unbiased linear estimators of β where the data are of the form y*, X*. But b* is also an unbiased linear estimator of β, hence Cov(β*) ≤ Cov(b*).
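The ordering in Appendix 1 can be illustrated numerically; the sketch below uses invented data (σ² = 1) and checks that Cov(β*) − Cov(b) and Cov(b*) − Cov(β*) are positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 15, 2
X = rng.standard_normal((n, k))
X -= X.mean(axis=0)                            # deviations about means, as in the note

T = np.diff(np.eye(n), axis=0)                 # (n-1) x n first-difference matrix
Xs, TTt = T @ X, T @ T.T                       # X* = TX and TT'

cov_b = np.linalg.inv(X.T @ X)                                 # Cov(b) / sigma^2
A = np.linalg.inv(Xs.T @ Xs)
cov_bstar = A @ Xs.T @ TTt @ Xs @ A                            # Cov(b*) / sigma^2
cov_beta = np.linalg.inv(Xs.T @ np.linalg.solve(TTt, Xs))      # Cov(beta*) / sigma^2

def psd(M, tol=1e-9):
    """True if the symmetric matrix M is positive semi-definite (up to tol)."""
    return bool(np.all(np.linalg.eigvalsh((M + M.T) / 2.0) >= -tol))

# Cov(b) <= Cov(beta*) <= Cov(b*) in the positive semi-definite sense
assert psd(cov_beta - cov_b) and psd(cov_bstar - cov_beta)
```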