Non-orthogonal Hilbert Projections in Trend Regression Peter C. B. Phillips Cowles Foundation for Research in Economics Yale University, University of Auckland & University of York and Yixiao Sun Department of Economics Yale University 8February2 Abstract We consider detrending procedures in continuous time when the errors follow a Brownian motion and a diffusion process, respectively. We show that more efficient trend extraction is accomplished by non-orthogonal Hilbert projections in both cases. JEL ClassiÞcation:C22 Key words and phrases: Efficent Detrending, Brownian Motion, Diffusion Process, Hilbert Projection. Phillips thanks the NSF for research support under Grant No. SBR 97-3295. Sun thanks the Cowles Foundation for support under a Cowles Prize Fellowship. The paper was typed by the authors in SW2.5.
. Problem An observed continuous time process X(t) is generated by the linear system X(t) β Z(t)+W (t),t [, ] () where W (t) is an unobservable standard Brownian motion, Z(t) (t,..., t p ) is a time polynomial vector and β is a parameter vector to be estimated. The following two estimators of β are proposed: ˆβ Z (t) Z (t) dt Z (t) X(t)dt, and β Z () (t) Z () (t) dt Z () (t) dx(t), where Z () is the vector of the Þrst derivatives of Z. Part A. Show that both ˆβ Z(t) and β Z(t) are Hilbert projections in L 2 [, ]. How do these projections differ? 2. Find the distributions of ˆβ and β and compare them in the case where p. What do you conclude? Part B Suppose the system generating X(t) is X(t) β Z(t)+J c (t), t [, ] (2) where J c (t) R t e(t s)c dw (s) for some known constant c<, is a linear diffusion, or Orstein-Uhlenbeck process.. Are ˆβ Z(t) and β Z(t) still Hilbert projections? 2. Calculate the distributions of ˆβ and β and compare them in the case where p. What do you conclude? 3. Taking c to be known, can you suggest an unbiased linear estimator of β which has smaller variance than ˆβ and β? Does it correspond to another Hilbert projection? 2
2. Solution Part A: Brownian Motion Case DeÞne the operator P as µz PX(t) β b Z(t) X(s)Z(s) ds Z(s)Z(s) ds Z(t), then P 2 X PX(s)Z(s) ds Z(s)Z(s) ds Z(t) " Z # µz XZ ds ZZ ds ZZ ds ZZ ds Z(t) X(s)Z(s) ds Z(s)Z(s) ds Z(t) PX. In addition, it is easy to show that (PX,Y)(PY,X). Thus P is an orthogonal Hilbert projector onto the manifold spanned by Z(t). DeÞne the operator Q as µz QX β e Z dx(s)z () (s) Z () (s)z () (s) ds Z(t). Then Q is idempotent because Q 2 X d[qx]z () (s) Z () (s)z () (s) ds Z(t) dx(s)z () (s) Z () (s)z () (s) ds dz(s)z () (s) Z () (s)z () (s) ds Z(t) dx(s)z () (s) Z () (s)z () (s) ds Z(t) QX. But Q is not orthogonal since (QX, Y ) dx(s)z () (s) ds Z () (s)z () (s) ds Z(s)Y (s)ds, 3
while (X, QY ) dy (s)z () (s) ds Z () (s)z () (s) ds Z(s)X(s)ds and so (QX, Y ) 6 (X, QY ) in general. Therefore both PX and QX are Hilbert projections of X(t) onto the space spanned by Z(t). The difference lies in that PX is an orthogonal projection while QX is a non-orthogonal projection. We now investigate the statistical properties of these two projections. First consider the distribution of β. b Since bβ β Z(s)Z(s) ds Z(s)W (s)ds, is a linear functional of a Brownian motion, we know that β b β N(,V ), where V Z(s)Z(s) ds Z(t)(s t)z(s) dtds Z(s)Z(s) ds. Next consider the distribution of β. e Since eβ β Z () (s)z () (s) ds Z () (s)dw(s), is also a linear functional of a Brownian motion, we know immediately that β e β N(,V 2 ), where V 2 Z () (s)z () (s) ds. We now compare the distributions of ˆβ and β when p. In this case V 6 5 and V 2 R dr. Therefore β e is a more efficient estimator than β b and the efficiency gain is 2%. Remark Note that ˆβ Z(t) is the best L 2 [, ] functional approximation to β Z(t) and it is delivered by an orthogonal projection of X(t) onto the manifold spanned by Z(t). On the other hand, β Z(t) is the better (more efficient) statistical estimator of β Z(t) and it is delivered by a non-orthogonal projection onto the same space. 4
Part B: The Diffusion Process Case Both β b Z(t) and β e Z(t) are still Hilbert projections. The proofs are essentially the same as before and omitted. We proceed to compute the distributions of ˆβ and β. First bβ β Z(s)Z(s) ds Z(t)J c (t)dt N(,V 3 ), where V 3 Z(s)Z(s) ds Var Z(t)J c (t)dt Z(s)Z(s) ds ½ ¾ ZZ ds Z(r)Λ r,s Z(s) ds dr ZZ ds. and Λ r,s EJ c (r)j c (s) e (r+s)c e r s c / (2c) is the covariance kernel of the O-U process. Next, the error process J c (t) R t e(t s)c dw(s) in (2) satisþes the linear differential equation dj c (t) cj c (t)dt + dw (t), so we can write β e β as µ eβ β Z () (s)z () (s) ds Z () ()J c () where the equality follows from integration by parts and Z (2) (s)j c (s)ds N(,V 4 ), V 4 Z () Z () +Z () ()Λ, Z () () 2 Z (2) (r)λ r,s Z (2) (s) dsdr Z () ()Λ,s Z (2) (s) ds Z () Z (). When p, some algebra gives V 3 3 [2c 3 +3c 2 3+3(c) 2 e 2c ] and 2c 5 V 4 Λ, e2c. So V 2c 3 /V 4 3[2c 3 +3c 2 3+3(c) 2 e 2c ]/[c 4 (e 2c )]. Figure graphstherelativeefficiency of β e and β,i.e.v b 3 /V 4 against c. It reveals that β b is more efficient than β e for small values of c. As c approaches zero, β e becomes more efficient, which is consistent with our results in Part A. As c approaches zero from below, the linear diffusion J c behaves like the underlying Brownian motion and the estimator delivered by the non-orthogonal projection is more efficient. The cutoff value of c is approximately 3.8. Whenc< 3.8,V 3 <V 4 ; When 3.8 c<, V 3 V 4. 5
.3 2.2.2 2.8..6.4-5 -4-3 c -2 Figure -.9 - -8-6 c -4 Figure 2-2.2 AMoreEfficient Estimator We now propose another unbiased estimator β, which is more efficient than both ˆβ and β. DeÞne Z c (t) Z () (t) cz(t) and β Z c (t)z c (t) dt Z c (t)dx (t) c Z c (t)x (t) dt. (3) Recall that dj c (t) cj c (t)dt + dw (t), so we may linearly transform the model (2) to dx(t) cx (t) dt β Z () (t) cz (t) dt + dw (t),t [, ], (4) in which the innovations dw (t) are independent. The estimator β in (3) is the least squares estimator of β in the system (4). It follows from (3) and (4) that β β Z c (t)z c (t) dt Z c (t)dw (t) N(,V 5 ), where V 5 Z c (t)z c (t) dt. ³ R (ct )2 dt 3 When p, V 5. Figure 2 graphs V c 2 3c+3 3/V 5 and V 4 /V 5 against c. It shows that β is indeed more efficient than both β b and β e for all values of c<. DeÞne the operator M as MX dx (t) Z c (t) c Then, since Z (t) is continuously differentiable dz (t) Z c (t) c X (t) Z c (t) dt Z c (t)z c (t) dt Z(t). (5) Z (t) Z c (t) dt 6 Z c (t)z c (t) dt,
we have MZ dz (t) Z c (t) c and by virtue of the deþnition (5) M 2 X MMX dx (t) Z c (t) c dx (t) Z c (t) c MX. Z (t) Z c (t) dt Z c (t)z c (t) dt Z(t) Z (t) X (t) Z c (t) dt Z c (t)z c (t) dt MZ(t) X (t) Z c (t) dt Z c (t)z c (t) dt Z(t) So M is a Hilbert projection. In addition, it is easy to see that (MX,Y) 6 (X, MY ) in general. Therefore, β Z(t) MX is a non-orthogonal Hilbert projection. However it delivers the minimum variance unbiased trend estimator in this case. Remarks (a) The transformed system (4) is the continuous time version of the traditional Cochrane-Orcutt transformation of a discrete time regression model to remove autoregressive errors. (b) The results obtained here clearly apply for any differentiable deterministic sequence Z (t) for which R ZZ dt, R Z() Z (), and R Z cz c dt are positive deþnite. 7