EC3062 ECONOMETRICS

THE MULTIPLE REGRESSION MODEL

Consider T realisations of the regression equation

(1)  y = β_0 + β_1 x_1 + ··· + β_k x_k + ε,

which can be written in the following form:

(2)  [y_1, y_2, ..., y_T]' = [1, x_11, ..., x_1k; 1, x_21, ..., x_2k; ... ; 1, x_T1, ..., x_Tk][β_0, β_1, ..., β_k]' + [ε_1, ε_2, ..., ε_T]'.

This can be represented in summary notation by

(3)  y = Xβ + ε.

The object is to derive an expression for the ordinary least-squares estimates of the elements of the parameter vector β = [β_0, β_1, ..., β_k]'.

The ordinary least-squares (OLS) estimate of β is the value that minimises

(4)  S(β) = ε'ε = (y - Xβ)'(y - Xβ)
           = y'y - y'Xβ - β'X'y + β'X'Xβ
           = y'y - 2y'Xβ + β'X'Xβ.

According to the rules of matrix differentiation, the derivative is

(5)  ∂S/∂β = -2y'X + 2β'X'X.

Setting this to zero gives 0 = β'X'X - y'X, which is transposed to provide the so-called normal equations:

(6)  X'Xβ = X'y.

On the assumption that the inverse matrix exists, the equations have a unique solution, which is the vector of ordinary least-squares estimates:

(7)  β̂ = (X'X)⁻¹X'y.
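As a concrete illustration of equations (6) and (7), the following minimal Python sketch computes β̂ by solving the normal equations; the data and the variable names are illustrative assumptions, not part of the notes.

```python
import numpy as np

# A minimal sketch of equation (7), beta_hat = (X'X)^(-1) X'y.
# The data below are simulated purely for illustration.
rng = np.random.default_rng(0)
T, k = 100, 3
Z = rng.normal(size=(T, k))              # explanatory variables
X = np.column_stack([np.ones(T), Z])     # prepend the intercept column
beta_true = np.array([1.0, 0.5, -2.0, 0.3])
y = X @ beta_true + rng.normal(size=T)

# Solve the normal equations X'X beta = X'y directly; this is
# algebraically identical to (X'X)^(-1) X'y but avoids forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```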

The Decomposition of the Sum of Squares

The equation y = Xβ̂ + e decomposes y into a regression component Xβ̂ and a residual component e = y - Xβ̂. These are mutually orthogonal, since (6) indicates that X'(y - Xβ̂) = 0.

Define the projection matrix P = X(X'X)⁻¹X', which is symmetric and idempotent, such that P = P' = P² or, equivalently, P(I - P) = 0. Then, Xβ̂ = Py and e = y - Xβ̂ = (I - P)y, and, therefore, the regression decomposition is y = Py + (I - P)y. The conditions on P imply that

(8)  y'y = {Py + (I - P)y}'{Py + (I - P)y}
         = y'Py + y'(I - P)y
         = β̂'X'Xβ̂ + e'e.
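The decomposition in (8) is easy to verify numerically. The sketch below, with simulated data and illustrative names, checks the symmetry and idempotency of P and the equality y'y = β̂'X'Xβ̂ + e'e.

```python
import numpy as np

# Numerical check of the properties of P and of the decomposition (8).
rng = np.random.default_rng(1)
T = 50
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=T)

P = X @ np.linalg.inv(X.T @ X) @ X.T          # projection matrix
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat                          # residual component

assert np.allclose(P, P.T)                    # symmetry
assert np.allclose(P, P @ P)                  # idempotency
assert np.allclose(y @ y, beta_hat @ X.T @ X @ beta_hat + e @ e)
```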

This is an instance of Pythagoras' theorem; and the equation indicates that the total sum of squares y'y is equal to the regression sum of squares β̂'X'Xβ̂ plus the residual or error sum of squares e'e.

By projecting y perpendicularly onto the manifold of X, the distance between y and Py = Xβ̂ is minimised.

Proof. Let γ = Pg be an arbitrary vector in the manifold of X. Then

(9)  (y - γ)'(y - γ) = {(y - Xβ̂) + (Xβ̂ - γ)}'{(y - Xβ̂) + (Xβ̂ - γ)}
                     = {(I - P)y + P(y - g)}'{(I - P)y + P(y - g)}.

The properties of P indicate that

(10)  (y - γ)'(y - γ) = y'(I - P)y + (y - g)'P(y - g)
                      = e'e + (Xβ̂ - γ)'(Xβ̂ - γ).

Since the squared distance (Xβ̂ - γ)'(Xβ̂ - γ) is nonnegative, it follows that (y - γ)'(y - γ) ≥ e'e, where e = y - Xβ̂; which proves the assertion.

The Coefficient of Determination

A summary measure of the extent to which the ordinary least-squares regression accounts for the observed vector y is provided by the coefficient of determination. This is defined by

(11)  R² = β̂'X'Xβ̂ / y'y = y'Py / y'y.

The measure is just the square of the cosine of the angle between the vectors y and Py = Xβ̂; and the inequality 0 ≤ R² ≤ 1 follows from the fact that the cosine of any angle must lie between -1 and +1.

If X is a square matrix of full rank, with as many regressors as observations, then X⁻¹ exists and P = X(X'X)⁻¹X' = X{X⁻¹X'⁻¹}X' = I, and so R² = 1. If X'y = 0, then Py = 0 and R² = 0. But, if y is distributed continuously, then this event has a zero probability.
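The interpretation of R² as a squared cosine can be checked directly. The following sketch, with simulated data and illustrative names, computes y'Py/y'y and the cosine of the angle between y and Py.

```python
import numpy as np

# The (uncentred) coefficient of determination of (11) and its
# interpretation as a squared cosine; the data are illustrative only.
rng = np.random.default_rng(2)
T = 60
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([0.5, 1.0, -1.5]) + rng.normal(size=T)

Py = X @ np.linalg.solve(X.T @ X, X.T @ y)    # fitted values X beta_hat
R2 = (y @ Py) / (y @ y)

cos_angle = (y @ Py) / (np.linalg.norm(y) * np.linalg.norm(Py))
assert np.isclose(R2, cos_angle**2)
```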

Figure 1. The vector Py = Xβ̂ is formed by the orthogonal projection of the vector y onto the subspace spanned by the columns of the matrix X.

The Partitioned Regression Model

Consider partitioning the regression equation of (3) to give

(12)  y = [X_1  X_2][β_1; β_2] + ε = X_1β_1 + X_2β_2 + ε,

where [X_1, X_2] = X and [β_1', β_2']' = β. The normal equations of (6) can be partitioned likewise:

(13)  X_1'X_1β_1 + X_1'X_2β_2 = X_1'y,
(14)  X_2'X_1β_1 + X_2'X_2β_2 = X_2'y.

From (13), we get X_1'X_1β_1 = X_1'(y - X_2β_2), which gives

(15)  β̂_1 = (X_1'X_1)⁻¹X_1'(y - X_2β̂_2).

To obtain an expression for β̂_2, we must eliminate β_1 from equation (14). For this, we multiply equation (13) by X_2'X_1(X_1'X_1)⁻¹ to give

(16)  X_2'X_1β_1 + X_2'X_1(X_1'X_1)⁻¹X_1'X_2β_2 = X_2'X_1(X_1'X_1)⁻¹X_1'y.

From (14), which is

      X_2'X_1β_1 + X_2'X_2β_2 = X_2'y,

we take the resulting equation (16), which is

      X_2'X_1β_1 + X_2'X_1(X_1'X_1)⁻¹X_1'X_2β_2 = X_2'X_1(X_1'X_1)⁻¹X_1'y,

to give

(17)  {X_2'X_2 - X_2'X_1(X_1'X_1)⁻¹X_1'X_2}β_2 = X_2'y - X_2'X_1(X_1'X_1)⁻¹X_1'y.

On defining P_1 = X_1(X_1'X_1)⁻¹X_1', equation (17) can be written as

(19)  {X_2'(I - P_1)X_2}β_2 = X_2'(I - P_1)y,

whence

(20)  β̂_2 = {X_2'(I - P_1)X_2}⁻¹X_2'(I - P_1)y.
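Equation (20) can be confirmed against the full regression: the sub-vector of β̂ = (X'X)⁻¹X'y that corresponds to X_2 coincides with {X_2'(I - P_1)X_2}⁻¹X_2'(I - P_1)y. A minimal sketch, with simulated data and illustrative names:

```python
import numpy as np

# Check that the partitioned formula (20) reproduces the X2-block of the
# full OLS estimate; the data are simulated for illustration.
rng = np.random.default_rng(3)
T = 80
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = rng.normal(size=(T, 2))
X = np.column_stack([X1, X2])
y = X @ np.array([1.0, -0.5, 2.0, 0.75]) + rng.normal(size=T)

P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)    # projection onto the columns of X1
M1 = np.eye(T) - P1
beta2_hat = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ y)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(beta2_hat, beta_hat[2:])
```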

The Regression Model with an Intercept

Consider again the equation

(22)  y = ια + Zβ_z + ε,

where ι = [1, 1, ..., 1]' is the summation vector and Z = [x_tj], with t = 1, ..., T and j = 1, ..., k, is the matrix of the explanatory variables. This is a case of the partitioned regression equation of (12). By setting X_1 = ι and X_2 = Z and by taking β_1 = α, β_2 = β_z, equations (15) and (20) give the following estimates of α and β_z:

(23)  α̂ = (ι'ι)⁻¹ι'(y - Zβ̂_z),

and

(24)  β̂_z = {Z'(I - P_ι)Z}⁻¹Z'(I - P_ι)y,   with   P_ι = ι(ι'ι)⁻¹ι' = (1/T)ιι'.

To understand the effect of the operator P_ι, consider

(25)  ι'y = Σ_{t=1}^{T} y_t,   (ι'ι)⁻¹ι'y = (1/T) Σ_{t=1}^{T} y_t = ȳ,   and   P_ιy = ιȳ = ι(ι'ι)⁻¹ι'y = [ȳ, ȳ, ..., ȳ]'.

Here, P_ιy = [ȳ, ȳ, ..., ȳ]' is a column vector containing T repetitions of the sample mean. From the above, it can be understood that, if x = [x_1, x_2, ..., x_T]' is a vector of T elements, then

(26)  x'(I - P_ι)x = Σ_{t=1}^{T} x_t(x_t - x̄) = Σ_{t=1}^{T} (x_t - x̄)x_t = Σ_{t=1}^{T} (x_t - x̄)².

The final equality depends on the fact that Σ_{t=1}^{T} (x_t - x̄)x̄ = x̄ Σ_{t=1}^{T} (x_t - x̄) = 0.
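A short numerical check of (25) and (26), with an arbitrary simulated vector x (the names are illustrative):

```python
import numpy as np

# P_iota maps a vector into T repetitions of its sample mean, and
# x'(I - P_iota)x equals the sum of squared deviations about the mean.
rng = np.random.default_rng(4)
T = 25
x = rng.normal(size=T)

iota = np.ones(T)
P_iota = np.outer(iota, iota) / T             # iota (iota'iota)^(-1) iota'

assert np.allclose(P_iota @ x, np.full(T, x.mean()))
assert np.isclose(x @ (np.eye(T) - P_iota) @ x, np.sum((x - x.mean())**2))
```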

The Regression Model in Deviation Form

Consider the matrix of cross-products in equation (24). This is

(27)  Z'(I - P_ι)Z = {(I - P_ι)Z}'{(I - P_ι)Z} = (Z - Z̄)'(Z - Z̄).

Here, Z̄ contains the sample means of the k explanatory variables repeated T times. The matrix (I - P_ι)Z = (Z - Z̄) contains the deviations of the data points about the sample means. The vector (I - P_ι)y = (y - ιȳ) may be described likewise.

It follows that the estimate β̂_z = {Z'(I - P_ι)Z}⁻¹Z'(I - P_ι)y is obtained by applying least-squares regression to the equation

(28)  [y_1 - ȳ, y_2 - ȳ, ..., y_T - ȳ]' = [x_11 - x̄_1, ..., x_1k - x̄_k; x_21 - x̄_1, ..., x_2k - x̄_k; ... ; x_T1 - x̄_1, ..., x_Tk - x̄_k][β_1, ..., β_k]' + [ε_1 - ε̄, ε_2 - ε̄, ..., ε_T - ε̄]',

which lacks an intercept term.

In summary notation, the equation may be denoted by

(29)  y - ιȳ = [Z - Z̄]β_z + (ε - ε̄).

Observe that it is unnecessary to take the deviations of y. The result is the same whether we regress y or y - ιȳ on [Z - Z̄]. The result is due to the symmetry and idempotency of the operator (I - P_ι), whereby Z'(I - P_ι)y = {(I - P_ι)Z}'{(I - P_ι)y}.

Once the value for β̂_z is available, the estimate for the intercept term can be recovered from equation (23), which can be written as

(30)  α̂ = ȳ - z̄'β̂_z = ȳ - Σ_{j=1}^{k} x̄_j β̂_j,

where z̄ = [x̄_1, ..., x̄_k]' is the vector of the sample means of the regressors.
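The equivalence between the deviation-form regression and the regression with an explicit intercept can be illustrated as follows; the simulated data and the names are assumptions of the sketch.

```python
import numpy as np

# Slopes from the demeaned regression of (29) and the intercept recovered
# via (30) agree with the direct regression on [iota, Z].
rng = np.random.default_rng(5)
T, k = 100, 3
Z = rng.normal(loc=2.0, size=(T, k))
y = 1.5 + Z @ np.array([0.4, -1.0, 2.0]) + rng.normal(size=T)

Zd = Z - Z.mean(axis=0)                       # (I - P_iota)Z, the deviations
beta_z = np.linalg.solve(Zd.T @ Zd, Zd.T @ y)
alpha = y.mean() - Z.mean(axis=0) @ beta_z    # equation (30)

X = np.column_stack([np.ones(T), Z])
full = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(np.concatenate([[alpha], beta_z]), full)
```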

The Assumptions of the Classical Linear Model

Consider the regression equation

(32)  y = Xβ + ε,

where y = [y_1, y_2, ..., y_T]', ε = [ε_1, ε_2, ..., ε_T]', β = [β_0, β_1, ..., β_k]' and X = [x_tj], with x_t0 = 1 for all t.

It is assumed that the disturbances have expected values of zero. Thus

(33)  E(ε) = 0   or, equivalently,   E(ε_t) = 0,  t = 1, ..., T.

Next, it is assumed that they are mutually uncorrelated and that they have a common variance. Thus

(34)  D(ε) = E(εε') = σ²I,   or   E(ε_tε_s) = σ² if t = s, and 0 if t ≠ s.

If t is a temporal index, then these assumptions imply that there is no inter-temporal correlation in the sequence of disturbances.

A conventional assumption, borrowed from the experimental sciences, is that X is a nonstochastic matrix with linearly independent columns. Linear independence is necessary in order to distinguish the separate effects of the k explanatory variables.

In econometrics, it is more appropriate to regard the elements of X as random variables distributed independently of the disturbances:

(37)  E(X'ε | X) = X'E(ε) = 0.

Then,

(38)  β̂ = (X'X)⁻¹X'y

is unbiased, such that E(β̂) = β. To demonstrate this, we may write

(39)  β̂ = (X'X)⁻¹X'y = (X'X)⁻¹X'(Xβ + ε) = β + (X'X)⁻¹X'ε.

Taking expectations gives

(40)  E(β̂) = β + (X'X)⁻¹X'E(ε) = β.

Notice that, in the light of this result, equation (39) now indicates that

(41)  β̂ - E(β̂) = (X'X)⁻¹X'ε.

The variance-covariance matrix of the ordinary least-squares regression estimator is D(β̂) = σ²(X'X)⁻¹. This is demonstrated via the following sequence of identities:

(43)  D(β̂) = E{[β̂ - E(β̂)][β̂ - E(β̂)]'} = E{(X'X)⁻¹X'εε'X(X'X)⁻¹}
            = (X'X)⁻¹X'E(εε')X(X'X)⁻¹
            = (X'X)⁻¹X'{σ²I}X(X'X)⁻¹
            = σ²(X'X)⁻¹.

The second of these equalities follows directly from equation (41).

Matrix Traces

If A = [a_ij] is a square matrix of order n, then Trace(A) = Σ_{i=1}^{n} a_ii. If A = [a_ij] is of order n × m and B = [b_kl] is of order m × n, then

(45)  AB = C = [c_il]   with   c_il = Σ_{j=1}^{m} a_ij b_jl,   and
      BA = D = [d_kj]   with   d_kj = Σ_{l=1}^{n} b_kl a_lj.

Now,

(46)  Trace(AB) = Σ_{i=1}^{n} Σ_{j=1}^{m} a_ij b_ji   and   Trace(BA) = Σ_{j=1}^{m} Σ_{l=1}^{n} b_jl a_lj = Σ_{l=1}^{n} Σ_{j=1}^{m} a_lj b_jl.

Apart from a change of notation, where l replaces i, the expressions on the RHS are the same. It follows that Trace(AB) = Trace(BA). For three factors A, B, C, we have Trace(ABC) = Trace(CAB) = Trace(BCA).
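These trace identities are easily illustrated numerically; the matrices below are arbitrary and serve only as an example.

```python
import numpy as np

# Trace(AB) = Trace(BA), and the cyclic property for three factors.
rng = np.random.default_rng(6)
A = rng.normal(size=(4, 6))
B = rng.normal(size=(6, 4))
C = rng.normal(size=(4, 4))

assert np.isclose(np.trace(A @ B), np.trace(B @ A))
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))
```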

Estimating the Variance of the Disturbance

It is natural to estimate σ² = V(ε_t) via its empirical counterpart. With e_t = y_t - x_t.β̂ in place of ε_t, where x_t. is the t-th row of X, it follows that T⁻¹Σ_t e_t² may be used to estimate σ². However, it transpires that this is biased. An unbiased estimate is provided by

(48)  σ̂² = {1/(T - k)} Σ_{t=1}^{T} e_t² = {1/(T - k)} (y - Xβ̂)'(y - Xβ̂).

The unbiasedness of this estimate may be demonstrated by finding the expected value of (y - Xβ̂)'(y - Xβ̂) = y'(I - P)y. Given that (I - P)y = (I - P)(Xβ + ε) = (I - P)ε, in consequence of the condition (I - P)X = 0, it follows that

(49)  E{(y - Xβ̂)'(y - Xβ̂)} = E(ε'ε) - E(ε'Pε).

The value of the first term on the RHS is given by

(50)  E(ε'ε) = Σ_{t=1}^{T} E(ε_t²) = Tσ².

The value of the second term on the RHS is given by

(51)  E(ε'Pε) = Trace{E(ε'Pε)} = E{Trace(ε'Pε)} = E{Trace(εε'P)}
             = Trace{E(εε')P} = Trace{σ²P} = σ²Trace(P) = σ²k.

The final equality follows from the fact that Trace(P) = Trace(I_k) = k. Putting the results of (50) and (51) into (49) gives

(52)  E{(y - Xβ̂)'(y - Xβ̂)} = σ²(T - k);

and, from this, the unbiasedness of the estimator in (48) follows directly.
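A Monte Carlo sketch of the result in (52): averaging σ̂² over many simulated samples should come close to the true σ². The design matrix, the parameter values and the number of replications are illustrative assumptions.

```python
import numpy as np

# Average of sigma_hat^2 = e'e/(T - k) over repeated samples, where k is
# the number of estimated coefficients; it should be close to sigma^2.
rng = np.random.default_rng(7)
T, sigma2 = 40, 4.0
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
beta = np.array([1.0, 2.0, -1.0])
k = X.shape[1]

estimates = []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=T)
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    estimates.append(e @ e / (T - k))

print(np.mean(estimates))                     # close to sigma2 = 4.0
```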

Statistical Properties of the OLS Estimator

The expectation or mean vector of β̂, and its dispersion matrix as well, may be found from the expression

(53)  β̂ = (X'X)⁻¹X'(Xβ + ε) = β + (X'X)⁻¹X'ε.

The expectation is

(54)  E(β̂) = β + (X'X)⁻¹X'E(ε) = β.

Thus, β̂ is an unbiased estimator. The deviation of β̂ from its expected value is β̂ - E(β̂) = (X'X)⁻¹X'ε. Therefore, the dispersion matrix, which contains the variances and covariances of the elements of β̂, is

(55)  D(β̂) = E[{β̂ - E(β̂)}{β̂ - E(β̂)}']
            = (X'X)⁻¹X'E(εε')X(X'X)⁻¹
            = σ²(X'X)⁻¹.

The Gauss-Markov theorem asserts that β̂ is the unbiased linear estimator of least dispersion. Thus,

(56)  If β̂ is the OLS estimator of β, and if β* is any other linear unbiased estimator of β, then V(q'β*) ≥ V(q'β̂), where q is a constant vector.

Proof. Since β* = Ay is an unbiased estimator, it follows that E(β*) = AE(y) = AXβ = β, which implies that AX = I. Now write A = (X'X)⁻¹X' + G. Then, AX = I implies that GX = 0. It follows that

(57)  D(β*) = AD(y)A' = σ²{(X'X)⁻¹X' + G}{X(X'X)⁻¹ + G'}
            = σ²(X'X)⁻¹ + σ²GG'
            = D(β̂) + σ²GG'.

Therefore, for any constant vector q of order k, there is

(58)  V(q'β*) = q'D(β̂)q + σ²q'GG'q ≥ q'D(β̂)q = V(q'β̂);

and thus the inequality V(q'β*) ≥ V(q'β̂) is established.

Orthogonality and Omitted-Variables Bias

Consider the partitioned regression model of equation (12), which was written as

(59)  y = [X_1, X_2][β_1; β_2] + ε = X_1β_1 + X_2β_2 + ε.

Imagine that the columns of X_1 are orthogonal to the columns of X_2, such that X_1'X_2 = 0. In the partitioned form of the formula β̂ = (X'X)⁻¹X'y, there would be

(60)  X'X = [X_1'; X_2'][X_1  X_2] = [X_1'X_1, X_1'X_2; X_2'X_1, X_2'X_2] = [X_1'X_1, 0; 0, X_2'X_2],

where the final equality follows from the condition of orthogonality. The inverse of the partitioned form of X'X in the case of X_1'X_2 = 0 is

(61)  (X'X)⁻¹ = [X_1'X_1, 0; 0, X_2'X_2]⁻¹ = [(X_1'X_1)⁻¹, 0; 0, (X_2'X_2)⁻¹].

There is also

(62)  X'y = [X_1'; X_2']y = [X_1'y; X_2'y].

On combining these elements, we find that

(63)  [β̂_1; β̂_2] = [(X_1'X_1)⁻¹, 0; 0, (X_2'X_2)⁻¹][X_1'y; X_2'y] = [(X_1'X_1)⁻¹X_1'y; (X_2'X_2)⁻¹X_2'y].

In this case, the coefficients of the regression of y on X = [X_1, X_2] can be obtained from the separate regressions of y on X_1 and of y on X_2.

It should be recognised that this result does not hold true in general. The general formulae for β̂_1 and β̂_2 are those that have been given already under (15) and (20):

(64)  β̂_1 = (X_1'X_1)⁻¹X_1'(y - X_2β̂_2),   β̂_2 = {X_2'(I - P_1)X_2}⁻¹X_2'(I - P_1)y,   P_1 = X_1(X_1'X_1)⁻¹X_1'.

The purpose of including X_2 in the regression equation, when our interest is confined to the parameters of β_1, is to avoid falsely attributing the explanatory power of the variables of X_2 to those of X_1.

If X_2 is erroneously excluded, then the estimate of β_1 will be

(65)  β̃_1 = (X_1'X_1)⁻¹X_1'y
           = (X_1'X_1)⁻¹X_1'(X_1β_1 + X_2β_2 + ε)
           = β_1 + (X_1'X_1)⁻¹X_1'X_2β_2 + (X_1'X_1)⁻¹X_1'ε.

On applying the expectations operator, we find that

(66)  E(β̃_1) = β_1 + (X_1'X_1)⁻¹X_1'X_2β_2,

since E{(X_1'X_1)⁻¹X_1'ε} = (X_1'X_1)⁻¹X_1'E(ε) = 0. Thus, in general, we have E(β̃_1) ≠ β_1, which is to say that β̃_1 is a biased estimator. The estimator will be unbiased only when either X_1'X_2 = 0 or β_2 = 0. In other circumstances, it will suffer from omitted-variables bias.
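The bias term (X_1'X_1)⁻¹X_1'X_2β_2 of equation (66) can be made visible in a simulation with correlated regressors. The following sketch (all values illustrative) compares the short-regression estimate with β_1 plus the bias term.

```python
import numpy as np

# Omitted-variables bias: dropping X2 shifts the estimate of beta_1 by
# (X1'X1)^(-1) X1'X2 beta_2, as in (65)-(66). The data are simulated.
rng = np.random.default_rng(8)
T = 200
x1 = rng.normal(size=T)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=T)       # correlated with x1
X1 = np.column_stack([np.ones(T), x1])
X2 = x2.reshape(-1, 1)
beta1, beta2 = np.array([1.0, 2.0]), np.array([3.0])

y = X1 @ beta1 + X2 @ beta2 + rng.normal(size=T)

beta1_tilde = np.linalg.solve(X1.T @ X1, X1.T @ y)          # X2 omitted
bias_term = np.linalg.solve(X1.T @ X1, X1.T @ X2) @ beta2   # (X1'X1)^(-1) X1'X2 beta_2
print(beta1_tilde, beta1 + bias_term)                       # the two are close
```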

Restricted Least-Squares Regression

A set of j linear restrictions on the vector β can be written as Rβ = r, where R is a j × k matrix of linearly independent rows, such that Rank(R) = j, and r is a vector of j elements. To combine this a priori information with the sample information, the sum of squares (y - Xβ)'(y - Xβ) is minimised subject to Rβ = r. This leads to the Lagrangean function

(67)  L = (y - Xβ)'(y - Xβ) + 2λ'(Rβ - r)
        = y'y - 2y'Xβ + β'X'Xβ + 2λ'Rβ - 2λ'r.

Differentiating L with respect to β and setting the result to zero gives the following first-order condition ∂L/∂β = 0:

(68)  2β'X'X - 2y'X + 2λ'R = 0.

After transposing the expression, eliminating the factor of 2 and rearranging, we have

(69)  X'Xβ + R'λ = X'y.

Combining these equations with the restrictions gives

(70)  [X'X, R'; R, 0][β; λ] = [X'y; r].

For the system to give a unique value of β, the matrix X'X need not be invertible; it is enough that the condition

(71)  Rank[X; R] = k

should hold, which means that the matrix should have full column rank.

Consider applying OLS to the equation

(72)  [y; r] = [X; R]β + [ε; 0],

which puts the equations of the observations and the equations of the restrictions on an equal footing. An estimator exists on the condition that (X'X + R'R)⁻¹ exists, for which the satisfaction of the rank condition is necessary and sufficient. Then, β̂ = (X'X + R'R)⁻¹(X'y + R'r).

Let us assume that (X'X)⁻¹ does exist. Then equation (68) gives an expression for β* in the form of

(73)  β* = (X'X)⁻¹X'y - (X'X)⁻¹R'λ = β̂ - (X'X)⁻¹R'λ,

where β̂ is the unrestricted ordinary least-squares estimator. Since Rβ* = r, premultiplying the equation by R gives

(74)  r = Rβ̂ - R(X'X)⁻¹R'λ,

from which

(75)  λ = {R(X'X)⁻¹R'}⁻¹(Rβ̂ - r).

On substituting this expression back into equation (73), we get

(76)  β* = β̂ - (X'X)⁻¹R'{R(X'X)⁻¹R'}⁻¹(Rβ̂ - r).

This formula is an instance of the prediction-error algorithm, whereby the estimate of β is updated using the information provided by the restrictions.
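A minimal sketch of the restricted estimator (76), under the illustrative restriction β_1 + β_2 = 1 (the data and the restriction are assumptions of the example, not part of the notes); the check confirms that the restriction is satisfied exactly.

```python
import numpy as np

# Restricted least squares via equation (76).
rng = np.random.default_rng(9)
T = 120
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([0.5, 0.7, 0.3]) + rng.normal(size=T)

R = np.array([[0.0, 1.0, 1.0]])               # restriction R beta = r
r = np.array([1.0])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                  # unrestricted OLS estimate
adjust = XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, R @ beta_hat - r)
beta_star = beta_hat - adjust                 # equation (76)

assert np.allclose(R @ beta_star, r)          # the restriction holds exactly
```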

Given that E(β̂ - β) = 0, which is to say that β̂ is an unbiased estimator, then, on the supposition that the restrictions are valid, it follows that E(β* - β) = 0, so that β* is also unbiased.

Next, consider the expression

(77)  β* - β = [I - (X'X)⁻¹R'{R(X'X)⁻¹R'}⁻¹R](β̂ - β)
             = (I - P_R)(β̂ - β),

where

(78)  P_R = (X'X)⁻¹R'{R(X'X)⁻¹R'}⁻¹R.

The expression comes from taking β from both sides of (76) and from recognising that Rβ̂ - r = R(β̂ - β). It can be seen that P_R is an idempotent matrix that is subject to the conditions

(79)  P_R = P_R²,   P_R(I - P_R) = 0   and   P_R'X'X(I - P_R) = 0.

From equation (77), it can be deduced that

(80)  D(β*) = (I - P_R)E{(β̂ - β)(β̂ - β)'}(I - P_R)'
            = σ²(I - P_R)(X'X)⁻¹(I - P_R)'
            = σ²[(X'X)⁻¹ - (X'X)⁻¹R'{R(X'X)⁻¹R'}⁻¹R(X'X)⁻¹].

Regressions on Trigonometrical Functions

An example of orthogonal regressors is a Fourier analysis, where the explanatory variables are sampled from a set of trigonometric functions with angular velocities, called Fourier frequencies, that are evenly distributed in an interval from zero to π radians per sample period.

If the sample is indexed by t = 0, 1, ..., T - 1, then the Fourier frequencies are ω_j = 2πj/T; j = 0, 1, ..., [T/2], where [T/2] denotes the integer quotient of the division of T by 2. The object of a Fourier analysis is to express the elements of the sample as a weighted sum of sine and cosine functions as follows:

(81)  y_t = α_0 + Σ_{j=1}^{[T/2]} {α_j cos(ω_j t) + β_j sin(ω_j t)};   t = 0, 1, ..., T - 1.

The vectors of the generic trigonometric regressors may be denoted by

(83)  c_j = [c_0j, c_1j, ..., c_{T-1,j}]'   and   s_j = [s_0j, s_1j, ..., s_{T-1,j}]',

where c_tj = cos(ω_j t) and s_tj = sin(ω_j t).

The vectors of the ordinates of functions of different frequencies are mutually orthogonal. Therefore, the following orthogonality conditions hold:

(84)  c_i'c_j = s_i's_j = 0  if i ≠ j,   and   c_i's_j = 0  for all i, j.

In addition, there are some sums of squares which can be taken into account in computing the coefficients of the Fourier decomposition:

(85)  c_0'c_0 = ι'ι = T,   s_0's_0 = 0,   c_j'c_j = s_j's_j = T/2   for j = 1, ..., [(T - 1)/2].

When T = 2n, there is ω_n = π, and there is also

(86)  s_n's_n = 0   and   c_n'c_n = T.

The regression formulae for the Fourier coefficients can now be given. First, there is

(87)  α_0 = (ι'ι)⁻¹ι'y = (1/T) Σ_t y_t = ȳ.

Then, for j = 1, ..., [(T - 1)/2], there are

(88)  α_j = (c_j'c_j)⁻¹c_j'y = (2/T) Σ_t y_t cos(ω_j t),

and

(89)  β_j = (s_j's_j)⁻¹s_j'y = (2/T) Σ_t y_t sin(ω_j t).

If T = 2n is even, then there is no coefficient β_n and there is

(90)  α_n = (c_n'c_n)⁻¹c_n'y = (1/T) Σ_t (-1)^t y_t.
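Because the trigonometric regressors are mutually orthogonal, the coefficients can be computed one frequency at a time from (87)-(90), and the fitted values reproduce the sample exactly. A sketch with an arbitrary simulated series and an even sample size (all values illustrative):

```python
import numpy as np

# Fourier coefficients from (87)-(90) for T = 2n, and the exact
# reconstruction of the sample; y is arbitrary illustrative data.
rng = np.random.default_rng(10)
T = 16
n = T // 2
t = np.arange(T)
y = rng.normal(size=T)

alpha0 = y.mean()                               # equation (87)
fit = np.full(T, alpha0)
for j in range(1, n):
    wj = 2 * np.pi * j / T
    aj = (2 / T) * np.sum(y * np.cos(wj * t))   # equation (88)
    bj = (2 / T) * np.sum(y * np.sin(wj * t))   # equation (89)
    fit += aj * np.cos(wj * t) + bj * np.sin(wj * t)
alpha_n = (1 / T) * np.sum((-1) ** t * y)       # equation (90), frequency pi
fit += alpha_n * np.cos(np.pi * t)

assert np.allclose(fit, y)                      # the decomposition is complete
```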

By pursuing the analogy of multiple regression, it can be seen, in view of the orthogonality relationships, that there is a complete decomposition of the sum of squares of the elements of the vector y:

(91)  y'y = α_0²ι'ι + Σ_{j=1}^{[T/2]} {α_j² c_j'c_j + β_j² s_j's_j}.

Now consider writing α_0²ι'ι = ȳ²ι'ι = ȳ'ȳ, where ȳ = [ȳ, ȳ, ..., ȳ]' is a vector whose repeated element is the sample mean ȳ. It follows that y'y - α_0²ι'ι = y'y - ȳ'ȳ = (y - ȳ)'(y - ȳ). Then, in the case where T = 2n is even, the equation can be written as

(92)  (y - ȳ)'(y - ȳ) = (T/2) Σ_{j=1}^{n-1} {α_j² + β_j²} + Tα_n² = (T/2) Σ_{j=1}^{n} ρ_j²,

where ρ_j² = α_j² + β_j² for j = 1, ..., n - 1 and ρ_n² = 2α_n². A similar expression exists when T is odd, with the exceptions that α_n is missing and that the summation runs to (T - 1)/2.

It follows that the variance of the sample can be expressed as

(93)  (1/T) Σ_{t=0}^{T-1} (y_t - ȳ)² = (1/2) Σ_{j=1}^{n} (α_j² + β_j²).

The proportion of the variance that is attributable to the component at frequency ω_j is (α_j² + β_j²)/2 = ρ_j²/2, where ρ_j is the amplitude of the component.

The number of the Fourier frequencies increases at the same rate as the sample size T, and, if there are no regular harmonic components in the underlying process, then we can expect the proportion of the variance attributed to the individual frequencies to decline as the sample size increases. If there is a regular component, then we can expect the variance attributable to it to converge to a finite value as the sample size increases.

In order to provide a graphical representation of the decomposition of the sample variance, we must scale the elements of equation (93) by a factor of T. The graph of the function I(ω_j) = (T/2)(α_j² + β_j²) is known as the periodogram.
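The periodogram can be computed directly from the coefficient formulas (88) and (89). In the sketch below, the input series contains a single harmonic component plus noise (an illustrative assumption), and the ordinate I(ω_j) peaks near the frequency of that component.

```python
import numpy as np

# The periodogram I(w_j) = (T/2)(alpha_j^2 + beta_j^2).
rng = np.random.default_rng(11)
T = 128
t = np.arange(T)
y = np.sin(2 * np.pi * 10 * t / T) + rng.normal(scale=0.5, size=T)

freqs, pgram = [], []
for j in range(1, T // 2):
    wj = 2 * np.pi * j / T
    aj = (2 / T) * np.sum(y * np.cos(wj * t))
    bj = (2 / T) * np.sum(y * np.sin(wj * t))
    freqs.append(wj)
    pgram.append((T / 2) * (aj**2 + bj**2))

print(freqs[np.argmax(pgram)])                # peaks near 2*pi*10/T ≈ 0.49
```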

Figure 2. The plot of 132 monthly observations on the U.S. money supply, beginning in January 1960. A quadratic function has been interpolated through the data.

Figure 3. The periodogram of the residuals of the logarithmic money-supply data.