arxiv: v1 [stat.me] 3 Nov 2015

Size: px

Start display at page:

Download "arxiv: v1 [stat.me] 3 Nov 2015"

Loren Hampton
6 years ago
Views:

1 The Unbiasedness Approach to Linear Regression Models arxiv:5.0096v [stat.me] 3 Nov 205 P. Vellaisamy Department of Mathematics, Indian Institute of Technology Bombay, Powai, Mumbai , India. Abstract The linear regression models are widely used statistical techniques in numerous practical applications. The standard regression model requires several assumptions about the regressors and the error term. The regression parameters are estimated using the least-squares method. In this paper, we consider the regression model with arbitrary regressors and without the error term. An explicit expression for the regression parameters vector is obtained. The unbiasedness approach is used to estimate the regression parameters and its various properties are investigated. It is shown that the resulting unbiased estimator equals the least-squares estimator for the fixed design model. The analysis of residuals and the regression sum of squares can be carried out in a natural way. The unbiased estimator of the dispersion matrix of the unbiased estimator is also obtained. Applications to AR(p) model and numerical examples are also discussed. Keywords. Linear regression, regression coefficients, unbiased estimator, least-squares estimator, autoregressive model. Introduction The linear regression model is a commonly used statistical technique in practical applications (Quenonuille (950)), because of its simplicity and its realistic nature for modeling several phenomena. For an extensive treatment of this topic, one may refer to Draper and Smith (2003) and Chatterji and Hadi (2003). In the standard multiple linear regression model Y β X +...,β p X p +ǫ, it is generally assumed that the predictor variables X,...,X p are constant (non-stochastic) and also the error term ǫ is independent of predictor variables (see, for example, Yan and Su (2009)), especially for the estimation of regression coefficients. This means that the values of X (X,...,X p ) are controlled by an experimenter and when the experiment is repeated the values of X does not change, but Y changes as ǫ changes. These assumptions are not realistic in practice and in most experimental setups and especially in econometric situations where the changes in Y occurs mainly due to the changes in X. Even

2 2 under the stochastic regressors models, it is commonly assumed that X i s are independent of the ǫ. It is known that when one of the X i is correlated with ǫ i, the ordinary least-squares (OLS) estimator becomes biased and inconsistent as well. To overcome such difficulties, an instrumental variable (IV) Z, which is highly correlated with X i and is independent of ǫ i, is discussedintheliterature. But, findingasuitableivisalsoanissue. Also, OLSestimator has smaller variance with IV estimator, though it may have bias larger that that of IV estimator. In this paper, we consider the linear regression model with little assumptions about the predictors and treat them as random. We do not explicitly introduce the error term and thus avoid not only the associated probabilistic assumptions, but also the related issues mentioned above. Besides, we do not use the least-squares method for estimating the regression coefficients which requires vector differentiation and related minimization problems. In Section 2, we consider the linear regression model with stochastic regressors and obtain first an explicit relationship between the regression coefficients and the characteristics of the distribution of (Y,X,...,X p ). In Section 3, we obtain the unbiased estimators, using the relationship between regression coefficients. Our approach is semi-parametric in nature and does not assume any specific distribution, such as multivariate normal, which is commonly used in literature. Note that the unbiased estimator coincides with OLS estimator, as they are conditionally unbiased. Several known properties of the unbiased estimators are established using our approach. The dispersion matrix of the estimator is derived and also its unbiased estimator is also observed. The mean regression sum of squares and also the mean residual sum of squares are derived in a natural way. Using these results, a predictor of E(Y) based on a new observation is also pointed out. The method is then extended to autoregressive AR(p) model also. Finally, we discuss two examples, one based on simulated data and other one based on a real-life data, to show that that our approach yields the same least-squares estimates for the AR(3) model. 2 Multiple Linear Regression Model Let Y be the response or dependent variable, and X,X 2,...,X p be p arbitrary explanatory (predictor variables) which are not necessarily independent. Consider the class C of linear regression model defined by E(Y X,...,X p ) b 0 +b X +...+b p X p b 0 +X (p) b (p),(say), (2.) where X (p) (X,...,X p ) is the row-vector of predictor variables, and b (p) (b,...,b p ) t is the column-vector of regression coefficients. Note that model (2.) is rather general and

3 3 includes the class C ǫ of the usual linear regression models, defined by Y b 0 +b X +...+b p X p +ǫ, (2.2) where ǫ is the error term independent of the X i s. For example, let Z (X,X 2 ), E(Z) 0 and consider the model Y b X +b 2 X 2 +b 3 X X 2 Z. (2.3) Then E(Y X,X 2 ) b X + b 2 X 2 which belongs to C, but does not belong to C ǫ. Thus, C ǫ C. As another example, consider the autoregressive AR(p) process defined by X t φ 0 +φ X t +φ 2 X t φ p X t p +ǫ, (2.4) where the error term is independent of the X i, (t p) i t. Then E(X t X,...,X t ) φ 0 +φ X t +φ 2 X t φ p X t p which is again of the from given in (2.). Note here the X i are not independent. Thus, AR(p) process belongs to C ǫ and hence also to C. Our aim here is to consider the linear regression model defined in (2.) and the estimation of regression coefficients in a natural and simple way, with hardly a few assumptions on the model. For that purpose, we require the relationship among regression coefficients. 2. Relationships Among Regression Coefficients Let Y 0 (Y b 0 ) so that from (2.) E(Y 0 X (p) ) X (p) b (p) ; E(Y 0 ) E(X (p) )b (p), (2.5) where E(X (p) ) (EX,...,EX p ) is the row vector of the means of X i s. First our aim is to obtain an explicit representation for the regression coefficients. From (2.5), we get E(Y 0 X X (p) ) X (p) X b (p) ; E(Y 0 X ) E(X (p) X )b (p). (2.6) Hence, from (2.5) and (2.6) Cov(Y 0,X ) E(X (p) X )b (p) E(X (p) )E(X )b (p) Cov(X (p),x )b (p), (2.7) where Cov(X (p),x ) ( Cov(X,X ),Cov(X 2,X ),...,Cov(X (p),x ) ) is the row-vector of covariances. Similarly, Cov(Y 0,X i ) Cov(X (p),x i )b (p), 2 i p.

4 4 Since Cov(Y 0,X i ) Cov(Y,X i ), we can represent the above observations in a matrix form as, Cov(Y,X ) C,C 2,...,C p b Cov(Y,X 2 ) C 2,C 22,...,C p2 b 2, (2.8)... Cov(Y,X p ) C p,c 2p,...,C pp where C ij Cov(X i,x j ), for i,j p. Let now Cyx t (C y,c y2,...,c yp ) (Cov(Y,X ),...,Cov(Y,X p ); C xx (C ij ). (2.9) Then equation (2.8) can be written compactly as C yx C xx b (p), (2.0) which gives the relationship between regression coefficients and the associated covariances. Also from (2.5), we obtain b 0 E(Y) E(X (p) )b (p). (2.) Thus, we have proved the following result. Theorem 2. Let E(Y X,...X p ) b 0 + X (p) b (p) be the multiple linear regression model defined as in (2.). Let Cyx t (C y,c y2,...,c yp ), with C yj Cov(Y,X j ), and C xx (C ij ) with C ij Cov(X i,x j ), i,j p. Then the regression coefficients b 0 and b (p) satisfy C yx C xx b (p), and (2.2) b p where µ y E(Y) and µ (p) (EX,,EX p ) (µ,...,µ p ). When C xx is non-singular, we have a unique representation as b 0 µ y µ (p) b (p), (2.3) b (p) C xx C yx. (2.4) If C xx is singular, a g-inverse C xx may be used to obtain a representation for b (p). That is, using Jordan decomposition, let C xx PDP t, where P is an orthogonal matrix, and D diag[λ,...,λ p ] is a diagonal matrix. Then a g-inverse of C xx is C xx PDP, where D diag[λ,...,λ p ], and λ i λ i if λ i > 0, and is zero if λ i 0.

5 5 Remarks 2. Observe that when p and E(Y X ) b 0 +b X, the regression coefficient b is given by a known result. b C y C Cov(Y,X ) Var(X ), Similarly, from (2.4), the regression coefficients for the case p 2 are given by b C 22C y C 2 C y2 C C 22 C2 2 ; b 2 C C y2 C 2 C y C C 22 C2 2 (2.5) and the intercept b 0 can be evaluated using (2.3) and (2.5). Remarks 2.2 The following properties of the model in (2.) follow now easily. (i) It is known that f(x) E(Y X) minimizes E(Y f(x)) 2. This shows that E(Y X) X (p) b (p) minimizes the mean squared error whenx (p) b (p) is treated as an approximation to Y. (ii) Also, when C xx > 0 (positive definite) Cov(Y X (p) b (p),x (p) ) Cov(Y,X (p) ) Cov(X (p) b (p),x (p) ) C t yx bt (p) Cov(X (p),x (p) ) C t yx bt (p) C xx C t yx C t yxc xxc xx 0. (2.6) Indeed, we have from (2.6), Cov(Y X (p) b (p),x j ) 0, i j p Cov(Y X (p) b (p),x (p) d (q) ) 0 (2.7) for any vector d (q) (d,...,d q ). In particular, we have Cov(Y X (p) b (p),x (p) b (p) ) 0. (2.8) Also, Var(Y) Var(Y X (p) b (p) +X (p) b (p) ) Var(Y X (p) b (p) )+Var(X (p) b (p) ), because of (2.8).

6 6 3 Estimation of Regression Coefficients We start with a simpleknown result. Let (Y,X) beabivariate randomvector with with finite second moments and σ yx Cov(Y,X). Suppose (Y,X ),...,(Y n,x n ) is a random sample on the bivariate vector (Y,X). Then it is well known that is an unbiased estimator of σ yx. S yx n (Y i Y)(X i X i ) (3.) i Consider now the estimation of regression coefficients b 0 and b (p). Supposewe have a random sample (Y j,x j,...,x pj ), i n, of size n on (Y,X,X 2...,X p ). Let Y Y n. Y Y j ; X. X (p). X (p)j. X (p)n X X 2...X i...x p. X j X 2j...,X ij...x pj, (3.2). X n X 2n...X in,...x pn where X (p)j (X j,x 2j,...,X pj ) denotes the row-vector of j-th observation on predictor variables and X ij denotes the j-th observation on X i, i p, j n. Then, from (2.), E(Y j X)) E(Y j X (p)j ) b 0 +X (p)j b (p), j n. (3.3) Let Y n j Y j/n, and X i n j X ij/n, i p be the sample means. Also, let S yi n (Y j Y)(X ij X i ) n j (X ij X i )Y j (3.4) j be the sample covariance between Y and X i and S lm n (X lj X l )(X mj X m ) (3.5) j be the sample covariances between X l and X m, l,m p. Then by (3.), we have E(S yi ) C yi ; E(S lm ) C lm, (3.6) where C yi Cov(Y,X i ) and C lm Cov(X l,x m ), l,m n. Let S t yx (S y,s y2,...,s yp ) be the row-vector and S xx (S ij ) be the (p p) matrix of sample covariances of explanatory variables X,X 2,...,X p. Then from (3.6), E(S yx ) C yx ; E(S xx ) C xx, (3.7)

7 7 where C xx (C ij ) defined in (2.8). Using Theorem 2., an estimator ˆb (p) of the regression coefficient vector b (p) can be obtained by solving S xxˆb(p) S yx, (3.8) which is obtained by replacing the unbiased estimator of C yx and C xx in (2.2). When S xx is nonsingular, a.e, an explicit and unique estimator of b (p) is Similarly, an estimator of b 0 can be obtained from (2.3) as, where X (p) (X,X 2,...,X p ). ˆb(p) S xx S yx. (3.9) ˆb0 ˆµ y µ (p) b(p) Y X (p) b(p), (3.0) For completeness, we next summarize the results discussed above. Theorem 3. Let E(Y X (p) ) b 0 +X (p) b (p) and (Y j,x j,...,x pj ), j n, be a random sample of size n on (Y,X,...,X p ). Let S yx (S y,s y2,...,s yp ) t be the column vector of sample covariances between Y and X i s, and S xx (S ij ) be the sample covariance matrix of X i s. Let S xx be non-singular a.e. Then the natural estimators of b (p) and b 0 are ˆb(p) S xx S yx, and (3.) b0 Y X (p) b(p), (3.2) where Y and X (p) (X,...,X p ) denote the sample means. Remark 3. (i) When p and b b, we have b S yx /S xx, and b 0 Y X b, which coincide with the classical least-squares estimators for the simple linear regression model. (ii) From (3.8), b p (p) satisfies in general S xx b(p) S yx. It can be shown that S yx Cyx, as p p n, and S xx Cxx, as a special case. Thus, by Slutsky s theorem, b (p) b(p), a solution of the defining equation (2.2). Hence, b (p) is a consistent estimator for b (p). (iii) Note the natural estimators b (p) and b 0 are obtained by substituting the unbiased estimator of C yx,c xx and µ y and µ (p) in the defining equations (2.2) and (2.3). The next result shows that the natural estimators, given in Theorem 3., are indeed unbiased. Let henceforth X X (n,p) (X(n) t,...,xt p(n) ) (X ij) be the matrix of observations (see (3.2)), where X i(n) (X i,...,x in ) denotes the row-vector of n observations on X i. Theorem 3.2 Let S xx be non-singular a.e. Then the estimators b (p) and b 0 are both conditionally unbiased and hence are unbiased.

8 8 Proof. First note that, for i p, E(S yi X) E ( (X ij X i )Y j X ) (n ) (n ) (n ) j (X ij X i )E(Y j X) j ( (X ij X i ) b 0 + j ) p X rj b r r (from (3.3)) ( p ) (X ij X i ) X rj b r (n ) j r p (X ij X i )X rj b r (interchanging the order) (n ) r p (S ir )b r r S i(p) b (p), where S i(p) (S i,...,s ip ). Hence, j E(S yx X) S xx b (p) E(S xxs yx X) E(ˆb (p) X) b (p), (3.3) showing that ˆb (p) is conditionally unbiased. Similarly, E( b 0 X) E(Y X) X (p) E(ˆb (p) X) (b 0 +b X +...+b p X p X) X (p) b (p) (using (3.3)) b 0 +X (p) b (p) X (p) b (p) and so b 0 is also conditionally unbiased. b 0, (3.4) Hence, it follows, from (3.3) and (3.4), that E(ˆb (p) ) b (p) ; E(ˆb 0 ) b 0, proving the result. Remark 3.2 (i) Let now S xx 0 a.e. (positive semidefinite) and p >. Then, the estimator b(p) satisfies S xx b(p) S yx, for all (Y,X) E(S xx b(p) X) E(S yx X), for all X ( ) S xx E( b (p) X) b (p) 0, for all X,

9 9 using (3.3). Since S xx 0 a.e, we have E( b (p) X) b (p) a.e and so E( b (p) ) b (p), showing that b (p) is unbiased in this case also. It is known that if E( θ i ) θ i,i,2, then E( θ 2 / θ ) (θ 2 /θ ) may not hold in general. However, it holds in the linear regression case, in view of the following general result on a property of unbiased estimators. Let L(X) denote the distribution of X. Lemma 3. Let θ be a characteristic of L(X), θ 2 be a characteristic of L(Y,X) and θ (θ 2 /θ ). Let θ θ (X) and θ 2 θ 2 (Y,X) be respectively estimators of θ and θ 2 such that θ > 0 a.e and E( θ 2 X) θ θ a.e. Then θ ( θ 2 / θ ) is also unbiased for θ. Proof. Since E(ˆθ) E(E(ˆθ X)) E ( E )) ( ) ( ) ( θ2 X E E( θ 2 X) E θ θ θ, (3.5) θ θ θ the result follows. Remark 3.3 (i) Note in Lemma 3., θ is not necessarily unbiased. If in addition it is unbiased, then we have E ) ( θ2 θ θ 2 θ E( θ 2 ) E( θ ), an interesting result. This is indeed the case for the regression coefficient b, as seen next. (ii) Let p and n 2 so that E(Y X) b 0 +b X. Then, as seen earlier, b C yx C xx θ 2 θ (say); ˆb S yx S xx, where E(S yx ) C yx,e(s xx ) C xx. Also, by Theorem 3.2 E(ˆb ) E( S yx S xx ) C yx C xx b. The above result holds because the conditions S xx > 0 a.e. and from (3.3) of Lemma 3.2 are satisfied. E( θ 2 X) E(S yx X) S xx b θ b Let b t (b 0,b (p) ) denote the row-vector of regression coefficients and n denote the n- dimensional column vector with all its entries equal to. The least-squares estimator of b, when X is fixed (called fixed design) is bl (X t X ) X t Y, (3.6) where X ( n.x), Y t (Y,...,Y n ) and X is defined in (3.2). We now have following result.

10 0 Theorem 3.3 Let p and S xx > 0a.e. Then the unbiased estimator b t u ( b 0, b (p) ) t, defined in Theorem 3., coincides with the least squares estimator b l, defined in (3.6), for the fixed design. Proof. Note first that ( ) ( ) X t t X n n t nx n nx(p) X t n X t X X t X nx t (p) Using the formula (see Seber(984), p. 59) for the inverse of a partitioned matrix, namely, ( ) ( ) A B A +FG F t FG, B t D G F t G where F A B and G D B t F, we obtain (X t X ) n + (n ) X (p)sxxx t (p) (n ) S xxx t (p) (n ) X (p)sxx (n ) S xx (3.7) Note for example that the (i,j)- th element of the matrix G (X t X) nx t (p)x (p) is G ij X ir X jr nx i X j (n )S ir. r Hence, G (n )S xx. The other elements of the matrix in the rhs of (3.7) can easily be computed. Therefore, (X t X ) X t Y n + (n ) X (p)sxx Xt (p) (n ) S xx Xt (p) (n ) X (p)sxx (n ) S xx Y + (n ) X (p)sxx(nx t (p)y X t Y) (n ) S xx Xt Y n (n ) S xx Xt (p) Y ( Y X(p) Sxx S ) yx, SxxS yx using the fact (X t Y nx (p) Y) (n )S yx. The result now follows. We next find the covariance matrix D( b u ) Cov( b u, b u ) of b u ( b 0, b (p) ) t. ( ) n Y X t Y Theorem 3.4 Assume, in the model (2.), Var(Y X (p) ) σy x 2, which does not depend on X (p), and Sxx exists. Then the covariance matrix of b u is D( b u ) σy x 2 E(Xt X ) σy x 2 E n + (n ) X (p)sxx Xt (p) (n ) S xx Xt (p) (n ) X (p)sxx (n ) S xx (3.8)

11 Proof. Since E( b u X) b, we have D( b u ) E(D( b u X)) ( ) Var( b0 X) Cov( b 0, b (p) X) E (Cov( b 0, b (p) X)) t D( b (p) X) (3.9) First we obtain Observe now, Cov(S yr,s ys X) Cov D( b (p) X) D(S xxs yx X) (n ) S xx D(S yx X)S xx. (3.20) (X rj X r )Y j, (n ) j (X sk X s )Y k X (n ) 2 (X rj X r )(X sk X s )Cov((Y j,y k ) X). (3.2) j,k k Since Var(Y X (p) ) σ 2 y x, we have Cov((Y j,y k ) X) Hence, for r,s {,2,...,p}, { σ 2 y x, if k j 0, otherwise. and Cov((S yr,s ys ) X) Hence, we obtain from (3.20) σy x 2 (n ) 2 j (X rj X r )(X sj X s ) σ2 y x (n ) S rs D(S yx X) σ2 y x (n ) S xx. (3.22) D( b (p) X) σ2 y x (n ) S xx. (3.23) We obtain next Var( b 0 X). Note Var( b 0 X) Var(Y X (p) b(p) X) Var(Y X)+Var(X (p) b(p) X) 2Cov(Y,X (p) b(p) X). (3.24) By assumption, Var(Y X) σ2 y x n (3.25)

12 2 and using (3.23) Var(X (p) b(p) X) XD( b (p) X)X t (p) σ2 y x (n ) X (p)s xxx t (p). (3.26) We next show that Cov(Y,X (p) b(p) X) 0. Since we can write S yk S yx (n ) (n ) (X kj X k )Y j, j (X (p)j X (p) ) t Y j, where X (p)j (X j,x 2j,...,X pj ), and X (p) (X,X 2,...,X p ), as before. As Y j s are iid, we have Cov(Y,X (p) b(p) X) Cov( n j j Y j, (n ) X (p)sxx (X (p)j X (p) ) t Y j X) j Cov( n Y j, (n ) X (p)sxx(x (p)j X (p) ) t X) j n(n ) X (p)s xx (X (p)j X (p) ) t Cov(Y j,y j X) σy x 2 n(n ) X (p)sxx (X (p)j X (p) ) t j 0. (3.27) Therefore, from (3.24) (3.26), we have ( ) Var( b 0 X) σy x 2 n + (n ) X (p)sxxx t (p). (3.28) Finally, we compute Cov( b 0, b (p) X) ( ) Cov( b 0,e t b (p) X),...,Cov( b 0,e t p b p X), (3.29) where e t j is the ( p) row vector whose j-th entry is and all other entries are zeros. Note, for r p, Cov( b 0,e t r b (p) X) Cov(Y,e t r b (p) X) Cov(X (p) b(p),e t r b (p) X). (3.30) It can be shown, as in (3.27), that Cov(Y,e t r b (p) X) 0. (3.3)

13 3 Also, for r p, Hence, Cov(X (p) b(p),e t j b (p) X) X (p) D( b (p) X)e r σ2 y x (n ) X (p)s xxe r. (3.32) Cov( b 0, b (p) X) σ2 y x (n ) X (p)s xx (e,e 2,...,e p ) σ2 y x (n ) X (p)s xx. (3.33) Substituting (3.28),(3.33) and (3.23) in (3.9) and using (3.7), the result follows. 3. Analysis of Residuals and Regression Sum of Squares Consider the model (3.3) defined by E(Y j X (p)j ) b 0 +X (p)j b (p), Var(Y j X (p)j ) σy x 2, j n. (3.34) Assume n > (p+), and let b t (b 0,b (p) ), as before. Then the above model can be written as E(Y X ) X b; D(Y X ) σ 2 y x I n, (3.35) where Y t (Y,...,Y n ), X ( n.x) and I n is the identity matrix of order n. Then, in view of Theorem 3.3, bu (X t X ) X t Y. Let e (Y Ŷ), where Ŷ X b u, denote the residual vector. Then e [ I n X (X t X ) X t ] Y MY (say), where M M(X) [ I n X (X t X ) X t ] is symmetric and idempotent. It is also well known (see Draper and Smith (2002)) that tr(m) (n p ) and MX 0 a.s. Note also that e t e Y t MY and hence the the mean residual sum of squares is MSS E E(e t e) E { E(Y t MY X ) } E { (E(Y X )) t M(E(Y X ))+tr(md(y X )) } { } E b t X t MX b+tr(m)σy x 2 I n σ 2 y x tr(m) Thus, an unbiased estimator of σ 2 y x is (n p )σ 2 y x. (3.36) σ 2 y x SS E (n p ) e t e (n p ). (3.37)

14 4 Indeed, we have (see (3.36)) shown that so that it is also conditionally unbiased. E( σ 2 y x X ) σ 2 y x (3.38) Theorem 3.5 Let D( b u ), given in (3.8), be the dispersion matrix unbiased estimator b u. The its unbiased estimator is given by D( b u ) σ 2 y x(x t X ), (3.39) where σ 2 y x is defined in (3.37). Proof. Using (3.38), we get E( D( b u )) E(E( σ 2 y x(x t X ) X )) E(E( σ 2 y x X )(X t X ) ) E(σy x 2 (Xt X ) ) σy x 2 E((Xt X ) ) D( b u ) and hence the result follows. 3.2 Mean Regression Sum of Squares Note first that X t e Xt MY (MX ) t Y 0 Ŷt e 0 a.s. In particular, we have t ne 0 Y Ŷa.s. In addition, Cov( b u,e X ) Cov((X t X ) X t Y,Y X ) Cov( b u,x b u X ) (X t X ) X t D(Y X ) D( b u X )X t σy x 2 (Xt X ) X t σy x 2 (Xt X ) X t 0. (3.40) Hence, Cov( b u,e) E ( ) ( Cov( b u,e X +Cov E( bu X ),E(e X ) ) 0, using (3.40) and E(e X ) X b X E( b u X ) 0 (see (3.3) and (3.6)). Note (3.40) also implies Cov(Ŷ,e X ) 0. Also, Y t Y (e+ŷ)t(e+ŷ) Ŷt Ŷ +e t e,

15 5 which implies E(Y t Y nȳ 2 ) E(Ŷt Ŷ nȳ 2 )+E(e t e). (3.4) That is, MSS T MSS R +MSS E, where MSS T and MSS R respectively denote the mean total sum of squares and mean regression sum of squares. The mean regression sum of squares MSS R is given by { } E(Ŷt Ŷ) E E( b t u X t X bu X ) { } E (E( b u X )) t X t X E( b u X )+tr(x t X )D( b u X ) { } E b t X t X b+σy x 2 tr(i p+) { } E b t X t X b+(p+)σy x 2. (3.42) Thus, we obtain E (Ŷt ) Ŷ p+ (p+) E(bt (X t X )b)+σ 2 y x. (3.43) Form (3.4), (3.36) and (3.42), we obtain the mean total sum of squares as leading to E(Y t Y) E(b t (X t X )b)+nσ 2 y x E( Yt Y n ) n E(bt (X t X )b)+σy x 2. Finally, we define the mean coefficient of determination as ( n R 2 m E(R0) 2 E (Ŷi Y) 2 n ), (Y i Y) 2 so that R 2 0 is an unbiased estimator of R2 m. Note R m denote the mean multiple correlation coefficient. Finally, consider the problem of prediction of E(Y 0 ) or that of Y 0, a future observation corresponding to X (p)0, the new observed vector of predictors can easily be considered. That is, Ŷ0 X (p)0 b(p) may be considered as a predictor of E(Y 0 ) or that of Y 0. In that case, we have Var(Ŷ0) X (p)0 D( b (p) )X t (p)0 σ2 y x (n ) X (p)0e(s xx)x t (p)0. Also, an estimated value of Var(Ŷ0) is Var(Ŷ0) σ2 y x (n ) X (p)0s xxx t (p)0, which could be used to obtain prediction intervals for Y 0 or that of E(Y 0 ).

16 6 4 Application to AR(p) Process In this section, we show that the unbiasedness approach works for AR(p) time series models also. Consider AR(p) process so that Y t φ 0 +φ Y t +...+φ p Y t p +ε t, (4.) where ε t s are i.i.d with E(ε t ) 0 and V(ε t ) σ 2. Note AR(p) process satisfies E(Y t Y t,...,y t p ) φ 0 +φ Y t +...+φ p Y t p, (4.2) which is of the form (2.). Let E(Y t ) η, and from model (4.2), Cov(Y t,y t ) γ() Cov(Y t,y t 2 ) γ(2) C yx,.. Cov(Y t,y t p ) γ(p) and C xx (Cov(Y t i,y t j )) (γ (j i) ), for i j p. Then from Theorem 2., we have γ(0)... γ(p ) φ γ() γ()... γ(p 2) φ 2 γ(2)... γ(p )... γ(0) γ(p) φ p (4.3) or equivalently dividing by γ(0) Var(Y t ), ρ()... ρ(p ) ρ()... ρ(p 2). ρ(p ) ρ(p 2)... and φ φ 2 φ 0 η (η,...,η) η(. φ p φ ρ() φ 2 ρ(2).. ρ(p) φ p (4.4) p φ j ). (4.5) j

17 7 Note also that (4.4) is nothing but Yule-Walker equations. Assume we have a sample (Y,Y 2,...,Y n ), n p+, that satisfies (4.). Then, Y p+ φ o +φ Y p +...+φ p Y +ε p+ Y p+2 φ o +φ p+ Y p φ p Y 2 +ε p+2. (4.6) Y n φ o +φ Y n +...+φ p Y n p +ε n. Alternatively, the equations in (4.2) can also be written as, for j n p, E(Y p+j Y p+j,...,yj ) φ o +φ Y p+j +...+φ p Y j. (4.7) Let now Y Y p+ Y p+2. Y n φ φ 2, φ. φ p, Y p Y p... Y Y p+ Y p... Y 2 ; X. Y n Y n 2... Y n p Let us introduce the following notations. Define for 0 i p, Y p i+ n p Y p i+k, (n p) k the mean of values in i-th column of X. Also, let for i,j p, S i,j n p (Y p i+k Y p i+ )(Y p j+k Y p j+ ) (n p ) be the sample covariance between the i-th and j-th columns of X. Similarly, for j p, let S p+,j k n p (Y p+l Y p+ )(Y p j+l Y p j+ ) (n p ) be the sample covariance between Y and j-th column of X. Then S p+, S, S,2... S,p S p+,2 S 2, S 2,2... S 2,p S y,x ; S. xx. S p+,p S p, S p,2... S p,p Using our formulas in Theorem 3. for estimation of φ, we get, l. S xx φ Syx. (4.8)

18 8 That is, ˆφ j, j p, satisfies S, S,2... S,p S 2, S 2,2... S 2,p. S p, S p,2... S p,p ˆφ ˆφ 2. ˆφ p S p+, S p+,2. S p+,p (4.9) Let now Y (p) (Y p,y p,...,y ), so that we obtain from (4.5), ˆφ ˆφ 0 Y p+ Y ˆφ 2 (p) Y. p+ ˆφ p, Then the fitted AR(p) model is p (Y j ˆφp+ j ). (4.0) Y t ˆφ 0 + ˆφ Y t + + ˆφ p Y t p, (4.) where ˆφ j,0 j p, satisfy (4.9) and (4.0). Remark 4. Using our method, we have given a rigorous proof for the unbiased estimators of the parameters of the AR(p) model. This is another advantage of our approach. j 4. Numerical examples Finally, we discuss in this section two numerical examples, one based on simulated data and the other based on real-life data, for estimating the parameters of the AR(3) model. We show our results yield the estimates that coincide with classical estimates. Example 4. We simulated n00 data from AR(3) model with the parameters (0.4, 0., 0.3), using the R package. Based on the simulated data, we fitted the AR(3) model and estimated regression coefficients, including the intercept term, using our results. The results are given below: Parameters Least-squares Yule-Walker Unbiased estimate estimate estimate φ φ φ φ

19 9 Example 4.2 The data in feet on Level of lake Huron data (Brockwell and Davis (2006), p. 555)) is fitted for AR(3) model. The results are given below. Parameters Least-squares Yule-Walker Unbiased estimate estimate estimate φ φ φ φ From both the examples above for the AR(3) model, we see that the unbiased estimates obtained using our results coincide with the least-squares and the Yule-Walker estimates, as expected. Acknowledgments. The author is thankful to Saumya Bansal, Neha Rani Gupta and Swati Verma for some useful discussions and the computational work. References Brockwell, P. J. and Davis, R. A. (2006). Time Series: Theory and Methods. Second Edition, Springer, New York. Chatterji, S. and Hadi, A. S. (2006). Regression Analysis by Examples. John Wiley & Sons, New Jersey. Draper, N and Smith, H. (2003). Applied Regression Analysis. Third Edition, Wiley, New York. Quenonuille, M. H. (950). An application of least squares to family diet surveys. Econometrica, 8, Yan, X. and Su, X. G. (2009). Linear Regression Analysis: Theory and Computing. World Scientific Publishing Co, Pte Ltd, Singapore.

STAT 100C: Linear models

STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix