Material: solution
Class:
Teacher(s): Zacharias Psaradakis, Marian Vavra

Example 1.1: Consider the linear regression model

    y = Xβ + u,    (1)

where y is an (n × 1) vector of observations on the dependent variable, X is an (n × k) matrix of nonstochastic explanatory variables such that rk(X) = k < n, β is a (k × 1) vector of unknown parameters, and u is an (n × 1) vector of unobserved disturbances with E(u) = 0 and E(uu') = σ²I_n.

1. Prove that the OLS estimator β̂ = (X'X)^{-1}X'y is the Best Linear Unbiased Estimator [BLUE] of β. Explain what role each of the assumptions about X and u plays in your proof.

2. Show that c'β̂ is the BLUE of c'β, where c is a (k × 1) vector of constants.

Solution 1.1: Recall that an estimator is the BLUE if it is unbiased and has minimum variance among all linear unbiased estimators (this result is also known as the Gauss-Markov Theorem). The proof of the BLUE property therefore consists of two parts: unbiasedness and minimum variance.

An estimator is said to be unbiased if E(β̂) = β:

    E(β̂) = E[(X'X)^{-1}X'y] = E[(X'X)^{-1}X'(Xβ + u)] = β + E[(X'X)^{-1}X'u] = β + (X'X)^{-1}X'E(u) = β.    (2)

As you can see from the above, the proof of unbiasedness is based on the assumption that E(u) = 0 and that X is a nonstochastic matrix (the latter ensures that X and u are always independent!). In the case of a stochastic matrix X, we would have to impose the extra restriction that the error vector u is strictly independent of X.

An estimator is called a minimum variance estimator if all other linear unbiased estimators have higher
variance of the estimated parameters. The variance of the OLS estimator β̂ is as follows:

    V(β̂) = E[(β̂ − β)(β̂ − β)'] = E[(X'X)^{-1}X'uu'X(X'X)^{-1}] = (X'X)^{-1}X'E(uu')X(X'X)^{-1} = σ²(X'X)^{-1}X'X(X'X)^{-1} = σ²(X'X)^{-1},    (3)

where we apply the fact that E(uu') = σ²I_n and that rk(X) = k < n, which implies that (X'X)^{-1} exists.

To prove the second property of the BLUE, namely that the OLS estimator has minimum variance, we have to define another linear estimator and check its variance. Let the new estimator take the form β̃ = Cy = CXβ + Cu. Recall, just for completeness, that the OLS estimator can be rewritten in the form β̂ = (X'X)^{-1}X'y = Ay, where A = (X'X)^{-1}X'. It is very important to point out that C is some other matrix with a different structure from the matrix A! We only require that CX = I. We need this assumption to ensure that the new estimator β̃ is also unbiased. Moreover, let us define a matrix D such that C = D + A = D + (X'X)^{-1}X'. The variance of β̃ is then

    V(β̃) = E[(β̃ − β)(β̃ − β)'] = E[(D + A)uu'(D + A)'] = (D + A)E(uu')(D + A)' = σ²(D + A)(D + A)' = σ²(DD' + AD' + DA' + AA') = σ²(DD' + (X'X)^{-1}) = V(β̂) + σ²DD',    (4)

where DD' is a positive semidefinite matrix. Hence V(β̃_i) ≥ V(β̂_i) for all i = 1, ..., k, with equality for every i only if D = 0, in which case β̃ = β̂. Therefore, the OLS estimator β̂ is the BLUE.

Note: The derivation is based on the fact that CX = I, which can be rewritten as (D + A)X = DX + AX = I. Since AX = I, this implies DX = 0, and therefore AD' = DA' = 0, so these terms disappear from equation (4).

As for the second question, let c be a (k × 1) vector of constants. Then it holds that

    E(c'β̂) = c'E(β̂) = c'β,    (5)

which means that c'β̂ is also unbiased. Before checking the minimum variance of c'β̂, we have to find the variance of this modified estimator:

    V(c'β̂) = E[c'(β̂ − β)(β̂ − β)'c] = c'E[(β̂ − β)(β̂ − β)']c = c'V(β̂)c.    (6)
The variance of the estimator c'β̃ is as follows:

    V(c'β̃) = c'V(β̃)c = c'[V(β̂) + σ²DD']c = V(c'β̂) + σ²c'DD'c.    (7)

Since DD' is positive semidefinite, c'DD'c ≥ 0 for all c, so V(c'β̃) ≥ V(c'β̂); hence c'β̂ is the BLUE of c'β.

Note: Make sure you know all the properties of positive (semi)definite matrices. Moreover, you should understand what is meant by the variance-covariance matrix V(β̂): what the diagonal and off-diagonal elements are!

Example 1.2: For the regression model

    y_t = β1 x1t + β2 x2t + β3 x3t + u_t,  for t = 1, ..., n,    (8)

a sample of n = 33 observations yields

    Σ x1t² = 2,    Σ x1t x2t = 1,    Σ x1t x3t = 1,    Σ x1t y_t = 5,
    Σ x2t² = 10,   Σ x2t x3t = 0,    Σ x2t y_t = 10,
    Σ x3t² = 1,    Σ x3t y_t = 4,    Σ y_t² = 35,

where all summations are over t from 1 to 33.

1. Compute the OLS estimates β̂ and σ̂² = V(û_t).
2. Obtain an estimate of the variance-covariance matrix of β̂. How would you obtain the estimated standard errors for each element of β̂?
3. Test the hypothesis H0: β2 = 0 versus H1: β2 > 0.
4. Test the hypothesis H0: β1 + β2 = 2 versus H1: β1 + β2 ≠ 2.

Solution 1.2: From the lectures you certainly know that the regression model can be written in matrix form as follows:

    y_t = β1 x1t + β2 x2t + β3 x3t + u_t,    (9)
    y = Xβ + u,    (10)

where y is a (33 × 1) vector of observations on the dependent variable, X is a (33 × 3) matrix of nonstochastic explanatory variables such that rk(X) = 3 < 33, β is a (3 × 1) vector of unknown parameters, and u is a (33 × 1) vector of unobserved disturbances with E(u) = 0 and E(uu') = σ²I_n.
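Before solving by hand, it may help to see that the cross-product sums given above are all that OLS needs. A minimal sketch in numpy assembling them into X'X and X'y (the array names are mine, not part of the exercise):

```python
import numpy as np

# Cross-product sums from the problem, t = 1, ..., 33.
XtX = np.array([[ 2.0,  1.0, 1.0],   # sums x1*x1, x1*x2, x1*x3
                [ 1.0, 10.0, 0.0],   # sums x2*x1, x2*x2, x2*x3
                [ 1.0,  0.0, 1.0]])  # sums x3*x1, x3*x2, x3*x3
Xty = np.array([5.0, 10.0, 4.0])     # sums x1*y, x2*y, x3*y
yty = 35.0                           # sum y*y
n, k = 33, 3

# Sanity checks: X'X must be symmetric and non-singular (rank k),
# otherwise (X'X)^{-1} does not exist.
assert np.allclose(XtX, XtX.T)
print(np.linalg.det(XtX))  # ~9, matching |X'X| = 9 below
```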
1. The OLS estimate of β = (β1, β2, β3)' can be calculated directly from the very well-known formula β̂ = (X'X)^{-1}X'y, where

           [ Σx1t²     Σx1t x2t  Σx1t x3t ]   [ 2   1   1 ]
    X'X =  [ Σx1t x2t  Σx2t²     Σx2t x3t ] = [ 1  10   0 ].    (11)
           [ Σx1t x3t  Σx2t x3t  Σx3t²    ]   [ 1   0   1 ]

We also know that |X'X| = 9. So the inverse of the matrix X'X is as follows:

                      [  10  −1  −10 ]
    (X'X)^{-1} = (1/9)[  −1   1    1 ].    (12)
                      [ −10   1   19 ]

For the OLS estimate we also need the following result:

    X'y = (Σx1t y_t, Σx2t y_t, Σx3t y_t)' = (5, 10, 4)'.    (13)

Therefore, the OLS estimate can be calculated as follows:

                                    [  10  −1  −10 ] [  5 ]   [ 0 ]
    β̂ = (X'X)^{-1}X'y = (1/9)      [  −1   1    1 ] [ 10 ] = [ 1 ].    (14)
                                    [ −10   1   19 ] [  4 ]   [ 4 ]

For the estimation of the variance of u_t we use the unbiased estimator of σ², which is defined as σ̂² = û'û/(n − k), where n denotes the sample size and k the number of estimated parameters in the regression. In our case, the estimated variance is

    σ̂² = (1/30) Σ û_t² = (1/30) Σ (y_t − x2t − 4x3t)²
        = (1/30) Σ (y_t² − 2 y_t x2t − 8 y_t x3t + x2t² + 8 x2t x3t + 16 x3t²)
        = (1/30)(35 − 20 − 32 + 10 + 0 + 16) = 9/30 = 0.3.    (15)

2. Estimated standard errors of the estimated parameters β̂1, β̂2, and β̂3 can be obtained as the square roots of the diagonal elements of the variance-covariance matrix V̂(β̂), which is given by

                                        [  10  −1  −10 ]   [  0.33  −0.03  −0.33 ]
    V̂(β̂) = σ̂²(X'X)^{-1} = (0.3/9)    [  −1   1    1 ] = [ −0.03   0.03   0.03 ].    (16)
                                        [ −10   1   19 ]   [ −0.33   0.03   0.63 ]
So the estimated standard errors of the parameters are

    Ŝ(β̂1) = √V̂(β̂1) = √0.33 ≈ 0.57,
    Ŝ(β̂2) = √V̂(β̂2) = √0.03 ≈ 0.17,    (17)
    Ŝ(β̂3) = √V̂(β̂3) = √0.63 ≈ 0.79.

3. To apply an exact test, the t-test in our case, we have to impose some restriction on the distribution of the error terms. In the context of OLS, we usually assume that u ~ N(0, σ²I_n). If we do not impose any distributional restriction, tests are only asymptotically valid, NOT exact. But even when a distribution is specified, it can be difficult to find an appropriate exact test; you will see this in the upcoming weeks. So let us assume that u ~ N(0, σ²I_n).

In our case, we test the hypothesis H0: β2 = 0 against the alternative hypothesis H1: β2 > 0. Two things still have to be specified. First, the significance level α, denoting the probability of a type I error (we usually use α ∈ {0.1, 0.05, 0.01}). Second, which test we can apply and why: different tests can lead to different results! In our case, the hypothesis concerns a single parameter, which leads us to the standard t-test, defined as

    t = (β̂2 − 0)/Ŝ(β̂2) = 1/0.17 ≈ 5.9.    (18)

(Using unrounded quantities, t = 1/0.1826 ≈ 5.5; the conclusion is unaffected.) The critical value of the t-distribution with 30 degrees of freedom for α = 0.05 (one-sided) is 1.70, which is smaller than the test statistic; therefore we REJECT the null hypothesis! It means that β2 is significantly greater than zero at the given significance level.

Note: Make sure you understand the difference between large-sample and small-sample properties of β̂ for hypothesis-testing purposes. Moreover, you have to understand the logic of the t-test!

4. In the second case, the null is specified as H0: β1 + β2 = 2, against the alternative hypothesis H1: β1 + β2 ≠ 2. So we have a multiple-parameter (linear restriction) hypothesis, which is usually tested by the Wald test, defined as

    W = (c'β̂ − c'β)'[c'(X'X)^{-1}c]^{-1}(c'β̂ − c'β)/σ² ~ χ²(p),    (19)

where c = (1, 1, 0)' and p = rk(c). The problem with this test is that it is based on the unknown quantity σ², while we only have its unbiased estimate σ̂².
Therefore, we have to rewrite the W-test in the form of the F-test, given by

    F = (c'β̂ − c'β)'[c'(X'X)^{-1}c]^{-1}(c'β̂ − c'β)/(p σ̂²) ~ F(p, n − k).    (20)
In our case, c'β̂ = β̂1 + β̂2 = 1 and c'(X'X)^{-1}c = 1, so the F-test statistic takes the form

    F = (1 − 2)[c'(X'X)^{-1}c]^{-1}(1 − 2)/σ̂² = 1/(1 × 0.3) ≈ 3.3.    (21)

The critical value of the F(1, 30) distribution for α = 0.05 is 4.2. The value of the test statistic (3.3) is smaller than the critical value (4.2); therefore we CANNOT REJECT the null hypothesis at the given significance level.

Note: Make sure you understand that a different significance level α can lead to a different result! It is worth noting that the F-test can also be expressed using the sums of squared errors of the unrestricted and restricted models, instead of a vector of restrictions c as in our case.
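The whole of Example 1.2 can be reproduced numerically from the moment matrices alone. A sketch in numpy (variable names are mine; the t-value differs slightly from the hand calculation above because no intermediate rounding is used):

```python
import numpy as np

# Cross-product sums given in Example 1.2.
XtX = np.array([[2.0, 1.0, 1.0], [1.0, 10.0, 0.0], [1.0, 0.0, 1.0]])
Xty = np.array([5.0, 10.0, 4.0])
yty, n, k = 35.0, 33, 3

XtX_inv = np.linalg.inv(XtX)
beta = XtX_inv @ Xty                 # OLS estimate, eq. (14)
rss = yty - beta @ Xty               # u'u = y'y - b'X'y
sigma2 = rss / (n - k)               # unbiased sigma^2 estimate, eq. (15)
V = sigma2 * XtX_inv                 # estimated covariance matrix, eq. (16)
se = np.sqrt(np.diag(V))             # standard errors, eq. (17)

t_stat = beta[1] / se[1]             # H0: beta2 = 0, eq. (18)

c = np.array([1.0, 1.0, 0.0])        # restriction beta1 + beta2 = 2
F_stat = (c @ beta - 2.0) ** 2 / ((c @ XtX_inv @ c) * sigma2)  # eq. (21)

print(np.round(beta, 6))                                  # estimates (0, 1, 4)
print(round(sigma2, 2), round(t_stat, 2), round(F_stat, 2))  # 0.3, ~5.48, ~3.33
```

Note that t ≈ 5.48 here versus 5.9 in the text; the difference comes purely from rounding Ŝ(β̂2) to 0.17 before dividing, and the testing conclusions are the same.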
Example 1.3: Consider the regression model from question 1, where X = [X1 : X2] is partitioned into the first k1 and the remaining k − k1 columns.

1. Show that

    (X'X)^{-1} = [  (X1'M2X1)^{-1}                       −(X1'X1)^{-1}X1'X2(X2'M1X2)^{-1} ]
                 [ −(X2'X2)^{-1}X2'X1(X1'M2X1)^{-1}       (X2'M1X2)^{-1}                  ]

               = [  (X1'M2X1)^{-1}                       −(X1'M2X1)^{-1}X1'X2(X2'X2)^{-1} ]
                 [ −(X2'M1X2)^{-1}X2'X1(X1'X1)^{-1}       (X2'M1X2)^{-1}                  ],    (22)

where Mi = I_n − Xi(Xi'Xi)^{-1}Xi', for i = 1, 2.

2. Show that

    β̂ = [ β̂1 ] = [ (X1'M2X1)^{-1}X1'M2y ]
         [ β̂2 ]   [ (X2'M1X2)^{-1}X2'M1y ].    (23)

3. Show that V(β̂1) = σ²[(X1'X1)^{-1} + (X1'X1)^{-1}X1'X2(X2'M1X2)^{-1}X2'X1(X1'X1)^{-1}].

Solution 1.3: Let us start with the partition of the matrix X'X, which is given by

    X'X = [X1 : X2]'[X1 : X2] = [ X1'X1  X1'X2 ]
                                [ X2'X1  X2'X2 ].    (24)

Check the dimension of the whole matrix X'X and of each block of this matrix.

1. The inverse of X'X is defined by the identity (X'X)(X'X)^{-1} = I, provided that X'X is a non-singular matrix. For the purpose of the solution, this can be rewritten in block form as

    [ X1'X1  X1'X2 ] [ A11  A12 ]   [ I  0 ]
    [ X2'X1  X2'X2 ] [ A21  A22 ] = [ 0  I ],    (25)

which leads to a system of 4 equations in 4 unknown matrices:

    (X1'X1)A11 + (X1'X2)A21 = I;    (26)
    (X1'X1)A12 + (X1'X2)A22 = 0;    (27)
    (X2'X1)A11 + (X2'X2)A21 = 0;    (28)
    (X2'X1)A12 + (X2'X2)A22 = I.    (29)

From equation (28) it follows that

    A21 = −(X2'X2)^{-1}(X2'X1)A11,    (30)
which we plug into equation (26), giving a closed-form solution for A11:

    (X1'X1)A11 + (X1'X2)A21 = I
    (X1'X1)A11 − (X1'X2)(X2'X2)^{-1}(X2'X1)A11 = I
    X1'[I − X2(X2'X2)^{-1}X2']X1 A11 = I
    A11 = (X1'M2X1)^{-1},

where M2 = I − X2(X2'X2)^{-1}X2'. Then we insert the solution for A11 back into equation (30), which gives a closed-form solution for A21:

    A21 = −(X2'X2)^{-1}(X2'X1)(X1'M2X1)^{-1}.    (31)

Using the same procedure on equations (27) and (29), we get closed-form solutions for the remaining matrices in the form

    A22 = (X2'M1X2)^{-1},
    A12 = −(X1'X1)^{-1}(X1'X2)(X2'M1X2)^{-1},

where M1 = I − X1(X1'X1)^{-1}X1'. Since X'X is a symmetric matrix, (X'X)^{-1} must also be symmetric: [(X'X)^{-1}]' = [(X'X)']^{-1} = (X'X)^{-1}. So in the partitioned case it holds that A12 = A21', and therefore

    (X'X)^{-1} = [ A11  A12 ]
                 [ A21  A22 ]

               = [  (X1'M2X1)^{-1}                       −(X1'X1)^{-1}X1'X2(X2'M1X2)^{-1} ]
                 [ −(X2'X2)^{-1}X2'X1(X1'M2X1)^{-1}       (X2'M1X2)^{-1}                  ]

               = [  (X1'M2X1)^{-1}                       −(X1'M2X1)^{-1}X1'X2(X2'X2)^{-1} ]
                 [ −(X2'M1X2)^{-1}X2'X1(X1'X1)^{-1}       (X2'M1X2)^{-1}                  ],    (32)

where Mi = I − Xi(Xi'Xi)^{-1}Xi', for i = 1, 2.

2. We know that β̂ = (X'X)^{-1}X'y, which can be rewritten (using the last line of equation (32)) as

    β̂ = (X'X)^{-1}X'y
       = [  (X1'M2X1)^{-1}                       −(X1'M2X1)^{-1}X1'X2(X2'X2)^{-1} ] [ X1'y ]
         [ −(X2'M1X2)^{-1}X2'X1(X1'X1)^{-1}       (X2'M1X2)^{-1}                  ] [ X2'y ]

       = [ (X1'M2X1)^{-1}X1'y − (X1'M2X1)^{-1}X1'X2(X2'X2)^{-1}X2'y ]
         [ (X2'M1X2)^{-1}X2'y − (X2'M1X2)^{-1}X2'X1(X1'X1)^{-1}X1'y ]

       = [ (X1'M2X1)^{-1}X1'[I − X2(X2'X2)^{-1}X2']y ]
         [ (X2'M1X2)^{-1}X2'[I − X1(X1'X1)^{-1}X1']y ]

       = [ (X1'M2X1)^{-1}X1'M2y ]
         [ (X2'M1X2)^{-1}X2'M1y ].    (33)
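The block formulas (32) and (33) are easy to check numerically on simulated data. A minimal sketch in numpy (the data, dimensions, and seed are invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated design, partitioned as X = [X1 : X2].
n, k1, k2 = 50, 2, 3
X1 = rng.standard_normal((n, k1))
X2 = rng.standard_normal((n, k2))
X = np.hstack([X1, X2])
y = rng.standard_normal(n)

inv = np.linalg.inv
M1 = np.eye(n) - X1 @ inv(X1.T @ X1) @ X1.T   # annihilator of X1
M2 = np.eye(n) - X2 @ inv(X2.T @ X2) @ X2.T   # annihilator of X2

# Block (1,1) of (X'X)^{-1} equals (X1'M2X1)^{-1}, as in eq. (32).
A11 = inv(X.T @ X)[:k1, :k1]
assert np.allclose(A11, inv(X1.T @ M2 @ X1))

# Eq. (33): the subvectors of the full OLS estimate.
beta = inv(X.T @ X) @ X.T @ y
beta1 = inv(X1.T @ M2 @ X1) @ X1.T @ M2 @ y
beta2 = inv(X2.T @ M1 @ X2) @ X2.T @ M1 @ y
assert np.allclose(beta, np.concatenate([beta1, beta2]))
print("partitioned inverse (32) and estimator (33) verified")
```

Equation (33) is the Frisch-Waugh-Lovell result: β̂1 can be obtained by first purging y and X1 of X2 (multiplying by M2) and then running OLS on the residuals.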
3. From previous classes we already know that V(β̂) = σ²(X'X)^{-1}. We also know that V(β̂1) = σ²(X1'M2X1)^{-1}, which is the first diagonal block of the matrix (X'X)^{-1} in our case (see the previous part). The matrix (X1'M2X1)^{-1} can be rewritten as

    (X1'M2X1)^{-1} = [X1'(I − X2(X2'X2)^{-1}X2')X1]^{-1} = [X1'X1 − X1'X2(X2'X2)^{-1}X2'X1]^{-1}.    (34)

In the next step we apply the Inverse Matrix Theorem (a proof can be found in Lütkepohl (1997): Handbook of Matrices) in order to decompose the matrix (X1'M2X1)^{-1}. The theorem states that for all compatible matrices A, B, and C it holds that

    (A − BC^{-1}B')^{-1} = A^{-1} + A^{-1}B(C − B'A^{-1}B)^{-1}B'A^{-1}.    (35)

In our case, A = X1'X1, B = X1'X2, and C = X2'X2. Plugging these matrices into equation (35) gives

    (X1'M2X1)^{-1} = [X1'X1 − X1'X2(X2'X2)^{-1}X2'X1]^{-1}
                   = (X1'X1)^{-1} + (X1'X1)^{-1}X1'X2[X2'X2 − X2'X1(X1'X1)^{-1}X1'X2]^{-1}X2'X1(X1'X1)^{-1}
                   = (X1'X1)^{-1} + (X1'X1)^{-1}X1'X2(X2'M1X2)^{-1}X2'X1(X1'X1)^{-1}.    (36)

Finally, we plug the result from equation (36) back into the formula for V(β̂1), which gives

    V(β̂1) = σ²(X1'M2X1)^{-1} = σ²[(X1'X1)^{-1} + (X1'X1)^{-1}X1'X2(X2'M1X2)^{-1}X2'X1(X1'X1)^{-1}].    (37)

Keywords for revision: You should be absolutely clear about these terms: size and power of a test, consistent test, unbiased test, the most powerful test. Moreover, you are strongly advised to review the matrix differentiation needed for the derivation of the OLS estimator!
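As a final check, the matrix identity (36) behind the variance expression (37) can also be verified numerically. A minimal sketch with simulated blocks (names, dimensions, and seed are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated blocks for checking the decomposition in eq. (36).
n, k1, k2 = 40, 2, 2
X1 = rng.standard_normal((n, k1))
X2 = rng.standard_normal((n, k2))
inv = np.linalg.inv

M1 = np.eye(n) - X1 @ inv(X1.T @ X1) @ X1.T
M2 = np.eye(n) - X2 @ inv(X2.T @ X2) @ X2.T

lhs = inv(X1.T @ M2 @ X1)
A = inv(X1.T @ X1)
rhs = A + A @ X1.T @ X2 @ inv(X2.T @ M1 @ X2) @ X2.T @ X1 @ A
assert np.allclose(lhs, rhs)

# The second term in (36) is positive semidefinite, so including the extra
# regressors X2 can only increase V(beta1_hat) relative to sigma^2 (X1'X1)^{-1},
# with equality when X1'X2 = 0 (orthogonal regressors).
print("decomposition (36) verified")
```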