2. A Review of Some Key Linear Models Results. Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

Size: px

Start display at page:

Download "2. A Review of Some Key Linear Models Results. Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28"

Cecil Caldwell
5 years ago
Views:

1 2. A Review of Some Key Linear Models Results Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

2 A General Linear Model (GLM) Suppose y = Xβ + ɛ, where y R n is the response vector, X is an n p matrix of known constants, β R p is an unknown parameter vector, and ɛ is a vector of random errors satisfying E(ɛ) = 0. Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

3 This GLM says simply that y is a random vector satisfying E(y) = Xβ for some β R p. The distribution of y is left unspecified. We know only that y is random and its mean is in the column space of X; i.e., E(y) C(X). Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

4 Ordinary Least Squares (OLS) Estimation of E(y) Because we know E(y) C(X), a natural estimator of E(y) is the Ordinary Least Squares Estimator (OLSE), which is the unique point in C(X) that is closest to y in terms of Euclidean distance. The OLSE of E(y) is given by ŷ P X y, where P X = X(X X) X, because P X y C(X) and y P X y 2 < y z 2 z C(X) \ {P X y}. opyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

5 The Orthogonal Projection Matrix P X = X(X X) X is known as the orthogonal projection matrix (a.k.a. the perpendicular projection operator) onto the column space of X and has the following properties: P X is symmetric (i.e., P X = P X). P X is idempotent (i.e., P X P X = P X ). P X X = X and X P X = X. rank(x) = rank(p X ) = tr(p X ). P X = X(X X) X is the same matrix for all generalized inverses (X X) of X X. Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

6 The OLSE of a Linear Function of E(y) For any q n matrix A, AE(y) is a linear function of E(y). For any q n matrix A, the OLSE of AE(y) = AXβ is A[OLSE of E(y)] = Aŷ = AP X y = AX(X X) X y. opyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

7 The OLSE of an estimable function of β For any q n matrix A, AE(y) = AXβ is a linear function of β of the form Cβ, where C = AX. From the previous slide, we know OLSE of Cβ = AXβ = AE(y) is AX(X X) X y = C(X X) X y. Now if C is any q p matrix, we say that the linear function of β given by Cβ is estimable if and only if C = AX for some matrix q n matrix A. The OLSE of an estimable linear function Cβ is C(X X) X y. Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

8 The OLSE of Estimable Functions of β An equivalent definition for the OLSE of an estimable linear function of β, given by Cβ = AXβ, can be stated in terms of solutions to the Normal Equations: X Xb = X y. The OLSE of estimable Cβ is Cˆβ, where ˆβ is any solution for b in the Normal Equations. Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

9 Solutions to the Normal Equations The Normal Equations X Xb = X y have (X X) 1 X y as the unique solution for b if rank(x) = p. The Normal Equations have infinitely many solutions for b if rank(x) < p. ˆβ = (X X) X y is always a solution to the Normal Equations for any (X X), a generalized inverse of X X. opyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

10 Uniqueness of the OLSE of an Estimable Cβ If Cβ is estimable, then Cˆβ is the same for all solutions ˆβ to the Normal Equations. In particular, the unique OLSE of Cβ is Cˆβ = C(X X) X y = AX(X X) X y = AP X y, where C = AX. opyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

11 The OLSE is a Linear Unbiased Estimator If Cβ is estimable, then Cˆβ is a linear unbiased estimator of Cβ. The OLSE is a linear estimator because it s a linear function of y: Cˆβ = C(X X) X y = My, where M = C(X X) X. The OLSE is unbiased because, for all β R p, E(Cˆβ) = E(C(X X) X y) = C(X X) X E(y) = AX(X X) X E(y) = AP X E(y) = AP X Xβ = AXβ = Cβ. Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

12 The Guass-Markov Model (GMM) Suppose y = Xβ + ɛ, where y R n is the response vector, X is an n p matrix of known constants, β R p is an unknown parameter vector, and ɛ is a vector of random errors satisfying E(ɛ) = 0 and Var(ɛ) = σ 2 I for some unknown variance parameter σ 2 R +. Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

13 The GMM is a Special Case of the GLM The GMM is a special case of the GLM presented previously. We have added the assumption Var(ɛ) = σ 2 I; i.e., we assume the errors are uncorrelated and have constant variance. All the results presented for the GLM hold for the GMM. opyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

14 The Guass-Markov Theorem For the GMM, we have an additional result provided by the Gauss-Markov Theorem: The OLSE of an estimable function Cβ is the Best Linear Unbiased Estimator (BLUE) of Cβ in the sense that the OLSE Cˆβ has the smallest variance among all linear unbiased estimators of Cβ. opyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

15 Unbiased Estimation of σ 2 An unbiased estimator of σ 2 under the GMM is given by ˆσ 2 y (I P X )y, where r = rank(x). n r Because I P X = (I P X )(I P X ) = (I P X ) (I P X ), y (I P X )y = y (I P X ) (I P X )y = {(I P X )y} {(I P X )y} = (I P X )y 2 = y P X y 2 = y ŷ 2 = Sum of Squared Errors (SSE). Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

16 Gauss-Markov Model with Normal Errors (GMMNE) Suppose y = Xβ + ɛ, where y R n is the response vector, X is an n p matrix of known constants, β R p is an unknown parameter vector, and ɛ N(0, σ 2 I) for some unknown variance parameter σ 2 R +. opyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

17 The GMMNE is a special case of the GMM. We have added the assumption ɛ is multivariate normal. The GMMNE implies y N(Xβ, σ 2 I). The GMMNE is useful for drawing statistical inferences regarding estimable Cβ. opyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

18 Throughout the remainder of these slides we will assume the GMMNE model holds, C is a q p matrix such that Cβ is estimable, rank(c) = q, and d is a known q 1 vector. These assumptions imply H 0 : Cβ = d is a testable hypothesis. Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

19 The Distribution of Cˆβ and ˆσ 2 Cˆβ N(Cβ, σ 2 C(X X) C ). (n r)ˆσ 2 σ 2 χ 2 n r ˆσ2 σ 2 χ2 n r n r ˆσ2 σ2 n r χ2 n r. Cˆβ and ˆσ 2 are independent. Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

20 The F Test of H 0 : Cβ = d The test statistic F (Cˆβ d) [ Var(Cˆβ)] 1 (Cˆβ d)/q = (Cˆβ d) [ˆσ 2 C(X X) C ] 1 (Cˆβ d)/q = (Cˆβ d) [C(X X) C ] 1 (Cˆβ d)/q ˆσ 2 has a non-central F distribution with non-centrality parameter (Cβ d) [C(X X) C ] 1 (Cβ d) 2σ 2 and degrees of freedom q and n r. Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

21 The F Test of H 0 : Cβ = d (continued) The non-negative non-centrality parameter (Cβ d) [C(X X) C ] 1 (Cβ d) 2σ 2 is equal to zero if and only if H 0 : Cβ = d is true. If H 0 : Cβ = d is true, the statistic F has a central F distribution with q and n r degrees of freedom (F q,n r ). Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

22 The F Test of H 0 : Cβ = d (continued) Thus, to test H 0 : Cβ = d, we compute the test statistic F and compare the observed value of F to the F q,n r distribution. If F is so large that it seems unlikely to have been a draw from the F q,n r distribution, we reject H 0 and conclude Cβ d. The p-value of the test is the probability that a random variable with distribution F q,n r matches or exceeds the observed value of the test statistic F. opyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

23 The t Test of H 0 : c β = d for Estimable c β The test statistic t = c ˆβ d Var(c ˆβ) c ˆβ d ˆσ2 c (X X) c has a non-central t distribution with non-centrality parameter c β d σ2 c (X X) c and degrees of freedom n r. opyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

24 The t Test (continued) The non-centrality parameter c β d σ2 c (X X) c is equal to zero if and only if H 0 : c β = d is true. If H 0 : c β = d is true, the statistic t has a central t distribution with n r degrees of freedom (t n r ). Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

25 The t Test (continued) Thus, to test H 0 : c β = d, we compute the test statistic t and compare the observed value of t to the t n r distribution. If t is so far from zero that it seems unlikely to have been a draw from the t n r distribution, we reject H 0 and conclude c β d. The p-value of the test is the probability that a random variable with distribution t n r would be as far or farther from 0 than the observed value of the t test statistic. Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

26 A 100(1 α)% Confidence Interval for Estimable c β ) (c ˆβ tn r,1 α/2 ˆσ 2 c (X X) c, c ˆβ + tn r,1 α/2 ˆσ 2 c (X X) c c ˆβ ± tn r,1 α/2 ˆσ 2 c (X X) c estimate ± (distribution quantile) (standard error of estimate) opyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

27 Form of the t Statistic for Testing H 0 : c β = d t = estimate d standard error of estimate = estimate d Var(estimator) t 2 = (estimate d)2 Var(estimator) = (estimate d)[ Var(estimator)] 1(estimate d)/1 Copyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

28 Revisiting the F Statistic for Testing H 0 : Cβ = d F = (estimate d) [ Var(estimator)] 1(estimate d)/q = (Cˆβ d) [ Var(Cˆβ)] 1 (Cˆβ d)/q = (Cˆβ d) [ˆσ 2 C(X X) C ] 1 (Cˆβ d)/q = (Cˆβ d) [C(X X) C ] 1 (Cˆβ d)/q ˆσ 2 opyright c 2018 Dan Nettleton (Iowa State University) 2. Statistics / 28

Estimating Estimable Functions of β. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 17

Estimating Estimable Functions of β Copyright c 202 Dan Nettleton (Iowa State University) Statistics 5 / 7 The Response Depends on β Only through Xβ In the Gauss-Markov or Normal Theory Gauss-Markov Linear