Estimating Estimable Functions of β

Copyright c 202 Dan Nettleton (Iowa State University) Statistics 5
The Response Depends on β Only through Xβ

In the Gauss-Markov or Normal Theory Gauss-Markov Linear Model, the distribution of y depends on β only through Xβ, i.e., y ~ (Xβ, σ²I) or y ~ N(Xβ, σ²I).

If X is not of full column rank, there are infinitely many vectors in the set {b : Xb = Xβ} for any fixed value of β. Thus, no matter what the value of E(y), there will be infinitely many vectors b such that Xb = E(y) when X is not of full column rank.

The response vector y can help us learn about E(y) = Xβ, but when X is not of full column rank, there is no hope of learning about β alone unless additional information about β is available.
Treatment Effects Model

Researchers randomly assigned a total of six experimental units to two treatments and measured a response of interest:

y_ij = µ + τ_i + ɛ_ij,   i = 1, 2;   j = 1, 2, 3.

In matrix form, y = Xβ + ɛ:

[y11]   [1 1 0]        [ɛ11]
[y12]   [1 1 0] [µ ]   [ɛ12]
[y13] = [1 1 0] [τ1] + [ɛ13]
[y21]   [1 0 1] [τ2]   [ɛ21]
[y22]   [1 0 1]        [ɛ22]
[y23]   [1 0 1]        [ɛ23]

so that E(y) = Xβ = [µ+τ1, µ+τ1, µ+τ1, µ+τ2, µ+τ2, µ+τ2]'.
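To make the design concrete, here is a small sketch (using numpy) that builds this X and confirms it is not of full column rank; the layout follows the model above, with three units per treatment:

```python
import numpy as np

# Design matrix for the two-treatment model with three units per treatment:
# columns are the intercept (mu), the treatment 1 indicator, and the
# treatment 2 indicator.
X = np.column_stack([
    np.ones(6),                # mu
    np.repeat([1.0, 0.0], 3),  # tau_1
    np.repeat([0.0, 1.0], 3),  # tau_2
])

# X has 3 columns but rank 2: the first column is the sum of the last two,
# so X is not of full column rank.
rank_X = np.linalg.matrix_rank(X)
print(rank_X)  # 2
```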
In this case, it makes no sense to estimate β = [µ, τ1, τ2]' because there are multiple (infinitely many, in fact) choices of β that define the same mean for y. For example,

[µ ]   [ 5]  [0]      [ 999]
[τ1] = [-1], [4], or  [-995]
[τ2]   [ 1]  [6]      [-993]

all yield the same Xβ = E(y). When multiple values of β define the same E(y), we say that β is non-estimable.
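A quick numerical check of this non-estimability, using the three β vectors from the example as reconstructed above (each has µ + τ1 = 4 and µ + τ2 = 6):

```python
import numpy as np

X = np.column_stack([np.ones(6),
                     np.repeat([1.0, 0.0], 3),
                     np.repeat([0.0, 1.0], 3)])

# Three different beta vectors that define the same mean for y.
betas = [np.array([5.0, -1.0, 1.0]),
         np.array([0.0, 4.0, 6.0]),
         np.array([999.0, -995.0, -993.0])]

# All three give the same E(y) = X beta, so beta itself is non-estimable.
means = [X @ b for b in betas]
for m in means:
    print(m)  # [4. 4. 4. 6. 6. 6.] each time
```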
Estimable Functions of β

A linear function of β, Cβ, is said to be estimable if there is a linear function of y, Ay, that is an unbiased estimator of Cβ. Otherwise, Cβ is said to be non-estimable.

Note that Ay is an unbiased estimator of Cβ if and only if

E(Ay) = Cβ ∀ β ∈ IR^p  ⟺  AXβ = Cβ ∀ β ∈ IR^p  ⟺  AX = C.

This says that we can estimate Cβ as long as Cβ = AXβ = AE(y) for some A, i.e., as long as Cβ is a linear function of E(y). The bottom line is that we can always estimate E(y) and all linear functions of E(y); all other linear functions of β are non-estimable.
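The condition AX = C can be checked numerically: c'β is estimable exactly when c' lies in the row space of X, which a rank comparison detects. A minimal sketch:

```python
import numpy as np

X = np.column_stack([np.ones(6),
                     np.repeat([1.0, 0.0], 3),
                     np.repeat([0.0, 1.0], 3)])

def is_estimable(c, X):
    """c'beta is estimable iff c' is in the row space of X,
    i.e., appending c' as a row does not increase the rank."""
    return (np.linalg.matrix_rank(np.vstack([X, c]))
            == np.linalg.matrix_rank(X))

print(is_estimable(np.array([0.0, 1.0, -1.0]), X))  # True:  tau_1 - tau_2
print(is_estimable(np.array([1.0, 1.0, 0.0]), X))   # True:  mu + tau_1
print(is_estimable(np.array([1.0, 0.0, 0.0]), X))   # False: mu alone
```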
E(y) = Xβ = [µ+τ1, µ+τ1, µ+τ1, µ+τ2, µ+τ2, µ+τ2]'. Thus,

[1, 0, 0, 0, 0, 0]Xβ = [1, 1, 0]β = µ + τ1,
[0, 0, 0, 1, 0, 0]Xβ = [1, 0, 1]β = µ + τ2, and
[1, 0, 0, -1, 0, 0]Xβ = [0, 1, -1]β = τ1 - τ2

are estimable functions of β.
Estimating Estimable Functions of β

If Cβ is estimable, then there exists a matrix A such that C = AX and Cβ = AXβ = AE(y) for any β ∈ IR^p. It makes sense to estimate Cβ = AXβ = AE(y) by

AÊ(y) = Aŷ = A P_X y = AX(X'X)⁻X'y = AX(X'X)⁻X'X β̂ = A P_X X β̂ = AX β̂ = Cβ̂.

Cβ̂ is called the Ordinary Least Squares (OLS) estimator of Cβ. Note that although the hat is on β, it is Cβ that we are estimating.
Invariance of Cβ̂ to the Choice of β̂

Although there are infinitely many solutions to the normal equations when X is not of full column rank, Cβ̂ is the same for all normal-equation solutions β̂ whenever Cβ is estimable. To see this, suppose β̂1 and β̂2 are any two solutions to the normal equations. Then

Cβ̂1 = AXβ̂1 = A P_X X β̂1 = AX(X'X)⁻X'X β̂1 = AX(X'X)⁻X'y = AX(X'X)⁻X'X β̂2 = A P_X X β̂2 = AXβ̂2 = Cβ̂2.
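This invariance can be seen numerically: take one solution of the normal equations, shift it along the null space of X (which leaves the normal equations satisfied), and check that c'β̂ is unchanged. A sketch with hypothetical response values:

```python
import numpy as np

X = np.column_stack([np.ones(6),
                     np.repeat([1.0, 0.0], 3),
                     np.repeat([0.0, 1.0], 3)])
y = np.array([4.1, 3.8, 4.3, 6.2, 5.9, 6.1])  # hypothetical responses

# One solution to the normal equations, via a generalized inverse of X'X.
b1 = np.linalg.pinv(X.T @ X) @ X.T @ y

# [1, -1, -1]' spans the null space of X, so b2 also solves X'X b = X'y.
b2 = b1 + 10.0 * np.array([1.0, -1.0, -1.0])
assert np.allclose(X.T @ X @ b2, X.T @ y)

c = np.array([0.0, 1.0, -1.0])  # tau_1 - tau_2 is estimable
print(c @ b1, c @ b2)           # identical values (ybar_1 - ybar_2)
```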
Suppose our aim is to estimate τ1 - τ2. As noted before,

Xβ = [µ+τ1, µ+τ1, µ+τ1, µ+τ2, µ+τ2, µ+τ2]'  and  [1, 0, 0, -1, 0, 0]Xβ = [0, 1, -1]β = τ1 - τ2.

Thus, we can compute the OLS estimator of τ1 - τ2 as [1, 0, 0, -1, 0, 0]ŷ = [0, 1, -1]β̂, where ŷ = X(X'X)⁻X'y and β̂ is any solution to the normal equations.
The normal equations X'Xb = X'y in this case are

[6 3 3] [b1]   [y··]
[3 3 0] [b2] = [y1·]
[3 0 3] [b3]   [y2·],

where y·· denotes the sum of all six responses and y_i· denotes the sum of the responses for treatment i.
β̂1 = [ȳ··, ȳ1· - ȳ··, ȳ2· - ȳ··]' and β̂2 = [0, ȳ1·, ȳ2·]' are each solutions to the normal equations because

[6 3 3] [ȳ··      ]   [y··]   [6 3 3] [0  ]
[3 3 0] [ȳ1· - ȳ··] = [y1·] = [3 3 0] [ȳ1·]
[3 0 3] [ȳ2· - ȳ··]   [y2·]   [3 0 3] [ȳ2·].

Thus, the OLS estimator of Cβ = [0, 1, -1]β = τ1 - τ2 is

Cβ̂1 = [0, 1, -1] [ȳ··, ȳ1· - ȳ··, ȳ2· - ȳ··]' = ȳ1· - ȳ2· = [0, 1, -1] [0, ȳ1·, ȳ2·]' = Cβ̂2.
Let

           [ 1/6  0    0  ]                 [0  0    0  ]
(X'X)⁻1 =  [-1/6  1/3  0  ]  and (X'X)⁻2 =  [0  1/3  0  ].
           [-1/6  0    1/3]                 [0  0    1/3]

It is straightforward to verify that (X'X)⁻1 and (X'X)⁻2 are each generalized inverses of X'X. It is also easy to show that β̂1 = (X'X)⁻1 X'y and β̂2 = (X'X)⁻2 X'y.
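These claims can be verified directly: G is a generalized inverse of A when A G A = A, and any such G yields a solution G X'y of the normal equations. A sketch checking both matrices (entries as reconstructed above, with hypothetical response values):

```python
import numpy as np

X = np.column_stack([np.ones(6),
                     np.repeat([1.0, 0.0], 3),
                     np.repeat([0.0, 1.0], 3)])
y = np.array([4.1, 3.8, 4.3, 6.2, 5.9, 6.1])  # hypothetical responses
A = X.T @ X                                   # [[6,3,3],[3,3,0],[3,0,3]]

G1 = np.array([[ 1/6, 0.0, 0.0],
               [-1/6, 1/3, 0.0],
               [-1/6, 0.0, 1/3]])
G2 = np.array([[0.0, 0.0, 0.0],
               [0.0, 1/3, 0.0],
               [0.0, 0.0, 1/3]])

for G in (G1, G2):
    assert np.allclose(A @ G @ A, A)  # generalized-inverse property
    b = G @ X.T @ y                   # a solution to X'X b = X'y
    assert np.allclose(A @ b, X.T @ y)
    print(b)
```

Although the two solutions differ, both give the same value of [0, 1, -1]β̂, as the invariance result guarantees.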
P_X = X(X'X)⁻X' =

[1/3 1/3 1/3  0   0   0 ]
[1/3 1/3 1/3  0   0   0 ]
[1/3 1/3 1/3  0   0   0 ]
[ 0   0   0  1/3 1/3 1/3]
[ 0   0   0  1/3 1/3 1/3]
[ 0   0   0  1/3 1/3 1/3].
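A sketch confirming this block-diagonal form of P_X, along with its symmetry and idempotence (P_X does not depend on which generalized inverse of X'X is used; numpy's pinv supplies one):

```python
import numpy as np

X = np.column_stack([np.ones(6),
                     np.repeat([1.0, 0.0], 3),
                     np.repeat([0.0, 1.0], 3)])

# P_X = X (X'X)^- X'.
P = X @ np.linalg.pinv(X.T @ X) @ X.T

# Block-diagonal: each 3x3 treatment block is filled with 1/3.
expected = np.kron(np.eye(2), np.full((3, 3), 1 / 3))
assert np.allclose(P, expected)
assert np.allclose(P @ P, P)  # idempotent
assert np.allclose(P, P.T)    # symmetric
```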
Thus,

Ê(y) = ŷ = P_X y = [ȳ1·, ȳ1·, ȳ1·, ȳ2·, ȳ2·, ȳ2·]'

is our OLS estimator of

E(y) = Xβ = [µ+τ1, µ+τ1, µ+τ1, µ+τ2, µ+τ2, µ+τ2]'.
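In other words, each fitted value is the sample mean of its own treatment group. A quick check with hypothetical response values:

```python
import numpy as np

X = np.column_stack([np.ones(6),
                     np.repeat([1.0, 0.0], 3),
                     np.repeat([0.0, 1.0], 3)])
y = np.array([4.1, 3.8, 4.3, 6.2, 5.9, 6.1])  # hypothetical responses

P = X @ np.linalg.pinv(X.T @ X) @ X.T
y_hat = P @ y

# Each fitted value equals its treatment-group sample mean.
ybar1, ybar2 = y[:3].mean(), y[3:].mean()
print(y_hat)  # [ybar1, ybar1, ybar1, ybar2, ybar2, ybar2]
```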
Also, we can see that the OLS estimator of

τ1 - τ2 = [0, 1, -1] [µ, τ1, τ2]' = [1, 0, 0, -1, 0, 0] [µ+τ1, µ+τ1, µ+τ1, µ+τ2, µ+τ2, µ+τ2]' = [1, 0, 0, -1, 0, 0]E(y)

is
[1, 0, 0, -1, 0, 0]Ê(y) = [1, 0, 0, -1, 0, 0]ŷ = [1, 0, 0, -1, 0, 0] [ȳ1·, ȳ1·, ȳ1·, ȳ2·, ȳ2·, ȳ2·]' = ȳ1· - ȳ2·.
The Gauss-Markov Theorem

Under the Gauss-Markov Linear Model, the OLS estimator c'β̂ of an estimable linear function c'β is the unique Best Linear Unbiased Estimator (BLUE) in the sense that Var(c'β̂) is strictly less than the variance of any other linear unbiased estimator of c'β, for all β ∈ IR^p and all σ² ∈ IR⁺.

The Gauss-Markov Theorem says that if we want to estimate an estimable linear function c'β using a linear estimator that is unbiased, we should always use the OLS estimator.

In our simple example of the treatment effects model, we could have used y11 - y21 to estimate τ1 - τ2. It is easy to see that y11 - y21 is a linear estimator that is unbiased for τ1 - τ2, but its variance is clearly larger than the variance of the OLS estimator ȳ1· - ȳ2· (as guaranteed by the Gauss-Markov Theorem).
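This variance comparison can be made exact: for a linear estimator a'y under the model, Var(a'y) = σ² a'a. A sketch comparing the two estimators of τ1 - τ2:

```python
import numpy as np

# Coefficient vectors of two linear unbiased estimators of tau_1 - tau_2;
# each has variance sigma^2 * (a'a) since Var(y) = sigma^2 I.
a_naive = np.array([1.0, 0.0, 0.0, -1.0, 0.0, 0.0])  # y_11 - y_21
a_ols = np.array([1, 1, 1, -1, -1, -1]) / 3.0        # ybar_1 - ybar_2

print(a_naive @ a_naive)  # 2.0   -> Var = 2 sigma^2
print(a_ols @ a_ols)      # 2/3   -> Var = (2/3) sigma^2, smaller
```

The OLS estimator's variance is three times smaller here, consistent with the Gauss-Markov Theorem.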