3. The F Test for Comparing Reduced vs. Full Models. Copyright c 2018 Dan Nettleton (Iowa State University). Statistics 510, 1 / 43
2 Assume the Gauss-Markov model with normal errors: y = Xβ + ɛ, ɛ ~ N(0, σ²I). Suppose C(X₀) ⊂ C(X) and we wish to test H₀: E(y) ∈ C(X₀) vs. H_A: E(y) ∈ C(X) \ C(X₀).
3 The reduced model corresponds to the null hypothesis and says that E(y) ∈ C(X₀), a specified subspace of C(X). The full model says that E(y) can be anywhere in C(X). For example, suppose

X₀ = [1, 1, 1, 1, 1, 1]′  and  X =
1 0 0
1 0 0
0 1 0
0 1 0
0 0 1
0 0 1
4 In this case, the reduced model says that all 6 observations have the same mean. The full model says that there are three groups of two observations. Within each group, observations have the same mean. The three group means may be different from one another.
5 For this example, let µ₁, µ₂, and µ₃ be the elements of β in the full model, i.e., β = [µ₁, µ₂, µ₃]′. Then, for the full model,

E(y) = Xβ = [µ₁, µ₁, µ₂, µ₂, µ₃, µ₃]′,

and H₀: E(y) ∈ C(X₀) vs. H_A: E(y) ∈ C(X) \ C(X₀) is equivalent to

H₀: µ₁ = µ₂ = µ₃ vs. H_A: µᵢ ≠ µⱼ for some i ≠ j.
6 For the general case, consider the test statistic

F = [y′(P_X − P_X₀)y / (rank(X) − rank(X₀))] / [y′(I − P_X)y / (n − rank(X))].

To show that this statistic has an F distribution, we will use the following fact:

P_X₀ P_X = P_X P_X₀ = P_X₀.
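A quick numeric check of this fact, using the 6-observation example above (an illustrative Python/NumPy sketch; the slides' own code is in R, and the `proj` helper is mine):

```python
import numpy as np

def proj(A):
    # Orthogonal projection onto C(A): A (A'A)^- A', via the Moore-Penrose inverse.
    return A @ np.linalg.pinv(A)

# Reduced model: common mean.  Full model: three groups of two observations.
X0 = np.ones((6, 1))
X = np.kron(np.eye(3), np.ones((2, 1)))

PX0, PX = proj(X0), proj(X)

print(np.allclose(PX0 @ PX, PX0))  # True
print(np.allclose(PX @ PX0, PX0))  # True
```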
7 There are many ways to see that this fact is true. First,

C(X₀) ⊂ C(X) ⟹ each column of X₀ ∈ C(X) ⟹ P_X X₀ = X₀.

This implies that

P_X P_X₀ = P_X X₀(X₀′X₀)⁻X₀′ = X₀(X₀′X₀)⁻X₀′ = P_X₀.

Then, by symmetry of projection matrices,

P_X₀ P_X = P_X₀′ P_X′ = (P_X P_X₀)′ = P_X₀′ = P_X₀.
8 Alternatively, ∀ a ∈ ℝⁿ, P_X₀ a ∈ C(X₀) ⊂ C(X). Thus, ∀ a ∈ ℝⁿ, P_X P_X₀ a = P_X₀ a. This implies P_X P_X₀ = P_X₀. Transposing both sides of this equality and using symmetry of projection matrices yields P_X₀ P_X = P_X₀.
9 Alternatively, C(X₀) ⊂ C(X) ⟹ X₀ = XB for some B, because every column of X₀ must be in C(X). Thus,

P_X₀ P_X = X₀(X₀′X₀)⁻X₀′P_X = X₀(X₀′X₀)⁻(XB)′P_X = X₀(X₀′X₀)⁻B′X′P_X = X₀(X₀′X₀)⁻B′X′ = X₀(X₀′X₀)⁻(XB)′ = X₀(X₀′X₀)⁻X₀′ = P_X₀.

Similarly,

P_X P_X₀ = P_X X₀(X₀′X₀)⁻X₀′ = P_X XB(X₀′X₀)⁻X₀′ = XB(X₀′X₀)⁻X₀′ = X₀(X₀′X₀)⁻X₀′ = P_X₀.
10 Note that P_X − P_X₀ is a symmetric and idempotent matrix:

(P_X − P_X₀)′ = P_X′ − P_X₀′ = P_X − P_X₀,

(P_X − P_X₀)(P_X − P_X₀) = P_X P_X − P_X P_X₀ − P_X₀ P_X + P_X₀ P_X₀ = P_X − P_X₀ − P_X₀ + P_X₀ = P_X − P_X₀.
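Both properties are easy to confirm numerically on the running example (Python/NumPy sketch; `proj` and the design matrices are as assumed earlier):

```python
import numpy as np

def proj(A):
    # Orthogonal projection onto C(A)
    return A @ np.linalg.pinv(A)

X0 = np.ones((6, 1))                      # reduced-model design from the example
X = np.kron(np.eye(3), np.ones((2, 1)))   # full-model design
D = proj(X) - proj(X0)                    # P_X - P_X0

print(np.allclose(D, D.T))    # symmetric: True
print(np.allclose(D @ D, D))  # idempotent: True
```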
11 Now back to determining the distribution of

F = [y′(P_X − P_X₀)y / (rank(X) − rank(X₀))] / [y′(I − P_X)y / (n − rank(X))].

An important first step is to note that

F = [y′((P_X − P_X₀)/σ²)y / (rank(X) − rank(X₀))] / [y′((I − P_X)/σ²)y / (n − rank(X))].

Now we can show that the numerator is a chi-square random variable divided by its degrees of freedom, independent of the denominator, which is a central chi-square random variable divided by its degrees of freedom. Once we show all these things, we will have established that the statistic F has an F distribution (see Slide Set 1, Slide 35).
12 We will use the following result from Slide Set 1. Suppose Σ is an n × n positive definite matrix, and suppose A is an n × n symmetric matrix of rank m such that AΣ is idempotent (i.e., AΣAΣ = AΣ). Then

y ~ N(µ, Σ) ⟹ y′Ay ~ χ²_m(µ′Aµ/2).

For the numerator of our F statistic, we have

µ = Xβ,  Σ = σ²I,  and  A = (P_X − P_X₀)/σ².
13 Then

m = rank(A) = rank((P_X − P_X₀)/σ²) = rank(P_X − P_X₀) = tr(P_X − P_X₀) = tr(P_X) − tr(P_X₀) = rank(P_X) − rank(P_X₀) = rank(X) − rank(X₀).

(Multiplying by a nonzero constant does not affect the rank of a matrix. Rank is the same as trace for idempotent matrices. The trace of a difference is the difference of traces. The rank of a projection matrix is equal to the rank of the matrix whose column space is projected onto.)
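The rank-equals-trace chain can be verified on the example, where rank(X) − rank(X₀) = 3 − 1 = 2 (Python/NumPy sketch under the same assumed setup):

```python
import numpy as np

def proj(A):
    # Orthogonal projection onto C(A)
    return A @ np.linalg.pinv(A)

X0 = np.ones((6, 1))
X = np.kron(np.eye(3), np.ones((2, 1)))
PX0, PX = proj(X0), proj(X)

# rank(P_X - P_X0) = tr(P_X) - tr(P_X0) = rank(X) - rank(X0) = 3 - 1 = 2
print(round(np.trace(PX) - np.trace(PX0)))  # 2
print(np.linalg.matrix_rank(PX - PX0))      # 2
```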
14 To verify that Σ is positive definite, note that for any a ∈ ℝⁿ \ {0},

a′Σa = a′(σ²I)a = σ²a′a = σ² Σᵢ₌₁ⁿ aᵢ² > 0.

To verify that AΣ is idempotent, we have

AΣ = ((P_X − P_X₀)/σ²)(σ²I) = P_X − P_X₀,

which is idempotent by the previous slide. Thus, we have

y′((P_X − P_X₀)/σ²)y ~ χ²_{rank(X)−rank(X₀)}( (1/2) β′X′((P_X − P_X₀)/σ²)Xβ ) = χ²_{rank(X)−rank(X₀)}( β′X′(P_X − P_X₀)Xβ / (2σ²) ).
15 Denominator Distribution. From the result on Slide Set 2, Slide 19, we have

y′((I − P_X)/σ²)y ~ χ²_{n−rank(X)}.

(This fact can be proved using the result on Slide Set 1, Slide 31, following the same type of argument used to show the distribution of the numerator.)
16 Independence of Numerator and Denominator. By Slide Set 1, Slide 38, y′((P_X − P_X₀)/σ²)y is independent of y′((I − P_X)/σ²)y because

((P_X − P_X₀)/σ²)(σ²I)((I − P_X)/σ²) = (1/σ²)(P_X − P_X P_X − P_X₀ + P_X₀ P_X) = (1/σ²)(P_X − P_X − P_X₀ + P_X₀) = 0.
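The matrix product that drives the independence argument is exactly zero, which is easy to check numerically (Python/NumPy sketch; the σ² factors cancel, so they are omitted):

```python
import numpy as np

def proj(A):
    # Orthogonal projection onto C(A)
    return A @ np.linalg.pinv(A)

X0 = np.ones((6, 1))
X = np.kron(np.eye(3), np.ones((2, 1)))
PX0, PX = proj(X0), proj(X)

# (P_X - P_X0)(I - P_X) = 0: the condition that makes the two quadratic forms independent
print(np.allclose((PX - PX0) @ (np.eye(6) - PX), 0))  # True
```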
17 Thus, it follows that

F = [y′(P_X − P_X₀)y / (rank(X) − rank(X₀))] / [y′(I − P_X)y / (n − rank(X))] ~ F_{rank(X)−rank(X₀), n−rank(X)}( β′X′(P_X − P_X₀)Xβ / (2σ²) ).

If H₀ is true, i.e., if E(y) = Xβ ∈ C(X₀), then the noncentrality parameter is 0 because

(P_X − P_X₀)Xβ = P_X Xβ − P_X₀ Xβ = Xβ − Xβ = 0.
18 In general, the noncentrality parameter quantifies how far the mean of y is from C(X₀) because

β′X′(P_X − P_X₀)Xβ = β′X′(P_X − P_X₀)′(P_X − P_X₀)Xβ = ‖(P_X − P_X₀)Xβ‖² = ‖P_X Xβ − P_X₀ Xβ‖² = ‖Xβ − P_X₀ Xβ‖² = ‖E(y) − P_X₀ E(y)‖².
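This squared-distance identity can be confirmed for an arbitrary choice of group means (Python/NumPy sketch; the means β = (1, 2, 4) are chosen only for illustration):

```python
import numpy as np

def proj(A):
    # Orthogonal projection onto C(A)
    return A @ np.linalg.pinv(A)

X0 = np.ones((6, 1))
X = np.kron(np.eye(3), np.ones((2, 1)))
PX0, PX = proj(X0), proj(X)

beta = np.array([1.0, 2.0, 4.0])  # arbitrary (non-null) group means
m = X @ beta                      # E(y)

lhs = m @ (PX - PX0) @ m          # beta' X' (P_X - P_X0) X beta
rhs = np.sum((m - PX0 @ m) ** 2)  # || E(y) - P_X0 E(y) ||^2
print(np.isclose(lhs, rhs))       # True

# Under H0 (equal means), the quadratic form -- and hence the noncentrality -- is zero:
m0 = X @ np.array([3.0, 3.0, 3.0])
print(np.isclose(m0 @ (PX - PX0) @ m0, 0.0))  # True
```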
19 Note that

y′(P_X − P_X₀)y = y′[(I − P_X₀) − (I − P_X)]y = y′(I − P_X₀)y − y′(I − P_X)y = SSE_REDUCED − SSE_FULL.

Also,

rank(X) − rank(X₀) = [n − rank(X₀)] − [n − rank(X)] = DFE_REDUCED − DFE_FULL,

where DFE = Degrees of Freedom for Error.
20 Thus, the F statistic has the familiar form

F = [(SSE_REDUCED − SSE_FULL) / (DFE_REDUCED − DFE_FULL)] / [SSE_FULL / DFE_FULL].
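The familiar SSE form can be checked against the quadratic-form definition on the fertilizer data that appears later in this slide set (Python/NumPy sketch of both computations; the slides themselves use R):

```python
import numpy as np

# Yield data from the fertilizer example below: 3 plots per dose 1, 2, 3.
x = np.repeat([1.0, 2.0, 3.0], 3)
y = np.array([11, 13, 9, 18, 22, 23, 19, 24, 22], dtype=float)

def proj(A):
    # Orthogonal projection onto C(A)
    return A @ np.linalg.pinv(A)

X0 = np.column_stack([np.ones(9), x])                        # reduced: line in the dose
X = (x[:, None] == np.array([1.0, 2.0, 3.0])).astype(float)  # full: cell means
PX0, PX = proj(X0), proj(X)
I = np.eye(9)

sse_red, sse_full = y @ (I - PX0) @ y, y @ (I - PX) @ y
dfe_red, dfe_full = 9 - 2, 9 - 3

F_familiar = ((sse_red - sse_full) / (dfe_red - dfe_full)) / (sse_full / dfe_full)
F_quadratic = (y @ (PX - PX0) @ y / 1) / (y @ (I - PX) @ y / 6)

print(round(F_familiar, 6))                 # 7.538462
print(np.isclose(F_familiar, F_quadratic))  # True
```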
21 Equivalence of F Tests. It turns out that this reduced vs. full model F test is equivalent to the F test for testing H₀: Cβ = d vs. H_A: Cβ ≠ d with an appropriately chosen C and d. The equivalence of these tests is proved in STAT 611.
22 Example: F Test for Lack of Linear Fit. Suppose a balanced, completely randomized design is used to assign 1, 2, or 3 units of fertilizer to a total of 9 plots of land. The yield harvested from each plot is recorded as the response.
23 Let y_ij denote the yield from the jth plot that received i units of fertilizer (i, j = 1, 2, 3). Suppose all yields are independent and y_ij ~ N(µᵢ, σ²) for all i, j = 1, 2, 3. If

y = [y₁₁, y₁₂, y₁₃, y₂₁, y₂₂, y₂₃, y₃₁, y₃₂, y₃₃]′,

then

E(y) = [µ₁, µ₁, µ₁, µ₂, µ₂, µ₂, µ₃, µ₃, µ₃]′.
24 Suppose we wish to determine whether there is a linear relationship between the amount of fertilizer applied to a plot and the expected value of a plot's yield. In other words, suppose we wish to know whether there exist real numbers β₁ and β₂ such that µᵢ = β₁ + β₂(i) for all i = 1, 2, 3.
25 Consider testing H₀: E(y) ∈ C(X₀) vs. H_A: E(y) ∈ C(X) \ C(X₀), where

X₀ =
1 1
1 1
1 1
1 2
1 2
1 2
1 3
1 3
1 3

and X =
1 0 0
1 0 0
1 0 0
0 1 0
0 1 0
0 1 0
0 0 1
0 0 1
0 0 1
26 Note

H₀: E(y) ∈ C(X₀)
⟺ ∃ β ∈ ℝ² such that E(y) = X₀β
⟺ ∃ [β₁, β₂]′ ∈ ℝ² such that [µ₁, µ₁, µ₁, µ₂, µ₂, µ₂, µ₃, µ₃, µ₃]′ = [β₁+β₂(1), β₁+β₂(1), β₁+β₂(1), β₁+β₂(2), β₁+β₂(2), β₁+β₂(2), β₁+β₂(3), β₁+β₂(3), β₁+β₂(3)]′
⟺ µᵢ = β₁ + β₂(i) for all i = 1, 2, 3.
27 Note

E(y) ∈ C(X)
⟺ ∃ β ∈ ℝ³ such that E(y) = Xβ
⟺ ∃ [β₁, β₂, β₃]′ ∈ ℝ³ such that [µ₁, µ₁, µ₁, µ₂, µ₂, µ₂, µ₃, µ₃, µ₃]′ = [β₁, β₁, β₁, β₂, β₂, β₂, β₃, β₃, β₃]′.

This condition clearly holds with βᵢ = µᵢ for all i = 1, 2, 3.
28 The alternative hypothesis H_A: E(y) ∈ C(X) \ C(X₀) is equivalent to H_A: there do not exist β₁, β₂ ∈ ℝ such that µᵢ = β₁ + β₂(i) for all i = 1, 2, 3.
29 Because the lack of fit test is a reduced vs. full model F test, we can also obtain this test by testing H₀: Cβ = d vs. H_A: Cβ ≠ d for appropriate C and d, where

β = [µ₁, µ₂, µ₃]′,  C = ?,  d = ?
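One choice that works (and is the one used in the R and SAS code later in this slide set) is C = [1, −2, 1], d = 0: the means lie on a line in i exactly when their second difference µ₁ − 2µ₂ + µ₃ is zero. A Python/NumPy sketch of the resulting Wald-type F statistic, computed from the full-model cell means on the fertilizer data:

```python
import numpy as np

# Fertilizer data, one row per dose (3 plots per dose).
y = np.array([11, 13, 9, 18, 22, 23, 19, 24, 22], dtype=float).reshape(3, 3)

mu_hat = y.mean(axis=1)                      # cell-mean estimates
s2 = ((y - mu_hat[:, None]) ** 2).sum() / 6  # full-model MSE, 6 error df

C = np.array([[1.0, -2.0, 1.0]])             # second difference: zero iff means are linear in i
d = np.zeros(1)
V = s2 / 3 * np.eye(3)                       # estimated Var(mu_hat): sigma^2/3 per group mean
Cb = C @ mu_hat - d
F = float(Cb @ np.linalg.solve(C @ V @ C.T, Cb)) / C.shape[0]
print(round(F, 6))   # 7.538462, matching the reduced-vs-full lack-of-fit F
```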
30 R Code and Output

> x=rep(1:3,each=3)
> x
[1] 1 1 1 2 2 2 3 3 3
>
> y=c(11,13,9,18,22,23,19,24,22)
>
> plot(x,y,pch=16,col=4,xlim=c(.5,3.5),
+      xlab="Fertilizer Amount",
+      ylab="Yield",axes=F,cex.lab=1.5)
> axis(1,labels=1:3,at=1:3)
> axis(2)
> box()
31 [Scatterplot of Yield versus Fertilizer Amount for the 9 plots.]
32
> X0=model.matrix(~x)
> X0
  (Intercept) x
1           1 1
2           1 1
3           1 1
4           1 2
5           1 2
6           1 2
7           1 3
8           1 3
9           1 3
33
> X=model.matrix(~0+factor(x))
> X
  factor(x)1 factor(x)2 factor(x)3
1          1          0          0
2          1          0          0
3          1          0          0
4          0          1          0
5          0          1          0
6          0          1          0
7          0          0          1
8          0          0          1
9          0          0          1
34
> proj=function(x){
+   x%*%ginv(t(x)%*%x)%*%t(x)
+ }
>
> library(MASS)
> PX0=proj(X0)
> PX=proj(X)
35
> Fstat=(t(y)%*%(PX-PX0)%*%y/1)/
+   (t(y)%*%(diag(rep(1,9))-PX)%*%y/(9-3))
> Fstat
         [,1]
[1,] 7.538462
>
> pvalue=1-pf(Fstat,1,6)
> pvalue
           [,1]
[1,] 0.03348515
36
> reduced=lm(y~x)
> full=lm(y~0+factor(x))
>
> rvsf=function(reduced,full)
+ {
+   sser=deviance(reduced)
+   ssef=deviance(full)
+   dfer=reduced$df
+   dfef=full$df
+   dfn=dfer-dfef
+   Fstat=(sser-ssef)/dfn/
+     (ssef/dfef)
+   pvalue=1-pf(Fstat,dfn,dfef)
+   list(Fstat=Fstat,dfn=dfn,dfd=dfef,
+        pvalue=pvalue)
+ }
37
> rvsf(reduced,full)
$Fstat
[1] 7.538462

$dfn
[1] 1

$dfd
[1] 6

$pvalue
[1] 0.03348515
38
> anova(reduced,full)
Analysis of Variance Table

Model 1: y ~ x
Model 2: y ~ 0 + factor(x)
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)
1      7 78.222
2      6 34.667  1    43.556 7.5385 0.03349 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
39
> test=function(lmout,C,d=0){
+   b=coef(lmout)
+   V=vcov(lmout)
+   dfn=nrow(C)
+   dfd=lmout$df
+   Cb.d=C%*%b-d
+   Fstat=drop(t(Cb.d)%*%solve(C%*%V%*%t(C))%*%Cb.d/dfn)
+   pvalue=1-pf(Fstat,dfn,dfd)
+   list(Fstat=Fstat,pvalue=pvalue)
+ }
> test(full,matrix(c(1,-2,1),nrow=1))
$Fstat
[1] 7.538462

$pvalue
[1] 0.03348515
40 SAS Code and Output

data d;
input x y;
cards;
1 11
1 13
1 9
2 18
2 22
2 23
3 19
3 24
3 22
;
run;
41
proc glm;
  class x;
  model y=x;
  contrast 'Lack of Linear Fit' x 1 -2 1;
run;
42 The SAS System
The GLM Procedure

Dependent Variable: y

                                  Sum of
Source            DF     Squares        Mean Square   F Value   Pr > F
Model              2   214.2222222     107.1111111      18.54   0.0027
Error              6    34.6666667       5.7777778
Corrected Total    8   248.8888889

R-Square   Coeff Var   Root MSE    y Mean
0.860714    13.43684   2.403701  17.88889

Source   DF     Type I SS   Mean Square   F Value   Pr > F
x         2   214.2222222   107.1111111     18.54   0.0027

Source   DF   Type III SS   Mean Square   F Value   Pr > F
x         2   214.2222222   107.1111111     18.54   0.0027
43
Contrast             DF   Contrast SS   Mean Square   F Value   Pr > F
Lack of Linear Fit    1    43.5555556    43.5555556      7.54   0.0335
More information5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality.
88 Chapter 5 Distribution Theory In this chapter, we summarize the distributions related to the normal distribution that occur in linear models. Before turning to this general problem that assumes normal
More informationFor GLM y = Xβ + e (1) where X is a N k design matrix and p(e) = N(0, σ 2 I N ), we can estimate the coefficients from the normal equations
1 Generalised Inverse For GLM y = Xβ + e (1) where X is a N k design matrix and p(e) = N(0, σ 2 I N ), we can estimate the coefficients from the normal equations (X T X)β = X T y (2) If rank of X, denoted
More information22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA
22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each
More information36-707: Regression Analysis Homework Solutions. Homework 3
36-707: Regression Analysis Homework Solutions Homework 3 Fall 2012 Problem 1 Y i = βx i + ɛ i, i {1, 2,..., n}. (a) Find the LS estimator of β: RSS = Σ n i=1(y i βx i ) 2 RSS β = Σ n i=1( 2X i )(Y i βx
More informationLecture 3: Inference in SLR
Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals
More informationApplied Statistics Preliminary Examination Theory of Linear Models August 2017
Applied Statistics Preliminary Examination Theory of Linear Models August 2017 Instructions: Do all 3 Problems. Neither calculators nor electronic devices of any kind are allowed. Show all your work, clearly
More informationSampling Distributions
Merlise Clyde Duke University September 3, 2015 Outline Topics Normal Theory Chi-squared Distributions Student t Distributions Readings: Christensen Apendix C, Chapter 1-2 Prostate Example > library(lasso2);
More informationMaximum Likelihood Estimation
Maximum Likelihood Estimation Merlise Clyde STA721 Linear Models Duke University August 31, 2017 Outline Topics Likelihood Function Projections Maximum Likelihood Estimates Readings: Christensen Chapter
More informationEXST Regression Techniques Page 1. We can also test the hypothesis H :" œ 0 versus H :"
EXST704 - Regression Techniques Page 1 Using F tests instead of t-tests We can also test the hypothesis H :" œ 0 versus H :" Á 0 with an F test.! " " " F œ MSRegression MSError This test is mathematically
More informationDr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46
BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics
More informationExample: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA
s:5 Applied Linear Regression Chapter 8: ANOVA Two-way ANOVA Used to compare populations means when the populations are classified by two factors (or categorical variables) For example sex and occupation
More informationdata proc sort proc corr run proc reg run proc glm run proc glm run proc glm run proc reg CONMAIN CONINT run proc reg DUMMAIN DUMINT run proc reg
data one; input id Y group X; I1=0;I2=0;I3=0;if group=1 then I1=1;if group=2 then I2=1;if group=3 then I3=1; IINT1=I1*X;IINT2=I2*X;IINT3=I3*X; *************************************************************************;
More information3. For a given dataset and linear model, what do you think is true about least squares estimates? Is Ŷ always unique? Yes. Is ˆβ always unique? No.
7. LEAST SQUARES ESTIMATION 1 EXERCISE: Least-Squares Estimation and Uniqueness of Estimates 1. For n real numbers a 1,...,a n, what value of a minimizes the sum of squared distances from a to each of
More informationT-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum
T-test: means of Spock's judge versus all other judges 1 The TTEST Procedure Variable: pcwomen judge1 N Mean Std Dev Std Err Minimum Maximum OTHER 37 29.4919 7.4308 1.2216 16.5000 48.9000 SPOCKS 9 14.6222
More informationLecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is
Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is Q = (Y i β 0 β 1 X i1 β 2 X i2 β p 1 X i.p 1 ) 2, which in matrix notation is Q = (Y Xβ) (Y
More informationTMA4267 Linear Statistical Models V2017 (L10)
TMA4267 Linear Statistical Models V2017 (L10) Part 2: Linear regression: Parameter estimation [F:3.2], Properties of residuals and distribution of estimator for error variance Confidence interval and hypothesis
More informationST430 Exam 2 Solutions
ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving
More informationCAS MA575 Linear Models
CAS MA575 Linear Models Boston University, Fall 2013 Midterm Exam (Correction) Instructor: Cedric Ginestet Date: 22 Oct 2013. Maximal Score: 200pts. Please Note: You will only be graded on work and answers
More informationTopic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model
Topic 17 - Single Factor Analysis of Variance - Fall 2013 One way ANOVA Cell means model Factor effects model Outline Topic 17 2 One-way ANOVA Response variable Y is continuous Explanatory variable is
More informationCOMPLETELY RANDOMIZED DESIGNS (CRD) For now, t unstructured treatments (e.g. no factorial structure)
STAT 52 Completely Randomized Designs COMPLETELY RANDOMIZED DESIGNS (CRD) For now, t unstructured treatments (e.g. no factorial structure) Completely randomized means no restrictions on the randomization
More informationTheorem A: Expectations of Sums of Squares Under the two-way ANOVA model, E(X i X) 2 = (µ i µ) 2 + n 1 n σ2
identity Y ijk Ȳ = (Y ijk Ȳij ) + (Ȳi Ȳ ) + (Ȳ j Ȳ ) + (Ȳij Ȳi Ȳ j + Ȳ ) Theorem A: Expectations of Sums of Squares Under the two-way ANOVA model, (1) E(MSE) = E(SSE/[IJ(K 1)]) = (2) E(MSA) = E(SSA/(I
More informationSTAT 115:Experimental Designs
STAT 115:Experimental Designs Josefina V. Almeda 2013 Multisample inference: Analysis of Variance 1 Learning Objectives 1. Describe Analysis of Variance (ANOVA) 2. Explain the Rationale of ANOVA 3. Compare
More informationMultivariate Linear Regression Models
Multivariate Linear Regression Models Regression analysis is used to predict the value of one or more responses from a set of predictors. It can also be used to estimate the linear association between
More informationBIOS 2083 Linear Models c Abdus S. Wahed
Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter
More informationThe Multivariate Normal Distribution. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 36
The Multivariate Normal Distribution Copyright c 2012 Dan Nettleton (Iowa State University) Statistics 611 1 / 36 The Moment Generating Function (MGF) of a random vector X is given by M X (t) = E(e t X
More informationEcon 620. Matrix Differentiation. Let a and x are (k 1) vectors and A is an (k k) matrix. ) x. (a x) = a. x = a (x Ax) =(A + A (x Ax) x x =(A + A )
Econ 60 Matrix Differentiation Let a and x are k vectors and A is an k k matrix. a x a x = a = a x Ax =A + A x Ax x =A + A x Ax = xx A We don t want to prove the claim rigorously. But a x = k a i x i i=
More informationANOVA (Analysis of Variance) output RLS 11/20/2016
ANOVA (Analysis of Variance) output RLS 11/20/2016 1. Analysis of Variance (ANOVA) The goal of ANOVA is to see if the variation in the data can explain enough to see if there are differences in the means.
More information