11 Hypothesis Testing


11.1 Introduction

Suppose we want to test the hypothesis

    H_0 : A_{q×p} β_{p×1} = 0_{q×1}.

In terms of the rows of A this can be written as a_1'β = 0, ..., a_q'β = 0, i.e. a_i'β = 0 for each row of A (here a_i' denotes the ith row of A).

11.1 Definition: The hypothesis H_0 : Aβ = 0 is testable if a_i'β is an estimable function for each row a_i' of A.

11.2 Note: Recall that a_i'β is estimable if a_i' = b_i'X for some b_i. Therefore H_0 : Aβ = 0 is testable if A = MX for some M, i.e. the rows of A are linearly dependent on the rows of X.

11.3 Example: (One-way ANOVA with 3 groups) The model Y_ij = µ + α_i + ε_ij, i = 1, 2, 3, j = 1, ..., J, in matrix form is

    (Y_11, ..., Y_1J, Y_21, ..., Y_2J, Y_31, ..., Y_3J)' = X (µ, α_1, α_2, α_3)' + (ε_11, ..., ε_1J, ε_21, ..., ε_2J, ε_31, ..., ε_3J)',

where the rows of X are (1, 1, 0, 0) for group 1, (1, 0, 1, 0) for group 2, and (1, 0, 0, 1) for group 3. Examples of testable hypotheses are:

    H_0 : (1, 1, 0, 0)β = 0, i.e. µ + α_1 = 0
    H_0 : (1, 0, 1, 0)β = 0, i.e. µ + α_2 = 0
    H_0 : (1, 0, 0, 1)β = 0, i.e. µ + α_3 = 0
    H_0 : (0, 1, −1, 0)β = 0, i.e. α_1 − α_2 = 0
    H_0 : [ 0 1 −1 0 ; 0 0 1 −1 ] β = (α_1 − α_2, α_2 − α_3)' = 0, i.e. α_1 = α_2 = α_3 (no group effects)

How should we test H_0 : Aβ = 0? We could compare the residual sum of squares (RSS) for the full model Y = Xβ + ε to the residual sum of squares (RSS_H) for the restricted model (with Aβ = 0).
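
The testability condition A = MX can be checked numerically: each row of A must lie in the row space of X. Below is a minimal sketch in Python (numpy) for the one-way ANOVA design above; the helper name is_testable is ours, not from the notes.

```python
import numpy as np

J = 2
X = np.hstack([np.ones((3 * J, 1)),                      # intercept column (mu)
               np.kron(np.eye(3), np.ones((J, 1)))])     # group indicators (alpha_i)

def is_testable(A, X, tol=1e-10):
    P = np.linalg.pinv(X) @ X                  # projector onto the row space of X
    return np.allclose(A @ P, A, atol=tol)     # A = MX iff the rows of A are invariant under P

print(is_testable(np.array([[0.0, 1.0, -1.0, 0.0]]), X))   # alpha_1 - alpha_2: True
print(is_testable(np.array([[0.0, 1.0, 0.0, 0.0]]), X))    # alpha_1 alone: False (not estimable)
```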

Let µ = E[Y]. Under the full model, µ = Xβ ∈ R(X) ≡ Ω. If H_0 : Aβ = 0 is a testable hypothesis with A = MX, then

    H_0 : Aβ = 0 (and µ = Xβ)
    ⇔ H_0 : Mµ = 0 (and µ = Xβ)
    ⇔ H_0 : µ ∈ R(X) ∩ N(M) ≡ ω,

where N(M) = {u : Mu = 0} is the null space of M. Thus we have translated a hypothesis about β into a hypothesis about µ = E[Y]. We can write ω as {µ : µ = Xβ, Aβ = 0} or, equivalently, ω = {µ : µ = Xβ, Mµ = 0}.

(Figure: the geometry of the test — Y is projected onto Ω and ω, with e = (I − P_Ω)Y, Ŷ = P_Ω Y, Ŷ_H = P_ω Y, and Ŷ − Ŷ_H = (P_Ω − P_ω)Y.)

Let Ŷ = P_Ω Y and Ŷ_H = P_ω Y be the orthogonal projections of Y onto Ω and ω. The RSS for the full model is

    RSS = (Y − Ŷ)'(Y − Ŷ) = Y'(I − P_Ω)Y,

and the RSS for the restricted model (with µ ∈ ω) is

    RSS_H = (Y − Ŷ_H)'(Y − Ŷ_H) = Y'(I − P_ω)Y.

Hence RSS_H − RSS = Y'(P_Ω − P_ω)Y.

11.4 Theorem: Let Ω = R(X) and ω = Ω ∩ N(M). Then

    1. P_Ω P_ω = P_ω = P_ω P_Ω.
    2. ω^⊥ ∩ Ω = R(P_Ω M').
    3. If H_0 : Aβ = 0 is a testable hypothesis,

       P_Ω − P_ω = X(X'X)^− A' [A(X'X)^− A']^− A(X'X)^− X'.

11.5 Theorem: If H_0 : Aβ = 0 is a testable hypothesis, then

    RSS_H − RSS = (Aβ̂)' [A(X'X)^− A']^− (Aβ̂).
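
As a quick numerical check of the identity RSS_H − RSS = Y'(P_Ω − P_ω)Y and of Theorem 11.5, here is a sketch in Python (numpy) for a simple full-rank design where the reduced model is obtained by dropping a column; all names and the simulated data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)
A = np.array([[0.0, 0.0, 1.0]])                 # H0: beta_2 = 0 (testable since X has full rank)

def proj(M):
    return M @ np.linalg.pinv(M)                # orthogonal projector onto R(M)

P_Om, P_om = proj(X), proj(X[:, :2])            # omega: mean lies in the span of the first 2 columns
RSS = Y @ (np.eye(n) - P_Om) @ Y
RSS_H = Y @ (np.eye(n) - P_om) @ Y
print(np.isclose(RSS_H - RSS, Y @ (P_Om - P_om) @ Y))    # True

# Theorem 11.5: RSS_H - RSS as a quadratic form in A beta-hat
bhat = np.linalg.pinv(X) @ Y
Ab = A @ bhat
quad = Ab @ np.linalg.solve(A @ np.linalg.inv(X.T @ X) @ A.T, Ab)
print(np.isclose(RSS_H - RSS, quad))                     # True
```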

11.6 Theorem: Let H_0 : Aβ = 0 be a testable hypothesis.

    (a) cov(Aβ̂) = σ² A(X'X)^− A'.
    (b) If rank(A) = q, then (RSS_H − RSS)/σ² = (Aβ̂)' [cov(Aβ̂)]^{−1} (Aβ̂).
    (c) E[RSS_H − RSS] = σ² q + (Aβ)' [A(X'X)^− A']^{−1} (Aβ).
    (d) If H_0 : Aβ = 0 is true and Y ~ N_n(Xβ, σ²I), then (RSS_H − RSS)/σ² ~ χ²_q.

When H_0 : Aβ = 0 is true, E[RSS_H − RSS] = σ² q. Therefore, we form a test statistic by calculating

    (RSS_H − RSS)/(q σ̂²) = [(RSS_H − RSS)/q] / [RSS/(n − r)].

11.7 Definition: Let X_1 and X_2 be independent random variables with X_1 ~ χ²_{d_1} and X_2 ~ χ²_{d_2}. Then the distribution of the ratio

    F = (X_1/d_1) / (X_2/d_2)

is defined as the F distribution with d_1 numerator degrees of freedom and d_2 denominator degrees of freedom and is denoted F_{d_1,d_2}.

11.8 Theorem: If Y ~ N_n(Xβ, σ²I) and H_0 : Aβ = 0 is a testable hypothesis with rank(A_{q×p}) = q, then, when H_0 is true,

    F = [(RSS_H − RSS)/q] / [RSS/(n − r)] ~ F_{q,n−r},

the F distribution with q and n − r degrees of freedom.

11.9 Note: If rank(A) = q, then Ŷ_H = Xβ̂_H, with

    β̂_H = β̂ − (X'X)^− A' [A(X'X)^− A']^{−1} Aβ̂,

where β̂ = (X'X)^− X'Y.

11.10 Note: The F-test extends to H_0 : Aβ = c for a constant vector c. In this case, our previous results become (if rank(A) = q)

    β̂_H = β̂ − (X'X)^− A' [A(X'X)^− A']^{−1} (Aβ̂ − c),
    RSS_H − RSS = (Aβ̂ − c)' [A(X'X)^− A']^{−1} (Aβ̂ − c),

and F has the same distribution as before. The derivations use a solution β_0 to Aβ_0 = c and

    Ỹ = Y − Xβ_0 = X(β − β_0) + ε = Xγ + ε,

where γ = β − β_0. H_0 becomes H_0 : Aγ = 0, so we can apply the previous theory to Ỹ.
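
Theorem 11.8 translates directly into code. The sketch below computes F and its p-value for a full-rank design (so r = p and (X'X)^− is the ordinary inverse); the function name f_test and the simulated data are ours.

```python
import numpy as np
from scipy import stats

def f_test(Y, X, A):
    """General F test of a testable H0: A beta = 0, assuming rank(X) = p."""
    n, p = X.shape
    q = np.linalg.matrix_rank(A)
    XtX_inv = np.linalg.inv(X.T @ X)
    bhat = XtX_inv @ X.T @ Y
    RSS = np.sum((Y - X @ bhat) ** 2)
    Ab = A @ bhat
    diff = Ab @ np.linalg.solve(A @ XtX_inv @ A.T, Ab)   # = RSS_H - RSS (Theorem 11.5)
    F = (diff / q) / (RSS / (n - p))
    return F, stats.f.sf(F, q, n - p)           # statistic and p-value

# usage on simulated data:
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)
print(f_test(Y, X, np.array([[0.0, 0.0, 1.0]])))         # test H0: beta_2 = 0
```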

11.11 Example: The t-test. Let U_1, ..., U_{n_1} be iid N(µ_1, σ²) and V_1, ..., V_{n_2} be iid N(µ_2, σ²), independently of the U_i. As a linear model,

    (U_1, ..., U_{n_1}, V_1, ..., V_{n_2})' = X (µ_1, µ_2)' + (ε_1, ..., ε_{n_1}, ε_{n_1+1}, ..., ε_n)',

where the first n_1 rows of X are (1, 0), the last n_2 rows are (0, 1), and n = n_1 + n_2. The hypothesis H_0 : µ_1 = µ_2 leads to

    F = (RSS_H − RSS) / [RSS/(n − 2)] = (Ū − V̄)² / [S² (1/n_1 + 1/n_2)] = T²,

where T = (Ū − V̄) / [S (1/n_1 + 1/n_2)^{1/2}] is the two-sample t statistic and S² = RSS/(n − 2) is the pooled variance estimate.

11.12 Example: Multiple Linear Regression. Consider the model

    Y_i = β_0 + β_1 x_{i1} + ... + β_{p−1} x_{i,p−1} + ε_i.

The test of H_0 : β_j = 0 (j ≠ 0) leads to

    F = (RSS_H − RSS) / [RSS/(n − p)] = β̂_j² / [SE(β̂_j)]² = T²,

where T = β̂_j / SE(β̂_j) is the usual t statistic for testing the significance of coefficients in a multiple regression model.

11.13 Example: Simple Linear Regression. Consider the model

    Y_i = β_0 + β_1 (x_i − x̄) + ε_i.

Then

    β̂_1 = [Σ_i x_i Y_i − (Σ_i x_i)(Σ_i Y_i)/n] / Σ_i (x_i − x̄)² = Σ_i (x_i − x̄)(Y_i − Ȳ) / Σ_i (x_i − x̄)²

and var(β̂_1) = σ² / Σ_i (x_i − x̄)². From the previous example, the F statistic for testing H_0 : β_1 = 0 is

    F = β̂_1² / [S² / Σ_i (x_i − x̄)²].
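
The F = T² identity of Example 11.11 is easy to confirm numerically; a minimal sketch with illustrative simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
U = rng.normal(0.0, 1.0, size=12)               # sample 1
V = rng.normal(0.5, 1.0, size=15)               # sample 2
n1, n2 = len(U), len(V)

# pooled two-sample t statistic
S2 = ((n1 - 1) * U.var(ddof=1) + (n2 - 1) * V.var(ddof=1)) / (n1 + n2 - 2)
T = (U.mean() - V.mean()) / np.sqrt(S2 * (1.0 / n1 + 1.0 / n2))

# linear-model F statistic: full model fits one mean per group,
# reduced model (H0: mu1 = mu2) fits a single grand mean
Y = np.concatenate([U, V])
RSS = np.sum((U - U.mean()) ** 2) + np.sum((V - V.mean()) ** 2)
RSS_H = np.sum((Y - Y.mean()) ** 2)
F = (RSS_H - RSS) / (RSS / (n1 + n2 - 2))
print(np.isclose(F, T ** 2))                    # True
```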

It can be shown that

    RSS = (1 − r²) Σ_i (Y_i − Ȳ)² = (1 − r²) RSS_H,

where r is the sample correlation coefficient:

    r = Σ_i (x_i − x̄)(Y_i − Ȳ) / [Σ_i (x_i − x̄)² Σ_i (Y_i − Ȳ)²]^{1/2}.

This means that r² = (RSS_H − RSS)/RSS_H is the proportion of the total variation (RSS_H) explained by the regression relationship. We will later generalize this to the sample multiple correlation coefficient (R²).

11.2 Power of the F-Test: Consider the model Y = Xβ + ε, ε ~ N_n(0, σ²I), with rank(X_{n×p}) = r. Then the F statistic for testing H_0 : Aβ = 0 is

    F = [(RSS_H − RSS)/q] / [RSS/(n − r)],

where rank(A_{q×p}) = q. Our goal is to calculate

    Power = P(F > F^α_{q,n−r} | H_0 not true).

11.14 Definition: Let X_1 and X_2 be independent random variables with X_1 ~ χ²_{d_1}(λ) and X_2 ~ χ²_{d_2}. Then the distribution of the ratio

    F = (X_1/d_1) / (X_2/d_2)

is defined as the non-central F distribution with d_1 numerator degrees of freedom, d_2 denominator degrees of freedom, and non-centrality parameter λ, and is denoted F_{d_1,d_2}(λ).

11.15 Theorem: The F statistic for testing H_0 : Aβ = 0 has the non-central F distribution F ~ F_{q,n−r}(λ), where λ = µ'(P_Ω − P_ω)µ / (2σ²).

11.16 Note: When calculating the non-centrality parameter λ, we can use the following representations:

    2σ²λ = µ'(P_Ω − P_ω)µ
         = Y'(P_Ω − P_ω)Y evaluated at Y = µ
         = (RSS_H − RSS) evaluated at Y = µ
         = (Aβ̂)' [A(X'X)^− A']^{−1} (Aβ̂) evaluated at Y = µ
         = (Aβ)' [A(X'X)^− A']^{−1} (Aβ).

So we just have to substitute the true mean µ under the alternative hypothesis, or the true parameter Aβ, into the appropriate formulas for RSS_H − RSS.
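
Theorem 11.15 gives a practical recipe for power calculations. The sketch below computes the power of the slope test in Example 11.13; note that scipy parameterizes the non-central F by 2λ in the notes' convention, i.e. by µ'(P_Ω − P_ω)µ/σ². The design and effect size are illustrative.

```python
import numpy as np
from scipy import stats

n, alpha, sigma = 20, 0.05, 1.0
x = np.linspace(-1.0, 1.0, n)                   # fixed design points
beta1 = 0.8                                     # true slope under the alternative

# for A = (0, 1): (A beta)'[A(X'X)^- A']^{-1}(A beta) = beta1^2 * sum((x - xbar)^2),
# so scipy's non-centrality (= 2*lambda) is that divided by sigma^2
nc = beta1 ** 2 * np.sum((x - x.mean()) ** 2) / sigma ** 2

q, r = 1, 2                                     # q = rank(A), r = rank(X)
crit = stats.f.ppf(1.0 - alpha, q, n - r)       # critical value F^alpha_{q, n-r}
print(stats.ncf.sf(crit, q, n - r, nc))         # power = P(F > crit | alternative)
```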

11.3 The Overall F-Test: Assume the linear model

    Y_i = β_0 + β_1 x_{i1} + ... + β_{p−1} x_{i,p−1} + ε_i,

with full rank design matrix (rank(X) = p). Note that we are assuming the model contains an intercept. Suppose we want to test whether the overall model is significant, i.e.

    H_0 : β_1 = β_2 = ... = β_{p−1} = 0.

This can be written as H_0 : Aβ = 0 with A = (0, I_{(p−1)×(p−1)}). The F test for H_0 yields

    F = [(RSS_H − RSS)/(p − 1)] / [RSS/(n − p)] ~ F_{p−1,n−p},

if H_0 is true. This is called the overall F-test statistic for the linear model. It is useful as a preliminary test of the significance of the model prior to performing model selection to determine which variables in the model are important.

11.4 The Multiple Correlation Coefficient: The sample multiple correlation coefficient is defined as the correlation between the observations Y_i and the fitted values Ŷ_i from the regression model:

    R = corr(Y_i, Ŷ_i) = Σ_i (Y_i − Ȳ)(Ŷ_i − Ŷ̄) / [Σ_i (Y_i − Ȳ)² Σ_i (Ŷ_i − Ŷ̄)²]^{1/2},

where Ŷ̄ is the mean of the fitted values.

11.17 Theorem (ANOVA decomposition):

    Σ_i (Y_i − Ȳ)² = Σ_i (Y_i − Ŷ_i)² + Σ_i (Ŷ_i − Ȳ)²,

i.e. Total SS = Residual SS + Regression SS.

11.18 Theorem (R² as coefficient of determination):

    R² = Σ_i (Ŷ_i − Ȳ)² / Σ_i (Y_i − Ȳ)² = Regression SS / Total SS,

or equivalently,

    1 − R² = Σ_i (Y_i − Ŷ_i)² / Σ_i (Y_i − Ȳ)² = RSS / Total SS.

11.19 Note: R² is the proportion of variance in the Y_i explained by the regression model. R² is a generalization of r² for simple linear regression. It indicates how closely the estimated linear model fits the data. If R² = 1 (the maximum value), then Y_i = Ŷ_i for all i and the model is a perfect fit.
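
Since RSS_H equals the total sum of squares under the overall H_0, the overall F statistic can be written purely in terms of R² (this anticipates the theorem on the next page). A numerical check, with illustrative names and data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 40, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.6, -0.4, 0.0]) + rng.normal(size=n)

Yhat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
RSS = np.sum((Y - Yhat) ** 2)
TSS = np.sum((Y - Y.mean()) ** 2)               # = RSS_H for the overall H0
R2 = 1.0 - RSS / TSS

F = ((TSS - RSS) / (p - 1)) / (RSS / (n - p))   # overall F statistic
F_from_R2 = (R2 / (1.0 - R2)) * (n - p) / (p - 1)
print(np.isclose(F, F_from_R2))                 # True
print(stats.f.sf(F, p - 1, n - p))              # overall p-value
```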

11.20 Theorem: The F-test of a hypothesis of the form H_0 : (0, A_1)β = 0 (i.e. a test that does not involve the intercept β_0) is a test for a significant reduction in R²:

    F = [(R² − R_H²) / (1 − R²)] · [(n − p) / q],

where R² and R_H² are the sample multiple correlation coefficients for the full model and the reduced model, respectively.

11.21 Note: This shows that R² cannot increase when deleting a variable from the model (other than the intercept).

11.5 A Canonical Form for H_0: There are two ways to calculate the statistic

    F = [(RSS_H − RSS)/q] / [RSS/(n − r)]

for a testable hypothesis H_0 : Aβ = 0:

    1. Fit the full model and calculate RSS and RSS_H − RSS = (Aβ̂)' [A(X'X)^− A']^{−1} (Aβ̂).
    2. Fit the full model and calculate RSS; then fit the reduced model and calculate RSS_H.

The reduced model is Y = Xβ + ε with Aβ = 0. To fit this model using a canned computer package, we need to represent it as Y = X_H β_H + ε. This is called a canonical form for H_0. Assume rank(A) = q. Reorder the components of β and the columns of A so that A = (A_1, A_2), where A_2 consists of q linearly independent columns from A (A_2 is invertible). Hence

    H_0 : (A_1, A_2) (β_1', β_2')' = A_1 β_1 + A_2 β_2 = 0, so that β_2 = −A_2^{−1} A_1 β_1,

and

    Xβ = (X_1, X_2) (β_1', β_2')' = X_1 β_1 + X_2 β_2 = (X_1 − X_2 A_2^{−1} A_1) β_1.

This leads to X_H = X_1 − X_2 A_2^{−1} A_1 and β_H = β_1.

11.22 Example: Analysis of variance. Consider the one-way model Y_ij = α_i + ε_ij. With Y_i = (Y_{i1}, ..., Y_{i n_i})', the model in block matrix form is

    (Y_1', Y_2', ..., Y_p')' = blockdiag(1_{n_1}, 1_{n_2}, ..., 1_{n_p}) (α_1, α_2, ..., α_p)' + ε,

where 1_{n_i} denotes the n_i-vector of ones. Test H_0 : α_1 = α_2 = ... = α_p. A canonical form for H_0 is

    Y = 1_n α_1 + ε.
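
A quick numerical check that the canonical form reproduces RSS_H as computed from the quadratic-form identity of Theorem 11.5 (full-rank X, so ordinary inverses); the hypothesis β_1 = β_2 and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = rng.normal(size=n)
A = np.array([[0.0, 1.0, -1.0]])                # H0: beta_1 = beta_2

A1, A2 = A[:, :2], A[:, 2:]                     # A2 (1x1) is invertible
X1, X2 = X[:, :2], X[:, 2:]
X_H = X1 - X2 @ np.linalg.inv(A2) @ A1          # canonical-form design matrix

def rss(M, Y):
    coef = np.linalg.lstsq(M, Y, rcond=None)[0]
    return np.sum((Y - M @ coef) ** 2)

# RSS_H from the quadratic-form identity
XtX_inv = np.linalg.inv(X.T @ X)
bhat = XtX_inv @ X.T @ Y
Ab = A @ bhat
RSS_H_direct = rss(X, Y) + Ab @ np.linalg.solve(A @ XtX_inv @ A.T, Ab)
print(np.isclose(rss(X_H, Y), RSS_H_direct))    # True
```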

11.6 The F Test for Goodness of Fit: How can we assess whether a linear model Y = Xβ + ε is appropriate? Do the predictors adequately describe the mean of Y, or are there important predictors excluded? This is quite different from the overall F test, which tests whether the predictors are related to the response. We can test model adequacy if there are replicates, i.e. independent observations with the same values of the predictors (and hence the same mean). Suppose, for i = 1, ..., n, we have replicates Y_{i1}, ..., Y_{iR_i} corresponding to the values x_{i1}, ..., x_{i,p−1} of the predictors. The full model is Y_{ir} = µ_i + ε_{ir}, where the µ_i are any constants. We wish to test whether they have the form

    µ_i = β_0 + β_1 x_{i1} + ... + β_{p−1} x_{i,p−1}.

If µ = (µ_1, ..., µ_n)', we want to test the hypothesis H_0 : µ = Xβ. We now apply the general F test to H_0. The RSS under the full model is

    RSS = Σ_{i=1}^n Σ_{r=1}^{R_i} (Y_{ir} − Ȳ_i)²,

and for the reduced model

    RSS_H = Σ_{i=1}^n Σ_{r=1}^{R_i} (Y_{ir} − β̂_{0H} − β̂_{1H} x_{i1} − ... − β̂_{p−1,H} x_{i,p−1})².

It can be shown that in the case R_i ≡ R the estimates under the reduced model are β̂_H = (X'X)^{−1} X'Z, where Z_i = Ȳ_i = Σ_{r=1}^R Y_{ir} / R. The F statistic is

    F = [(RSS_H − RSS)/(n − p)] / [RSS/(N − n)] ~ F_{n−p,N−n},

where N = Σ_{i=1}^n R_i. This test is also called the lack-of-fit test.
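
A sketch of the lack-of-fit test on simulated data with R replicates per design point, where the true mean is quadratic but a straight line is fitted, so the test should tend to reject; all names and parameters are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, R, p = 8, 4, 2                               # design points, replicates, line parameters
x = np.repeat(np.linspace(0.0, 1.0, n), R)      # N = n*R observations
Y = 1.0 + 2.0 * x + 0.5 * x ** 2 + rng.normal(0.0, 0.3, size=n * R)

# full model: a separate mean per distinct x (pure error)
groups = Y.reshape(n, R)
RSS = np.sum((groups - groups.mean(axis=1, keepdims=True)) ** 2)

# reduced model: straight line in x
X = np.column_stack([np.ones(n * R), x])
RSS_H = np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)

N = n * R
F = ((RSS_H - RSS) / (n - p)) / (RSS / (N - n))
print(F, stats.f.sf(F, n - p, N - n))           # small p-value flags lack of fit
```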
