Ma 3/103: Lecture 25
Linear Regression II: Hypothesis Testing and ANOVA
KC Border, March 6, 2017
1. OLS estimator
2. Restricted regression
3. Errors in variables
4. ANOVA
5. The F test in an ANOVA framework
6. Contrasts
OLS estimator: Standard Linear Model

$$ y = X\beta + \varepsilon, \qquad \operatorname{E}\varepsilon = 0, \qquad \operatorname{Var}\varepsilon = \operatorname{E}(\varepsilon\varepsilon') = \sigma^2 I. $$
OLS estimator: OLS estimation

With N observations on $X_1, \dots, X_K$ and Y, let X be the N × K matrix of regressors, and y the N × 1 vector of observations on the response Y. If X has rank K, the OLS estimator $\hat\beta_{OLS}$ of the parameter vector β is given by

$$ \hat\beta_{OLS} = (X'X)^{-1}X'y. \tag{1} $$

It is obtained by orthogonally projecting y onto the column space of X.
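The formula above can be sketched numerically. This is a minimal illustration, not part of the lecture: the data are made up, and in practice one solves the normal equations (or calls a least-squares routine) rather than forming the inverse explicitly.

```python
import numpy as np

# Illustrative data: a constant plus two regressors, with known beta.
rng = np.random.default_rng(0)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=0.1, size=N)

# Solve the normal equations X'X b = X'y (numerically safer than inverting X'X).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against numpy's least-squares routine: same orthogonal projection.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Both computations give the same $\hat\beta_{OLS}$, since both minimize $(y - Xb)'(y - Xb)$.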
OLS estimator: Regression and Correlation

For the simple regression $y_t = \beta_0 + \beta_1 x_t$,

$$ X'X = \begin{pmatrix} N & \sum_t x_t \\ \sum_t x_t & \sum_t x_t^2 \end{pmatrix}, \qquad X'y = \begin{pmatrix} \sum_t y_t \\ \sum_t y_t x_t \end{pmatrix}, $$

$$ (X'X)^{-1} = \frac{1}{N \sum_t (x_t - \bar x)^2} \begin{pmatrix} \sum_t x_t^2 & -\sum_t x_t \\ -\sum_t x_t & N \end{pmatrix}. $$

Then from $\hat\beta = (X'X)^{-1}X'y$,

$$ \hat\beta_0 = \frac{\bigl(\sum_t x_t^2\bigr)\bigl(\sum_t y_t\bigr) - \bigl(\sum_t x_t\bigr)\bigl(\sum_t y_t x_t\bigr)}{N \sum_t x_t^2 - \bigl(\sum_t x_t\bigr)^2}, $$

$$ \hat\beta_1 = \frac{N\bigl(\sum_t y_t x_t\bigr) - \bigl(\sum_t x_t\bigr)\bigl(\sum_t y_t\bigr)}{N\bigl(\sum_t x_t^2\bigr) - \bigl(\sum_t x_t\bigr)^2} = \frac{\sum_t y_t x_t - \bigl(\sum_t x_t\bigr)\bigl(\sum_t y_t\bigr)/N}{\sum_t x_t^2 - \bigl(\sum_t x_t\bigr)^2/N}. $$
OLS estimator: Regression and Correlation (continued)

$$ \operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{(\operatorname{SD} X)(\operatorname{SD} Y)}, \qquad \operatorname{Cov}(X, Y) = \operatorname{E}\bigl[(X - \operatorname{E}X)(Y - \operatorname{E}Y)\bigr]. $$

Given pairs $(x_t, y_t)$, $t = 1, \dots, N$, of observations, define the sample correlation coefficient r by

$$ r = \frac{\sum_{t=1}^N (x_t - \bar x)(y_t - \bar y)}{\sqrt{\sum_{t=1}^N (x_t - \bar x)^2}\,\sqrt{\sum_{t=1}^N (y_t - \bar y)^2}}, $$

which is the sample analog of the correlation. It is also known as the Pearson product-moment correlation coefficient.

Consider the centered variables $\tilde x_t = x_t - \bar x$, $\tilde y_t = y_t - \bar y$. Then

$$ \hat\beta_1 = \frac{N\bigl(\sum_t y_t x_t\bigr) - \bigl(\sum_t x_t\bigr)\bigl(\sum_t y_t\bigr)}{N\bigl(\sum_t x_t^2\bigr) - \bigl(\sum_t x_t\bigr)^2} = \frac{N\bigl(\sum_t \tilde y_t \tilde x_t\bigr) - \bigl(\sum_t \tilde x_t\bigr)\bigl(\sum_t \tilde y_t\bigr)}{N\bigl(\sum_t \tilde x_t^2\bigr) - \bigl(\sum_t \tilde x_t\bigr)^2}. $$

But by construction, $\sum_t \tilde x_t = \sum_t \tilde y_t = 0$, so $\hat\beta_1 = \sum_t \tilde y_t \tilde x_t \big/ \sum_t \tilde x_t^2$.
OLS estimator: Regression and Correlation (continued)

Now look at the formula for the correlation coefficient. It can be rewritten as

$$ r = \frac{\sum_{t=1}^N \tilde x_t \tilde y_t}{s_x s_y} = \frac{s_x}{s_y}\,\hat\beta_1, $$

where $s_x = \sqrt{\sum_t (x_t - \bar x)^2} = \sqrt{\sum_t \tilde x_t^2}$ and $s_y = \sqrt{\sum_t \tilde y_t^2}$.

Among other things, this implies that r = 0 if and only if the slope $\hat\beta_1$ of the regression line is zero. (If $s_x = 0$, then all the $x_t$ are the same, and the slope is not identifiable.)
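The identity $r = (s_x/s_y)\hat\beta_1$ can be checked numerically. A small sketch on made-up data, comparing against numpy's own correlation routine:

```python
import numpy as np

# Illustrative data for a simple regression y on x.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 + 0.7 * x + rng.normal(scale=0.5, size=50)

xt, yt = x - x.mean(), y - y.mean()        # centered variables
beta1_hat = (yt @ xt) / (xt @ xt)          # OLS slope of the simple regression
s_x, s_y = np.sqrt(xt @ xt), np.sqrt(yt @ yt)
r = (xt @ yt) / (s_x * s_y)                # sample (Pearson) correlation coefficient
```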
OLS estimator: Testing for serial correlation

Regress $e_t$ on $e_{t-1}$:

$$ e_t = \beta_0 + \beta_1 e_{t-1}, $$

and test the hypothesis $\beta_1 = 0$.
OLS estimator: Testing linear restrictions on β

To test q simultaneous restrictions, let

$$ H_0: a = A\beta, $$

where A is a q × K matrix with rank q.

Theorem. Under the null hypothesis, the test statistic

$$ F = \frac{1}{q s^2}\,(a - A\hat\beta_{OLS})'\bigl[A(X'X)^{-1}A'\bigr]^{-1}(a - A\hat\beta_{OLS}) $$

has an F-distribution with (q, N − K) degrees of freedom.
OLS estimator: The F-test of the regression

Many software packages, including R, compute for you something called the F-statistic for the regression. The F-statistic for the regression tests the null hypothesis that all the coefficients on the non-constant terms are zero,

$$ H_0: \beta_2 = \beta_3 = \dots = \beta_K = 0. $$

(If you have a constant term, it is usually $X_1$ in our terminology.)
OLS estimator: Coefficient of Multiple Correlation

$$ y'y = \hat\beta_{OLS}'X'X\hat\beta_{OLS} + e'e + \underbrace{2\hat\beta_{OLS}'X'e}_{=\,0}. $$

The coefficient of multiple correlation R is a measure of the fraction of $y'y$ explained by the regressors. Specifically,

$$ 1 - R^2 = \frac{e'e}{y'y}, \qquad \text{or} \qquad R^2 = \frac{\hat y'\hat y}{y'y} = \frac{\hat\beta_{OLS}'X'X\hat\beta_{OLS}}{y'y}. $$

The Pythagorean Theorem implies $y'y = e'e + \hat y'\hat y$, so $0 \le R^2 \le 1$.
OLS estimator: Geometry of R²

$R = \sqrt{R^2}$ is the cosine of the angle ϕ between y and $\hat y = X\hat\beta_{OLS}$.

[Figure: the vector y, its projection $\hat y = \hat\beta_1 x_1 + \hat\beta_2 x_2$ onto the plane spanned by $x_1$ and $x_2$, the residual vector e, and the angle ϕ between y and $\hat y$.]
OLS estimator: Adjusted R²

Increasing the number of right-hand side variates can only decrease the sum of squared residuals, so it is desirable to penalize the measure of fit. The adjusted R², written $\bar R^2$, is defined by

$$ 1 - \bar R^2 = \frac{e'e/(N-K)}{y'y/(N-1)} = \frac{N-1}{N-K}\,(1 - R^2), $$

or

$$ \bar R^2 = \frac{1-K}{N-K} + \frac{N-1}{N-K}\,R^2. $$

It is possible for the adjusted R² to be negative.
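A quick numerical check that the two ways of writing the adjusted R² above agree. The data are made up, and R² here follows the slide's definition $1 - e'e/y'y$:

```python
import numpy as np

# Illustrative regression with a constant and four regressors.
rng = np.random.default_rng(2)
N, K = 20, 5
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = 1.0 + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
R2 = 1 - (e @ e) / (y @ y)                 # R^2 as defined on the slide

# The two equivalent forms of the adjusted R^2:
R2_adj_a = 1 - ((e @ e) / (N - K)) / ((y @ y) / (N - 1))
R2_adj_b = (1 - K) / (N - K) + (N - 1) / (N - K) * R2
```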
OLS estimator: What is a good value for R²?
OLS estimator: Prediction intervals

Let $y^* = x^{*\prime}\beta + \varepsilon^*$ and $\hat y^* = x^{*\prime}\hat\beta_{OLS}$. But what is the confidence interval for $y^*$?

$$ \hat y^* - y^* = x^{*\prime}\hat\beta_{OLS} - x^{*\prime}\beta - \varepsilon^* = x^{*\prime}(\hat\beta_{OLS} - \beta) - \varepsilon^*. $$

Therefore

$$ \operatorname{Var}(\hat y^* - y^*) = \operatorname{Var}\bigl(x^{*\prime}(\hat\beta_{OLS} - \beta) - \varepsilon^*\bigr) = \operatorname{Var}\bigl(x^{*\prime}(\hat\beta_{OLS} - \beta)\bigr) + \operatorname{Var}(\varepsilon^*) = \sigma^2\bigl(x^{*\prime}(X'X)^{-1}x^* + 1\bigr). $$
OLS estimator: Confidence intervals

Under the normality hypothesis,

$$ \frac{x^{*\prime}\hat\beta_{OLS} - y^*}{\sqrt{\sigma^2\bigl(x^{*\prime}(X'X)^{-1}x^* + 1\bigr)}} \bigg/ \sqrt{\frac{(N-K)s^2}{\sigma^2(N-K)}} = \frac{x^{*\prime}\hat\beta_{OLS} - y^*}{s\sqrt{x^{*\prime}(X'X)^{-1}x^* + 1}} \sim t(N-K). $$

Thus a (1 − α) confidence interval for $y^*$ is

$$ \Bigl[\,\hat y^* - t_{\alpha/2,\,N-K}\,s\sqrt{x^{*\prime}(X'X)^{-1}x^* + 1},\ \ \hat y^* + t_{\alpha/2,\,N-K}\,s\sqrt{x^{*\prime}(X'X)^{-1}x^* + 1}\,\Bigr]. $$
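The standard error $s\sqrt{x^{*\prime}(X'X)^{-1}x^* + 1}$ can be sketched in code. The data and the new point $x^*$ below are made up; the quantile $t_{\alpha/2,\,N-K}$ would come from a table or a stats library and is left symbolic here:

```python
import numpy as np

# Illustrative simple regression.
rng = np.random.default_rng(3)
N, K = 60, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.3, size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat
s = np.sqrt((e @ e) / (N - K))             # unbiased estimate of sigma

x_star = np.array([1.0, 0.5])              # hypothetical new observation
y_hat_star = x_star @ beta_hat
se_pred = s * np.sqrt(x_star @ XtX_inv @ x_star + 1)
# (1 - alpha) prediction interval: y_hat_star +/- t_{alpha/2, N-K} * se_pred
```

Note that the "+ 1" term makes the prediction standard error strictly larger than s: the interval must cover the noise $\varepsilon^*$ in the new observation, not just the estimation error in $\hat\beta_{OLS}$.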
Restricted regression: The Lagrange Multiplier Theorem

If $x^*$ minimizes $f(x)$ subject to $g_i(x) = 0$ $(i = 1, \dots, m)$, and if $\nabla g_1(x^*), \dots, \nabla g_m(x^*)$ are linearly independent, then there exist Lagrange multipliers $\lambda_i$, $i = 1, \dots, m$, such that

$$ \nabla f(x^*) + \lambda_1 \nabla g_1(x^*) + \dots + \lambda_m \nabla g_m(x^*) = 0. $$
Restricted regression: Restricted OLS

To minimize $(y - Xb)'(y - Xb)$ subject to the constraint $Ab = a$ (where A is q × K), the LMT tells us to form the Lagrangean

$$ (y - Xb)'(y - Xb) + \lambda'(Ab - a) $$

and solve the first-order condition

$$ -2X'y + 2X'Xb + A'\lambda = 0. \tag{2} $$
Restricted regression: Solving the FOC

Premultiply (2) by $A(X'X)^{-1}$:

$$ -2A(X'X)^{-1}X'y + 2\underbrace{A(X'X)^{-1}(X'X)b}_{=\,Ab\,=\,a} + A(X'X)^{-1}A'\lambda = 0, $$

so, solving for λ,

$$ \lambda = 2\bigl[A(X'X)^{-1}A'\bigr]^{-1}\bigl[A(X'X)^{-1}X'y - a\bigr]. $$

Substitute this into (2) to get

$$ -X'y + X'Xb + A'\bigl[A(X'X)^{-1}A'\bigr]^{-1}\bigl[A(X'X)^{-1}X'y - a\bigr] = 0, $$

which, after premultiplying by $(X'X)^{-1}$, with some work simplifies to

$$ b = \hat\beta_{OLS} + (X'X)^{-1}A'\bigl[A(X'X)^{-1}A'\bigr]^{-1}(a - A\hat\beta_{OLS}). $$
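The closed form for b can be verified numerically. A sketch on made-up data with the single illustrative restriction $\beta_2 + \beta_3 = 1$ (so q = 1):

```python
import numpy as np

# Illustrative data: constant plus two regressors, true beta_2 + beta_3 = 1.
rng = np.random.default_rng(4)
N = 80
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([0.5, 0.6, 0.4]) + rng.normal(scale=0.2, size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y               # unrestricted OLS
A = np.array([[0.0, 1.0, 1.0]])            # restriction: beta_2 + beta_3 = 1
a = np.array([1.0])

# Restricted estimator b from the formula above.
b = beta_hat + XtX_inv @ A.T @ np.linalg.inv(A @ XtX_inv @ A.T) @ (a - A @ beta_hat)
```

By construction b satisfies the restriction exactly, and its sum of squared residuals can be no smaller than the unrestricted one.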
Restricted regression: Restricted residuals

Let $e_r = y - Xb$ be the vector of residuals from the restricted regression. It can be shown that

$$ e_r'e_r = e_u'e_u + (a - A\hat\beta_{OLS})'\bigl[A(X'X)^{-1}A'\bigr]^{-1}(a - A\hat\beta_{OLS}), $$

where $e_u'e_u$ is the sum of squares from the unrestricted OLS regression. Thus

$$ e_r'e_r - e_u'e_u = (a - A\hat\beta_{OLS})'\bigl[A(X'X)^{-1}A'\bigr]^{-1}(a - A\hat\beta_{OLS}) $$

is a quadratic form in the q variables $a - A\hat\beta_{OLS}$.
Restricted regression: Testing a linear restriction

$$ H_0: A\beta = a. $$

Let $e_u$ and $e_r$ be the vectors of residuals from the unrestricted and restricted regressions. Then under the null hypothesis,

$$ F = \frac{(e_r'e_r - e_u'e_u)/q}{e_u'e_u/(N-K)} $$

has an F-distribution with (q, N − K) degrees of freedom. The null hypothesis should be rejected if $F \ge F_{1-\alpha,\,q,\,N-K}$.
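The residual-difference form of F here and the quadratic-form statistic from the earlier theorem are the same number. A numerical sketch on made-up data with the single restriction $\beta_2 = 0$:

```python
import numpy as np

# Illustrative data where the null beta_2 = 0 is true.
rng = np.random.default_rng(5)
N, K, q = 70, 3, 1
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([1.0, 0.0, 0.5]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e_u = y - X @ beta_hat
s2 = (e_u @ e_u) / (N - K)

A = np.array([[0.0, 1.0, 0.0]])            # H0: beta_2 = 0
a = np.zeros(1)
d = a - A @ beta_hat
middle = np.linalg.inv(A @ XtX_inv @ A.T)

F_quad = (d @ middle @ d) / (q * s2)       # quadratic-form version

b_r = beta_hat + XtX_inv @ A.T @ middle @ d    # restricted estimate
e_r = y - X @ b_r
F_resid = ((e_r @ e_r - e_u @ e_u) / q) / (e_u @ e_u / (N - K))
```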
Restricted regression (geometry)

[Figure: restricted regression with the restriction $\beta_1 = 1$. The restricted fit $\hat y_r = Xb$ lies in the set $\{Xb : Ab = a\}$, with residual vectors $e_r$ and $e_u$. The points $\hat y_r$, $\hat y_u$, y form a right triangle with hypotenuse $\hat y_r\,y$.]
Restricted regression: F-tests and t-tests may seem to conflict!

[Figure]
Errors in variables: Measurement error

True model: $y = X\beta + \varepsilon$, but we observe $\tilde X = X + V$. So the estimated model is

$$ y = \tilde X\beta + \eta, \qquad \eta = \varepsilon - V\beta. \tag{3} $$

The OLS estimate derived from (3) is

$$ \hat\beta = \beta + (\tilde X'\tilde X)^{-1}\tilde X'(\varepsilon - V\beta). $$

Its expectation is

$$ \operatorname{E}\hat\beta = \beta - \operatorname{E}\bigl[(\tilde X'\tilde X)^{-1}\tilde X'V\bigr]\beta, $$

so $\hat\beta$ is, in general, neither unbiased nor consistent.
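With a single mismeasured regressor, the bias takes the familiar attenuation form: the slope shrinks toward zero by roughly $\operatorname{Var}(x)/(\operatorname{Var}(x)+\operatorname{Var}(v))$. A simulation sketch, with made-up data and no constant term for simplicity:

```python
import numpy as np

# Simulate y = 2 x + noise, but regress on the mismeasured x_obs = x + v.
rng = np.random.default_rng(6)
N = 200_000
beta1 = 2.0
x = rng.normal(size=N)                     # true regressor, variance 1
v = rng.normal(size=N)                     # measurement error, variance 1
y = beta1 * x + rng.normal(size=N)         # true model
x_obs = x + v                              # what we actually observe

slope_hat = (x_obs @ y) / (x_obs @ x_obs)  # OLS slope on the mismeasured regressor
# Expected probability limit: beta1 * 1 / (1 + 1) = 1.0, not 2.0.
```

Increasing N does not help: the estimate converges to the attenuated value, which is the inconsistency claimed above.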
ANOVA: Some jargon

According to Larsen and Marx, pp. 431–432:

The word factor is used to denote any treatment or therapy applied to the subjects being measured, or any relevant feature (age, sex, ethnicity, etc.) characteristic of those subjects.

Different versions, extents, or aspects of a factor are referred to as levels.

Sometimes subjects or environments share certain characteristics that affect the way levels of a factor respond, yet those characteristics are of no intrinsic interest to the experimenter. Any such set of conditions or subjects is called a block.
ANOVA: ANOVA

ANOVA is an acronym for ANalysis Of VAriance.
ANOVA: Model equations

One factor with k levels. $Y_{ij}$ is the i-th measurement of the response at factor level j, with $n_j$ observations at level j:

$$ Y_{ij} = \mu_j + \varepsilon_{ij}, \qquad i = 1, \dots, n_j;\ j = 1, \dots, k. $$

$n = n_1 + \dots + n_k$ is the total number of observations. The $\varepsilon_{ij}$ are assumed to be independent, with common mean zero and common variance $\sigma^2$. $\mu_j$ is just the expected value of the response at level j.
ANOVA: ANOVA is a special case of the Standard Linear Model

$X_j$ is a dummy variable, or indicator, for the j-th level:

$$ \begin{pmatrix} y_{11} \\ \vdots \\ y_{n_1 1} \\ y_{12} \\ \vdots \\ y_{n_2 2} \\ \vdots \\ y_{1k} \\ \vdots \\ y_{n_k k} \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_k \end{pmatrix} + \begin{pmatrix} \varepsilon_{11} \\ \vdots \\ \varepsilon_{n_1 1} \\ \varepsilon_{12} \\ \vdots \\ \varepsilon_{n_2 2} \\ \vdots \\ \varepsilon_{1k} \\ \vdots \\ \varepsilon_{n_k k} \end{pmatrix} $$
ANOVA: OLS and ANOVA

Since each row of X has a single 1 (in the column of its level), $X'X$ is diagonal:

$$ X'X = \begin{pmatrix} n_1 & 0 & \cdots & 0 \\ 0 & n_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & n_k \end{pmatrix}. $$
ANOVA: OLS and ANOVA (continued)

Similarly, $X'y$ stacks the level totals:

$$ X'y = \begin{pmatrix} \sum_{i=1}^{n_1} y_{i1} \\ \sum_{i=1}^{n_2} y_{i2} \\ \vdots \\ \sum_{i=1}^{n_k} y_{ik} \end{pmatrix}. $$
ANOVA: OLS and ANOVA (continued)

$$ (X'X)^{-1}X'y = \begin{pmatrix} \dfrac{1}{n_1}\sum_{i=1}^{n_1} y_{i1} \\ \dfrac{1}{n_2}\sum_{i=1}^{n_2} y_{i2} \\ \vdots \\ \dfrac{1}{n_k}\sum_{i=1}^{n_k} y_{ik} \end{pmatrix}. $$

That is, the OLS estimate of $\mu_j$ is just the sample mean of the responses at level j.
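The dummy-variable computations above can be checked directly: regressing on level indicators returns the group sample means, and $X'X$ is the diagonal matrix of group sizes. A sketch on made-up data with k = 3 levels:

```python
import numpy as np

# Illustrative one-factor data: three levels with unequal group sizes.
rng = np.random.default_rng(7)
n_j = [5, 8, 6]
y = np.concatenate([rng.normal(loc=mu, size=n)
                    for mu, n in zip([1.0, 2.0, 3.0], n_j)])

# Build the dummy-variable design matrix: one indicator column per level.
X = np.zeros((sum(n_j), len(n_j)))
start = 0
for j, n in enumerate(n_j):
    X[start:start + n, j] = 1.0
    start += n

mu_hat = np.linalg.solve(X.T @ X, X.T @ y)     # OLS estimates of mu_1, ..., mu_k

# The group sample means, computed directly.
group_means = np.array([g.mean() for g in np.split(y, np.cumsum(n_j)[:-1])])
```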
The F test in an ANOVA framework: Hypothesis testing in ANOVA

The most common hypothesis is

$$ H_0: \mu_1 = \dots = \mu_k $$

against the alternative

$$ H_1: \text{not all the } \mu_j\text{'s are equal.} $$
The F test in an ANOVA framework: A little more jargon

$y_{ij}$ is the response of the i-th observation at level j.

$$ T_j = \sum_{i=1}^{n_j} y_{ij} \quad \text{is the response total at level } j, $$

$$ \bar Y_j = \frac{T_j}{n_j} \quad \text{is the sample mean at level } j, $$

$$ T = \sum_{j=1}^k \sum_{i=1}^{n_j} y_{ij} = \sum_{j=1}^k T_j \quad \text{is the sample overall total response}, $$

$$ \bar Y = \frac{1}{n}\sum_{j=1}^k \sum_{i=1}^{n_j} y_{ij} = \frac{1}{n}\sum_{j=1}^k T_j \quad \text{is the sample overall average response}. $$
The F test in an ANOVA framework: The fundamental identity

For any list $x_1, \dots, x_n$,

$$ \sum_{i=1}^n x_i^2 = \sum_{i=1}^n (x_i - \bar x)^2 + n\bar x^2. $$

Indeed, using $\sum_i x_i = n\bar x$,

$$ \sum_{i=1}^n (x_i - \bar x)^2 = \sum_{i=1}^n \bigl(x_i^2 - 2x_i\bar x + \bar x^2\bigr) = \sum_{i=1}^n x_i^2 - 2\bar x\sum_{i=1}^n x_i + n\bar x^2 = \sum_{i=1}^n x_i^2 - n\bar x^2. $$
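The identity is easy to check numerically on arbitrary data:

```python
import numpy as np

# Check sum(x_i^2) = sum((x_i - xbar)^2) + n * xbar^2 on arbitrary numbers.
x = np.array([1.0, 4.0, 2.5, -3.0, 0.5])
n, xbar = len(x), x.mean()
lhs = (x ** 2).sum()
rhs = ((x - xbar) ** 2).sum() + n * xbar ** 2
```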
The F test in an ANOVA framework

The treatment sum of squares SSTR is defined to be

$$ SSTR = \sum_{j=1}^k n_j (\bar Y_j - \bar Y)^2. $$

It is not hard to show, using the fundamental identity and other tricks (see L&M Theorem 12.2.1, pp. 598–599), that

$$ \operatorname{E}(SSTR) = (k-1)\sigma^2 + \sum_{j=1}^k n_j(\mu_j - \bar\mu)^2, \tag{4} $$

where $\bar\mu = \sum_{j=1}^k \frac{n_j}{n}\mu_j$ is the overall average of the (unobserved) $\mu_j$'s. That is, a large value of SSTR relative to $(k-1)\sigma^2$ indicates that the null hypothesis $H_0: \mu_1 = \dots = \mu_k = \mu$ should be rejected.
The F test in an ANOVA framework: Estimating σ²

Start by defining

$$ s_j^2 = \frac{\sum_{i=1}^{n_j}(Y_{ij} - \bar Y_j)^2}{n_j - 1} $$

and aggregating:

$$ SSE = \sum_{j=1}^k (n_j - 1)s_j^2 = \sum_{j=1}^k \sum_{i=1}^{n_j} (y_{ij} - \bar y_j)^2, \tag{5} $$

which is called the error sum of squares. The important facts about these are:

$$ \frac{SSE}{\sigma^2} \sim \chi^2(n-k), $$

and SSE and SSTR are stochastically independent (L&M Theorem 12.2.3, p. 600).
The F test in an ANOVA framework

Under the null $H_0: \mu_1 = \dots = \mu_k = \mu$,

$$ \frac{SSTR}{\sigma^2} \sim \chi^2(k-1). $$

Therefore, under the null,

$$ F = \frac{SSTR/(k-1)}{SSE/(n-k)} \sim F_{k-1,\,n-k}. $$
The F test in an ANOVA framework: The F-test

At the α-level of significance, reject $H_0: \mu_1 = \dots = \mu_k$ if

$$ \frac{SSTR/(k-1)}{SSE/(n-k)} \ge F_{1-\alpha,\,k-1,\,n-k}. $$
The F test in an ANOVA framework: ANOVA tables

The traditional way to present ANOVA data is in the form of a table like this:

Source    | df  | SS    | MS         | F        | P
Treatment | k−1 | SSTR  | SSTR/(k−1) | MSTR/MSE |
Error     | n−k | SSE   | SSE/(n−k)  |          |
Total     | n−1 | SSTOT |            |          |

Two more terms: the mean square for treatments is $MSTR = SSTR/(k-1)$, and the mean square for errors is $MSE = SSE/(n-k)$. Under $H_0$, $F = MSTR/MSE \sim F_{k-1,\,n-k}$.
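The table entries can be computed directly from the definitions above. A sketch with three made-up groups of observations, which also checks the decomposition $SSTOT = SSTR + SSE$ (SSTOT taken about the grand mean):

```python
import numpy as np

# Illustrative one-way ANOVA data: k = 3 groups of unequal size.
groups = [np.array([2.1, 2.5, 1.9, 2.3]),
          np.array([3.0, 3.4, 2.8, 3.2, 3.1]),
          np.array([2.0, 2.2, 2.4])]
k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

SSTR = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
SSE = sum(((g - g.mean()) ** 2).sum() for g in groups)
MSTR, MSE = SSTR / (k - 1), SSE / (n - k)
F = MSTR / MSE                              # compare with F_{1-alpha, k-1, n-k}
SSTOT = ((np.concatenate(groups) - grand_mean) ** 2).sum()
```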
The F test in an ANOVA framework: Contrasts

A linear combination of the form $C = w'\mu$, where $w'\mathbf{1} = 0$ (the weights sum to zero), is called a contrast. A typical contrast uses a vector of the form

$$ w = (0, \dots, 0, \underset{j}{1}, 0, \dots, 0, \underset{j'}{-1}, 0, \dots, 0), $$

so $C = w'\mu = \mu_j - \mu_{j'}$. Then the hypothesis $H_0: C = 0$ amounts to $H_0: \mu_j = \mu_{j'}$. This is probably why it is called a contrast.
The F test in an ANOVA framework: Contrasts (continued)

To test a hypothesis that C = 0, we weight the sample means:

$$ \hat C = \sum_{j=1}^k w_j \bar Y_j. $$

Then

$$ \operatorname{E}\hat C = C, \qquad \operatorname{Var}\hat C = \sigma^2 \sum_{j=1}^k \frac{w_j^2}{n_j}. $$

Define

$$ SS_C = \frac{\hat C^2}{\sum_{j=1}^k w_j^2/n_j}. $$
The F test in an ANOVA framework: F test of a contrast

The test statistic

$$ F = \frac{SS_C}{SSE/(n-k)} $$

has an F-distribution with (1, n − k) degrees of freedom. The null hypothesis $H_0: w'\mu = 0$ should be rejected if $F \ge F_{1-\alpha,\,1,\,n-k}$.
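A sketch of the contrast test for the pairwise contrast $C = \mu_1 - \mu_2$, i.e. $w = (1, -1, 0)$, on made-up group data. With a pairwise w, $\hat C$ is just the difference of the two group means:

```python
import numpy as np

# Illustrative data: three groups; test H0: mu_1 = mu_2 via the contrast w.
gs = [np.array([5.0, 5.4, 4.8, 5.2]),
      np.array([6.1, 5.9, 6.3]),
      np.array([5.5, 5.7, 5.3, 5.6])]
k_c = len(gs)
n_c = sum(len(g) for g in gs)
w = np.array([1.0, -1.0, 0.0])             # contrast weights, sum to zero

C_hat = sum(wj * g.mean() for wj, g in zip(w, gs))
SSE_c = sum(((g - g.mean()) ** 2).sum() for g in gs)
SS_C = C_hat ** 2 / sum(wj ** 2 / len(g) for wj, g in zip(w, gs))
F_c = SS_C / (SSE_c / (n_c - k_c))         # compare with F_{1-alpha, 1, n-k}
```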