Chapter 6
General Linear Model: Statistical Inference

6.1 Introduction

So far we have discussed the formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter 4), least squares estimation (Chapter 4), and generalized least squares estimation (Chapter 5). In discussing LS or GLS estimators we have made no probabilistic inference, mainly because we have not assigned any probability distribution to our linear model structure. Statistical analyses very often demand more than just estimating the parameters. In particular, one is usually interested in attaching a measure of uncertainty in the form of confidence levels, or in testing whether some linear function of the parameters, such as the difference between two treatment effects, is significant. As we know from statistical methods courses, interval estimation and hypothesis testing almost always require a probability model, and the inference depends on the particular model chosen. The most common probability model used in statistical inference is the normal model. We will start with the Gauss-Markov model, namely,

Y = Xβ + ε,   (6.1.1)

where

Assumption I.  E(ε) = 0,
Assumption II. cov(ε) = σ²I,

and introduce the assumption of normality on the error components. In other words,

Assumption IV. ε ~ N(0, σ²I_n).

As you can see, Assumption IV incorporates Assumptions I and II. The model (6.1.1) along with Assumption IV will be referred to as the normal theory linear model.
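The short sketch below simulates data from this normal theory linear model; the design matrix, β, and σ are arbitrary illustrative choices, not values from the notes.

```python
import numpy as np

# A minimal simulation from the normal theory linear model Y = X*beta + eps,
# eps ~ N(0, sigma^2 I_n) (Assumption IV).  All numbers here are made up.
rng = np.random.default_rng(0)
n, sigma = 50, 2.0
X = np.column_stack([np.ones(n), rng.uniform(20, 60, size=n)])  # intercept + one covariate
beta = np.array([10.0, 0.5])
Y = X @ beta + rng.normal(0.0, sigma, size=n)                   # normal errors
```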

6.2 Maximum likelihood, sufficiency and UMVUE

Under the normal theory linear model described in the previous section, the likelihood function for the parameters β and σ² can be written as

L(β, σ² | Y, X) = (1/(σ√(2π)))^n exp{ −(Y − Xβ)^T (Y − Xβ) / (2σ²) }   (6.2.1)
              = (1/(σ√(2π)))^n exp{ −(Y^T Y − 2β^T X^T Y + β^T X^T Xβ) / (2σ²) }.   (6.2.2)

Since X is known, (6.2.2) implies that L(β, σ² | Y, X) belongs to an exponential family with joint complete sufficient statistic (Y^T Y, X^T Y) for (β, σ²). Also, (6.2.1) shows that, for given σ², the likelihood is maximized over β when

||Y − Xβ||² = (Y − Xβ)^T (Y − Xβ)   (6.2.3)

is minimized. But when is (6.2.3) minimized? From Chapter 4, we know that (6.2.3) is minimized when β is a solution to the normal equations

X^T Xβ = X^T Y.   (6.2.4)

Thus the maximum likelihood estimator (MLE) of β also satisfies the normal equations. By the invariance property of maximum likelihood, the MLE of an estimable function λ^T β is given by λ^T β̂, where β̂ is a solution to the normal equations (6.2.4). Using the maximum likelihood estimator β̂ of β, we express the likelihood (6.2.1) as a function of σ² only, namely,

L(σ² | Y, X) = (1/(σ√(2π)))^n exp{ −(Y − Xβ̂)^T (Y − Xβ̂) / (2σ²) },   (6.2.5)

with corresponding log-likelihood

ln L(σ² | Y, X) = −(n/2) ln(2πσ²) − (Y − Xβ̂)^T (Y − Xβ̂) / (2σ²),   (6.2.6)

which is maximized over σ² at

σ̂²_MLE = (Y − Xβ̂)^T (Y − Xβ̂) / n.   (6.2.7)

Note that while the MLE of β is identical to the LS estimator, σ̂²_MLE is not the same as the LS estimator of σ².

Proposition 6.2.1. The MLE of σ² is biased.

Proposition 6.2.2. The MLE λ^T β̂ is the uniformly minimum variance unbiased estimator (UMVUE) of λ^T β.

Proposition 6.2.3. The MLE λ^T β̂ and nσ̂²_MLE / σ² are independently distributed, respectively as N(λ^T β, σ² λ^T (X^T X)^g λ) and χ²_{n−r}.
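A computational sketch of these estimators follows. It uses the Moore-Penrose pseudoinverse as one choice of generalized inverse of X^T X; the function name and structure are mine, not from the notes.

```python
import numpy as np

# Sketch of the Section 6.2 estimators: any solution of the normal equations
# X'X beta = X'Y maximizes the likelihood in beta.  sigma2_mle divides by n
# (biased, Proposition 6.2.1); the least-squares estimator divides by n - r.
def normal_theory_mle(X, Y):
    XtX_ginv = np.linalg.pinv(X.T @ X)      # a choice of (X'X)^g
    beta_hat = XtX_ginv @ X.T @ Y           # solves the normal equations (6.2.4)
    resid = Y - X @ beta_hat
    n = len(Y)
    r = np.linalg.matrix_rank(X)
    sigma2_mle = resid @ resid / n          # (6.2.7)
    sigma2_ls = resid @ resid / (n - r)     # residual mean square
    return beta_hat, sigma2_mle, sigma2_ls
```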

6.3 Confidence interval for an estimable function

Proposition 6.3.1. The quantity

(λ^T β̂ − λ^T β) / √( σ̂²_LS λ^T (X^T X)^g λ )

has a t distribution with n − r degrees of freedom.

Proposition 6.3.1 leads the way for us to construct a confidence interval (CI) for the estimable function λ^T β. In fact, a 100(1 − α)% CI for λ^T β is given by

λ^T β̂ ± t_{n−r, α/2} √( σ̂²_LS λ^T (X^T X)^g λ ).   (6.3.1)

This confidence interval is in the familiar form Estimate ± t_{α/2} × SE.
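Before the worked example, here is a small sketch of (6.3.1). It assumes λ^T β is estimable (this is not checked), and uses the pseudoinverse as the generalized inverse so the same code works for non-full-rank designs; the function name is mine.

```python
import numpy as np
from scipy import stats

# Sketch of the interval (6.3.1) for an estimable function lambda' beta.
def ci_estimable(X, Y, lam, level=0.95):
    n = len(Y)
    r = np.linalg.matrix_rank(X)
    XtX_ginv = np.linalg.pinv(X.T @ X)
    beta_hat = XtX_ginv @ X.T @ Y
    resid = Y - X @ beta_hat
    sigma2_ls = resid @ resid / (n - r)                 # residual mean square
    est = lam @ beta_hat
    se = np.sqrt(sigma2_ls * (lam @ XtX_ginv @ lam))
    tq = stats.t.ppf(1 - (1 - level) / 2, df=n - r)
    return est - tq * se, est + tq * se
```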

Example 6.3.2. Consider the simple linear regression model of Example 1.1.3, given by Equation (1.1.7),

Y_i = β_0 + β_1 w_i + ε_i,   (6.3.2)

where Y_i and w_i respectively represent the survival time and the age at prognosis of the ith patient. We identified in Example 4.2.3 that the unique LS estimators of β_0 and β_1 are given by

β̂_1 = Σ(w_i − w̄)(Y_i − Ȳ) / Σ(w_i − w̄)²,   β̂_0 = Ȳ − β̂_1 w̄,   (6.3.3)

provided Σ(w_i − w̄)² > 0. For a given patient who was diagnosed with leukemia at age w_0, one would be interested in predicting the length of survival. The LS estimator of the expected survival for this patient is given by

Ŷ_0 = β̂_0 + β̂_1 w_0.   (6.3.4)

What is a 95% confidence interval for Ŷ_0? Note that Ŷ_0 is of the form λ^T β̂, where β̂ = (β̂_0, β̂_1)^T and λ = (1, w_0)^T. The variance-covariance matrix of β̂ is given by

cov(β̂) = σ² (X^T X)^{−1} = [ σ² / (n Σ(w_i − w̄)²) ] [[ Σw_i², −Σw_i ], [ −Σw_i, n ]].   (6.3.5)

Thus the variance of the predictor Ŷ_0 is given by

var(Ŷ_0) = var(λ^T β̂) = λ^T cov(β̂) λ
         = [ σ² / (n Σ(w_i − w̄)²) ] (1, w_0) [[ Σw_i², −Σw_i ], [ −Σw_i, n ]] (1, w_0)^T
         = [ σ² / (n Σ(w_i − w̄)²) ] ( Σw_i² − 2w_0 Σw_i + n w_0² )
         = σ² Σ(w_i − w_0)² / ( n Σ(w_i − w̄)² ).   (6.3.6)

Therefore, the standard error of Ŷ_0 is obtained as

SE(Ŷ_0) = √( σ̂²_LS Σ(w_i − w_0)² / ( n Σ(w_i − w̄)² ) ),   (6.3.7)

where σ̂²_LS = Σ_{i=1}^n (Y_i − β̂_0 − β̂_1 w_i)² / (n − 2). Using (6.3.1), a 95% CI for Ŷ_0 is then

Ŷ_0 ± t_{n−2, 0.025} √( σ̂²_LS Σ(w_i − w_0)² / ( n Σ(w_i − w̄)² ) ).   (6.3.8)
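A numerical sketch of (6.3.8) follows. The ages and survival times below are made-up numbers purely for illustration; the formulas are those of Example 6.3.2.

```python
import numpy as np
from scipy import stats

# Sketch of interval (6.3.8): 95% CI for the expected survival at age w0
# in the simple linear regression of Example 6.3.2 (hypothetical data).
w = np.array([35., 40., 45., 50., 55., 60., 65.])
Y = np.array([60., 52., 45., 40., 38., 30., 25.])
w0, n = 48.0, len(w)

Sxx = np.sum((w - w.mean()) ** 2)
b1 = np.sum((w - w.mean()) * (Y - Y.mean())) / Sxx           # (6.3.3)
b0 = Y.mean() - b1 * w.mean()
sigma2_ls = np.sum((Y - b0 - b1 * w) ** 2) / (n - 2)

Y0_hat = b0 + b1 * w0                                        # (6.3.4)
se = np.sqrt(sigma2_ls * np.sum((w - w0) ** 2) / (n * Sxx))  # (6.3.7)
tq = stats.t.ppf(0.975, df=n - 2)
print(Y0_hat - tq * se, Y0_hat + tq * se)
```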

6.4 Test of Hypothesis

Very often we are interested in testing hypotheses related to some linear function of the parameters in a linear model. From our discussion in Chapter 4 we have learned that not all linear functions of the parameter vector can be estimated. Similarly, not all hypotheses corresponding to linear functions of the parameter vector can be tested. We will see shortly which hypotheses can be tested and which cannot. Let us first look at our favorite one-way ANOVA model. Usually, the hypotheses of interest are:

1. Equality of all a treatment effects:

   H_0: α_1 = α_2 = ... = α_a.   (6.4.1)

2. Equality of two specific treatment effects, e.g.,

   H_0: α_1 = α_2.   (6.4.2)

3. A linear combination, such as a contrast (to be discussed later) of treatment effects:

   H_0: Σ c_i α_i = 0,   (6.4.3)

   where the c_i are known constants such that Σ c_i = 0.

Note that, if we consider the normal theory Gauss-Markov linear model Y = Xβ + ε, then all of the above hypotheses can be written as

A^T β = b,   (6.4.4)

where A is a p × q matrix with rank(A) = q. If A is not of full column rank, we exclude the redundant columns from A to obtain a full column rank matrix. Now, how would you proceed to test the hypothesis (6.4.4)? Let us deviate a little and refresh our memory about tests of hypotheses. If Y_1, Y_2, ..., Y_n are n iid observations from a N(μ, σ²) population, how do we construct a test statistic to test the hypothesis

H_0: μ = μ_0?   (6.4.5)

(For the iid normal sample, these steps lead to the familiar one-sample t statistic t = √n (Ȳ − μ_0)/S, which has a t_{n−1} distribution under H_0.) Thus we took three major steps in constructing a test statistic:

- Estimated the parametric function (μ),
- Found the distribution of the estimate, and
- Eliminated the nuisance parameter.

We will basically follow the same procedure to test the hypothesis (6.4.4). First we need to estimate A^T β. That means A^T β must be estimable.

Definition 6.4.1. Testable hypothesis. A linear hypothesis H_0: A^T β = b is testable if the rows of A^T β are estimable; in other words (see Chapter 4), if there exists a matrix C such that

A = X^T C.   (6.4.6)

The assumption that A has full column rank is just a matter of convenience. Given a set of equations A^T β = b, one can easily eliminate redundant equations to transform it into a system of equations with a full column rank matrix. Since A^T β is estimable, the corresponding LS estimator of A^T β is given by

A^T β̂ = A^T (X^T X)^g X^T Y,   (6.4.7)

which is a linear function of Y.
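The sketch below checks testability numerically using the Chapter 4 estimability criterion λ^T H = λ^T with H = (X^T X)^g X^T X. The one-way ANOVA design (3 groups, 2 observations each) and the two candidate hypotheses are illustrative choices of mine.

```python
import numpy as np

# Sketch of the testability check behind Definition 6.4.1: A'beta = b is testable
# exactly when each row of A'beta is estimable, i.e. A'H = A' with H = (X'X)^g X'X.
X = np.column_stack([np.ones(6), np.kron(np.eye(3), np.ones((2, 1)))])  # rank 3, p = 4
H = np.linalg.pinv(X.T @ X) @ X.T @ X

A_contrast = np.array([[0., 1., -1., 0.]]).T     # alpha_1 - alpha_2
A_single = np.array([[0., 1., 0., 0.]]).T        # alpha_1 alone
print(np.allclose(A_contrast.T @ H, A_contrast.T))   # True  -> testable
print(np.allclose(A_single.T @ H, A_single.T))       # False -> not testable
```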

Under Assumption IV, A^T β̂ is distributed as

A^T β̂ ~ N_q( A^T β, σ² B ),   (6.4.8)

where we introduce the notation B = A^T (X^T X)^g A. Therefore,

(A^T β̂ − A^T β)^T B^{−1} (A^T β̂ − A^T β) / σ² ~ χ²_q,   (6.4.9)

or, under the null hypothesis H_0: A^T β = b,

(A^T β̂ − b)^T B^{−1} (A^T β̂ − b) / σ² ~ χ²_q.   (6.4.10)

Had we known σ², this statistic could have been used to test the hypothesis H_0: A^T β = b. But since σ² is unknown, the left-hand side of equation (6.4.10) cannot be used as a test statistic. In order to get rid of σ² from the test statistic, we note that

RSS / σ² = Y^T (I − P) Y / σ² = (Y − Xβ̂)^T (Y − Xβ̂) / σ² ~ χ²_{n−r}.   (6.4.11)

If we could show that the statistics in (6.4.10) and (6.4.11) are independent, then we could construct an F-statistic as the ratio of two mean chi-squares as follows:

F = { (A^T β̂ − b)^T B^{−1} (A^T β̂ − b) / (σ² q) } / { RSS / (σ² (n − r)) } ~ F_{q, n−r} under H_0.   (6.4.12)

Notice that the nuisance parameter σ² cancels out and we obtain the F-statistic

F = (A^T β̂ − b)^T B^{−1} (A^T β̂ − b) / ( q σ̂²_LS ) ~ F_{q, n−r} under H_0,   (6.4.13)

where σ̂²_LS, the residual mean square, is the LS estimator of σ². Let us denote the numerator of the F-statistic by Q.

Independence of Q and σ̂²_LS

First note that

A^T β̂ − b = C^T X β̂ − b
          = C^T X (X^T X)^g X^T Y − b
          = C^T P Y − A^T A (A^T A)^{−1} b
          = C^T P Y − C^T X A (A^T A)^{−1} b
          = C^T P Y − C^T P X A (A^T A)^{−1} b
          = C^T P (Y − b*),   (6.4.14)

where b* = X A (A^T A)^{−1} b. Now,

Q = (A^T β̂ − b)^T B^{−1} (A^T β̂ − b)
  = (Y − b*)^T P C B^{−1} C^T P (Y − b*)
  = (Y − b*)^T A* (Y − b*),   (6.4.15)

where A* = P C B^{−1} C^T P. On the other hand,

RSS = (n − r) σ̂²_LS = Y^T (I − P) Y
    = (Y − b* + b*)^T (I − P) (Y − b* + b*)
    = (Y − b*)^T (I − P) (Y − b*).   (6.4.16)

(Why? Because (I − P)X = 0, so (I − P)b* = (I − P)X A(A^T A)^{−1} b = 0.) Thus both Q and RSS are quadratic forms in Y − b*, which under Assumption IV is distributed as N(Xβ − b*, σ² I_n). Therefore, Q and RSS are independently distributed if and only if A*(I − P) = 0, which follows immediately since P(I − P) = 0. In summary, the linear testable hypothesis H_0: A^T β = b can be tested with the F-statistic

F = (A^T β̂ − b)^T B^{−1} (A^T β̂ − b) / ( q σ̂²_LS )   (6.4.17)

by comparing it to critical values from an F_{q, n−r} distribution.

Example 6.4.1. Modeling weight loss as a function of initial weight. Suppose n individuals, m men and w women, participated in a six-week weight-loss program. At the end of the program, investigators used a linear model with the reduction in weight as the outcome and the initial weight as the explanatory variable. They started with separate intercepts and slopes but wanted to see whether the rate of decline is similar for men and women.

Consider the linear model

R_i = α_m + β_m W_{0i} + ε_i, if the ith individual is a male,
R_i = α_f + β_f W_{0i} + ε_i, if the ith individual is a female.   (6.4.18)

The idea is to test the hypothesis

H_0: β_m = β_f.   (6.4.19)

If we write β = (α_m, β_m, α_f, β_f)^T, then H_0 can be written as

H_0: A^T β = b,   (6.4.20)

where

A = (0, 1, 0, −1)^T,   b = 0.   (6.4.21)

Assuming (for simplicity) that the first m individuals are male (i = 1, 2, ..., m) and the rest (i = m+1, m+2, ..., m+w = n) are female, the linear model for this problem can be written as Y = Xβ + ε, where Y = (R_1, R_2, ..., R_n)^T and

X = [[ 1_m, W_m, 0_m, 0_m ],
     [ 0_f, 0_f, 1_f, W_f ]],   (6.4.22)

where W_m = (w_{01}, w_{02}, ..., w_{0m})^T, W_f = (w_{0(m+1)}, w_{0(m+2)}, ..., w_{0n})^T, and 1_m, 0_m (1_f, 0_f) are m × 1 (w × 1) vectors of ones and zeros. Obviously,

X^T X = [[ B_m, 0_{2×2} ], [ 0_{2×2}, B_f ]],   (6.4.23)

where

B_m = [[ m, Σ_{i=1}^m w_{0i} ], [ Σ_{i=1}^m w_{0i}, Σ_{i=1}^m w_{0i}² ]],   (6.4.24)

and

B_f = [[ w, Σ_{i=m+1}^n w_{0i} ], [ Σ_{i=m+1}^n w_{0i}, Σ_{i=m+1}^n w_{0i}² ]].   (6.4.25)

Hence

(X^T X)^{−1} = [[ B_m^{−1}, 0_{2×2} ], [ 0_{2×2}, B_f^{−1} ]],   (6.4.26)

where

B_m^{−1} = [ 1 / (m SS_m(W)) ] [[ Σ_{i=1}^m w_{0i}², −Σ_{i=1}^m w_{0i} ], [ −Σ_{i=1}^m w_{0i}, m ]],   (6.4.27)

with SS_m(W) = Σ_{i=1}^m (w_{0i} − W̄_m)² and W̄_m = Σ_{i=1}^m w_{0i} / m. Similarly,

B_f^{−1} = [ 1 / (w SS_f(W)) ] [[ Σ_{i=m+1}^n w_{0i}², −Σ_{i=m+1}^n w_{0i} ], [ −Σ_{i=m+1}^n w_{0i}, w ]],   (6.4.28)

where SS_f(W) = Σ_{i=m+1}^n (w_{0i} − W̄_f)² with W̄_f = Σ_{i=m+1}^n w_{0i} / w. This leads

to the usual LS estimator

β̂ = ( R̄_m − β̂_m W̄_m,  SP_m / SS_m(W),  R̄_f − β̂_f W̄_f,  SP_f / SS_f(W) )^T,   (6.4.29)

where SP_m = Σ_{i=1}^m (R_i − R̄_m)(w_{0i} − W̄_m) and, similarly, SP_f = Σ_{i=m+1}^n (R_i − R̄_f)(w_{0i} − W̄_f); here β̂_m = SP_m/SS_m(W) and β̂_f = SP_f/SS_f(W) are the sex-specific slope estimates. Then

A^T β̂ = (0, 1, 0, −1) β̂ = SP_m / SS_m(W) − SP_f / SS_f(W) = β̂_m − β̂_f,   (6.4.30)

and

B = A^T (X^T X)^{−1} A
  = (0, 1, 0, −1) [[ B_m^{−1}, 0_{2×2} ], [ 0_{2×2}, B_f^{−1} ]] (0, 1, 0, −1)^T
  = 1/SS_m(W) + 1/SS_f(W)
  = ( SS_m(W) + SS_f(W) ) / ( SS_m(W) SS_f(W) ).   (6.4.31)

Hence,

Q = (A^T β̂ − b)^T B^{−1} (A^T β̂ − b)
  = [ SS_m(W) SS_f(W) / ( SS_m(W) + SS_f(W) ) ] ( SP_m/SS_m(W) − SP_f/SS_f(W) )².   (6.4.32)

The residual sum of squares for this model is

RSS = Σ_{i=1}^m (R_i − α̂_m − β̂_m w_{0i})² + Σ_{i=m+1}^n (R_i − α̂_f − β̂_f w_{0i})²
    = SS_m(R) − β̂_m² SS_m(W) + SS_f(R) − β̂_f² SS_f(W),   (6.4.33)

where SS_f(R) = Σ_{i=m+1}^n (R_i − R̄_f)² with R̄_f = Σ_{i=m+1}^n R_i / w, and similarly for SS_m(R). The corresponding F-statistic for testing H_0: β_m = β_f (here q = 1) is then given by

F = Q / ( RSS / (n − 4) ).   (6.4.34)
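The sketch below applies the general formula (6.4.13) to Example 6.4.1. The group sizes and weight-loss data are hypothetical, generated only for illustration, and the pseudoinverse stands in for (X^T X)^g.

```python
import numpy as np
from scipy import stats

# Sketch of the F-test (6.4.13) for H0: beta_m = beta_f in Example 6.4.1.
# Columns of X follow beta = (alpha_m, beta_m, alpha_f, beta_f)'.
rng = np.random.default_rng(1)
m, w = 12, 10
W0 = np.concatenate([rng.uniform(70, 110, m), rng.uniform(60, 100, w)])  # initial weights
R = 2.0 + 0.05 * W0 + rng.normal(0.0, 1.0, m + w)                        # weight reductions

male = np.arange(m + w) < m
X = np.column_stack([male, male * W0, ~male, ~male * W0]).astype(float)
A = np.array([[0., 1., 0., -1.]]).T
b = np.zeros(1)

n, q = m + w, 1
r = np.linalg.matrix_rank(X)                      # = 4 here
XtX_ginv = np.linalg.pinv(X.T @ X)
beta_hat = XtX_ginv @ X.T @ R
RSS = (R - X @ beta_hat) @ (R - X @ beta_hat)
B = A.T @ XtX_ginv @ A
diff = A.T @ beta_hat - b
Q = diff @ np.linalg.solve(B, diff)
F = (Q / q) / (RSS / (n - r))                     # compare with F_{q, n-r}
print(F, stats.f.sf(F, q, n - r))                 # statistic and p-value
```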

6.4.1 Alternative motivation and derivation of the F-test

Suppose we want to test the testable hypothesis A^T β = b in the linear model

Y = Xβ + ε   (6.4.35)

under Assumption IV. Under the null hypothesis, the model (6.4.35) can be treated as a restricted model with the jointly estimable restrictions imposed by the null hypothesis. Whether this hypothesis is true can then be tested using the extra error sum of squares due to the restriction (the null hypothesis). By the theory developed in Chapter 4,

RSS_H = RSS + (A^T β̂ − b)^T B^{−1} (A^T β̂ − b) = RSS + Q,   (6.4.36)

leading to RSS_H − RSS = Q. If the null hypothesis is false, one would expect this difference to be large. But how large? We compare this extra error SS to the RSS from the original model. In other words, under the alternative we expect Q/RSS to be large, or equivalently, F = (n − r)Q/(q RSS) to be large. The significance of this test can be assessed by noting that, under H_0, F follows an F-distribution with q and n − r d.f.

Example 6.4.2. Example 6.4.1 continued. Under the null hypothesis the model can be written as

R_i = α_m + β W_{0i} + ε_i, if the ith individual is a male,
R_i = α_f + β W_{0i} + ε_i, if the ith individual is a female.   (6.4.37)

Notice the common slope for both subgroups. The parameter estimates for this reduced model are

α̂_m = R̄_m − β̂ W̄_m,   α̂_f = R̄_f − β̂ W̄_f,
β̂ = ( SP_m + SP_f ) / ( SS_m(W) + SS_f(W) ) = ( β̂_m SS_m(W) + β̂_f SS_f(W) ) / ( SS_m(W) + SS_f(W) ).   (6.4.38)

It can be shown that the residual sum of squares for this reduced model can be written as

RSS_H = SS_m(R) + ( β̂² − 2 β̂ β̂_m ) SS_m(W) + SS_f(R) + ( β̂² − 2 β̂ β̂_f ) SS_f(W),   (6.4.39)

leading to

RSS_H − RSS = ( β̂_m − β̂ )² SS_m(W) + ( β̂_f − β̂ )² SS_f(W).   (6.4.40)

The test statistic F is then given by

F = (n − 4) [ ( β̂_m − β̂ )² SS_m(W) + ( β̂_f − β̂ )² SS_f(W) ] / [ SS_m(R) − β̂_m² SS_m(W) + SS_f(R) − β̂_f² SS_f(W) ].   (6.4.41)
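The short sketch below checks the identity RSS_H − RSS = Q from (6.4.36) numerically. It continues the weight-loss sketch above, reusing its (hypothetical) male, W0, R, Q and RSS; the reduced design forces a common slope as in Example 6.4.2.

```python
import numpy as np

# Continuation of the Example 6.4.1 sketch: extra sum of squares equals Q.
X_H = np.column_stack([male, ~male, W0]).astype(float)   # (alpha_m, alpha_f, beta)
beta_H = np.linalg.pinv(X_H.T @ X_H) @ X_H.T @ R
RSS_H = (R - X_H @ beta_H) @ (R - X_H @ beta_H)
print(np.allclose(RSS_H - RSS, Q))                       # True: RSS_H - RSS = Q
```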

6.5 Non-testable Hypothesis

In the previous section we only considered hypotheses that are testable. What happens if you start with a non-testable hypothesis? A non-testable hypothesis H_0: A^T β = b is one for which every row of the linear function A^T β is non-estimable, and no linear combination of the rows is estimable. From our discussion in Chapter 4, we know that when we impose non-estimable restrictions on the linear model, the restricted solution is yet another solution to the (unrestricted) linear model, and hence the residual sum of squares does not change. Thus, in this case, the extra error sum of squares due to the non-testable hypothesis is identically equal to zero. However, one may still calculate an F-statistic using the formula (6.4.13),

F = (A^T β̂ − b)^T B^{−1} (A^T β̂ − b) / ( q σ̂²_LS ),   (6.5.1)

as long as B is invertible. What would this statistic test if H_0: A^T β = b is non-testable? In fact, when A^T β is not estimable, the test statistic F actually tests the hypothesis

A^T (X^T X)^g X^T X β = b.

[See Problem 6 in this section.]
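A numerical sketch of this claim follows. The one-way ANOVA design and responses are hypothetical, and the pseudoinverse is used as the generalized inverse; the quadratic form computed from (6.5.1) for the non-estimable hypothesis coincides with the one for the induced hypothesis A^T Hβ = b.

```python
import numpy as np

# Sketch: for non-estimable A, the statistic (6.5.1) coincides with the statistic
# for the (testable) hypothesis A'H beta = b, where H = (X'X)^g X'X.
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(6), np.kron(np.eye(3), np.ones((2, 1)))])   # rank deficient
Y = rng.normal(5.0, 1.0, 6)

G = np.linalg.pinv(X.T @ X)
H = G @ X.T @ X
beta_hat = G @ X.T @ Y
b = np.zeros(1)
A = np.array([[0., 1., 0., 0.]]).T          # H0: alpha_1 = 0 is NOT estimable

def Q_stat(M):
    d = M.T @ beta_hat - b
    return d @ np.linalg.solve(M.T @ G @ M, d)

print(np.allclose(Q_stat(A), Q_stat(H.T @ A)))   # same Q: (6.5.1) tests A'H beta = b
```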

Partially Testable Hypothesis

Suppose you want to test the hypothesis

A^T β = b,   A = ( A_1^{p×q_1}  A_2^{p×q_2} ),  q_1 + q_2 = q,  b = (b_1^T, b_2^T)^T,   (6.5.2)

such that A has rank q and A_1^T β is estimable while A_2^T β is not. From our discussion of testable and non-testable hypotheses, we can easily recognize that if we use the F-statistic based on RSS_H − RSS, we will end up with a test statistic that tests the hypothesis H_0: A_1^T β = b_1. However, if we use the test statistic (6.4.13), we will be testing the hypothesis

Ã^T β = b,   (6.5.3)

where Ã = ( A_1  H^T A_2 ) with H = (X^T X)^g X^T X.

6.6 Problems

1. Consider the simple regression model

   Y_i = μ + β x_i + ε_i,   i = 1, 2, ..., n,   (6.6.1)

   where ε_i, i = 1, 2, ..., n, are iid normal random variables with mean zero and variance σ².

   (a) For a given value x_0 of x, obtain the least squares estimator of θ(x_0) = μ + β x_0.

   (b) Derive a test statistic for testing the hypothesis H_0: θ(x_0) = θ_0, where θ_0 is a known constant.

2. Suppose θ = a^T β is estimable in the model Y = Xβ + ε, where a and β are p × 1 vectors, Y is an n × 1 random vector, X is an n × p matrix with rank(X) = r ≤ p, and ε ~ N(0, σ²I_n). Assume β̂ and S² to be the least squares estimators of β and σ², respectively.

   (a) What is the least squares estimator of a^T β? Call it θ̂.

   (b) What is the distribution of θ̂?

   (c) What is the distribution of (n − r)S²/σ²?

   (d) Show that θ̂ and (n − r)S² are independently distributed.

   (e) Show how one can construct a 95% confidence interval for θ.

   (While answering parts (b) and (c), do not forget to mention the specific parameters of the corresponding distributions.)

3. Suppose Y_{11}, Y_{12}, ..., Y_{1n_1} are n_1 independent observations from a N(μ + α_1, σ²) distribution and Y_{21}, Y_{22}, ..., Y_{2n_2} are n_2 independent observations from a N(μ − α_2, σ²) distribution. Notice that the two populations have different means but the same standard deviation. Assume that Y_{1j} and Y_{2j} are independent for all j. Define n_1 + n_2 = n, and Y = (Y_{11}, Y_{12}, ..., Y_{1n_1}, Y_{21}, Y_{22}, ..., Y_{2n_2})^T as the n × 1 vector consisting of all n observations. We write Y as

   Y = Xβ + ε,   (6.6.2)

   where β = (μ, α_1, α_2)^T,

   X = [[ 1_{n_1}, 1_{n_1}, 0 ], [ 1_{n_2}, 0, −1_{n_2} ]],

   with 1_{n_i} representing an n_i × 1 vector of ones, and ε is an n × 1 vector distributed as N(0, σ²I_n).

   (a) A particular solution to the normal equations for estimating β is given by β̂ = G X^T Y, where G = diag(0, 1/n_1, 1/n_2). Show that the error sum of squares for this model is given by SSE = (Y − Xβ̂)^T (Y − Xβ̂) = Σ_{i=1}^2 Σ_{j=1}^{n_i} (Y_{ij} − Ȳ_{i·})².

   (b) Suppose one wants to test the hypothesis H_0: α_1 = α_2. The restricted model under the null hypothesis is given by

   Y = X_H β_H + ε,   (6.6.3)

   where

   X_H = [[ 1_{n_1}, 1_{n_1} ], [ 1_{n_2}, −1_{n_2} ]],   β_H = (μ, α_1)^T.

   The least squares estimator of β_H is given by

   β̂_H = (1/2) ( Ȳ_{1·} + Ȳ_{2·},  Ȳ_{1·} − Ȳ_{2·} )^T.

   Find the error sum of squares for the restricted model; call it SSE_H. Compare SSE and SSE_H. Are they equal? Why or why not?

   (c) Is the hypothesis H_1: α_1 + α_2 = 0 testable? If so, give an F-statistic to test this hypothesis. If not, state the reason why it is not testable.

4. Suppose two objects, say A_1 and A_2, with unknown weights β_1 and β_2 respectively, are measured on a balance using the following scheme, each of these actions being repeated twice:

   - both objects on the balance, resulting in weights Y_{11} and Y_{12};
   - only A_1 on the balance, resulting in weights Y_{21} and Y_{22}; and
   - only A_2 on the balance, resulting in weights Y_{31} and Y_{32}.

   Assume the Y_{ij}'s are independent, normally distributed variables with common variance σ². Also assume that the balance has an unknown systematic (fixed) error β_0. The model can then be written as

   Y_{1j} = β_0 + β_1 + β_2 + ε_{1j},
   Y_{2j} = β_0 + β_1 + ε_{2j},
   Y_{3j} = β_0 + β_2 + ε_{3j},   j = 1, 2,

   where the ε_{ij}'s are i.i.d. random variables. Let Ȳ_{i·} = (Y_{i1} + Y_{i2})/2.

   (a) Express the least squares estimate β̂ of β = (β_0, β_1, β_2)^T in terms of Ȳ_{i·}, i = 1, 2, 3.

   (b) Find Var(β̂). Argue that β̂ has a normal distribution.

   (c) Suppose σ̂² = Σ_{i=1}^3 Σ_{j=1}^2 (Y_{ij} − Ȳ_{i·})² / 3. What is the distribution of σ̂²?

   (d) Write the hypothesis that both objects have the same weight in the form H_0: A^T β = b. Identify A and b.

   (e) Propose an F-statistic for testing the above hypothesis.

5. A clinical study recruited 150 untreated hepatitis C patients along with 75 hepatitis C patients who had been treated previously with Interferon-α monotherapy. There are two new drugs under study, Pegintron and Peginterferon. Investigators randomly allocated the two treatments among all 225 patients. The goal is to compare the effects of Pegintron and Peginterferon in reducing hepatitis C viral levels. The response variable Y is the difference in viral levels between baseline and 24 weeks post treatment. Denote by α_P and α_PI the effects of Pegintron and Peginterferon, respectively.

   (a) Write this problem as a general linear model, assuming that the new

   treatments may interact with whether the patient was previously treated or not. Use your own notation for effects, subscripts, and error terms not defined here.

   (b) What is the design matrix for the model you specified in part (a)?

   (c) Is α_PI − α_P estimable in the linear model you have proposed in part (a)?

      i. If yes, give its BLUE and a 100(1 − α)% confidence interval for the difference between α_PI and α_P.

      ii. If not, give an estimable function that would represent the difference between the two new treatments and give a 100(1 − α)% confidence interval for this estimable function.

6. Consider the linear model Y = Xβ + ε, where ε ~ N_n(0, σ²I_n), X is an n × p matrix of rank r (< p), and β is of order p × 1.

   (a) Let H = (X^T X)^g X^T X. Show that A^T Hβ is estimable for any p × q matrix A.

   (b) For a matrix A of order p × q and rank q, the testable hypothesis H_0: A^T β = b can be tested using the test statistic

   F = [ (n − r) / q ] Q / SSE,

   where

   Q = (A^T β̂ − b)^T [ A^T (X^T X)^g A ]^{−1} (A^T β̂ − b),   (6.6.4)

   which is identical to SSE_H − SSE. Here β̂ = (X^T X)^g X^T Y is a solution to the normal equations X^T Xβ = X^T Y. When A^T β = b is non-testable, SSE_H − SSE is identically equal to zero. However, it is still possible to calculate Q using expression (6.6.4) if A^T (X^T X)^g A is invertible. Now suppose that A^T β = b is non-testable but A^T (X^T X)^g A is invertible. Derive Q for testing the hypothesis H_0: A^T Hβ = b and show that it is algebraically identical to the test statistic Q that would be calculated for testing the non-testable hypothesis H_0: A^T β = b using (6.6.4).

7. Consider the one-way ANOVA model Y_{ij} = μ + α_i + ε_{ij}; i = 1, 2, 3; j = 1, 2, ..., n_i.

   (a) Develop a test procedure for testing the hypothesis H_0: A^T β = 0, where

   A = [[ 0, 0 ], [ 1, 1 ], [ −1, 0 ], [ 0, −1 ]].

   (b) Develop a test procedure for testing the hypothesis H_0: A^T β = 0, where

   A = [[ 0, 0 ], [ 1, 1 ], [ −1, 1 ], [ 0, −2 ]].

   (c) Compare the test statistics from parts (a) and (b). Are they the same? If so, why?

   (d) If possible, derive a test procedure for testing the hypothesis H_0: α_1 = α_2 = 5.

8. For testing the hypothesis H_0: A^T β = b, the test statistic derived in class was

   F = ( Q/q ) / ( SSE/(n − r) ) = [ (n − r) / q ] Q / SSE.

   (a) If U has a χ²_n(λ) (noncentral chi-square) distribution, what is E(U)?

   (b) If V has a χ²_n distribution, what is E(1/V)?

   (c) Use the independence of Q and SSE, and your results from parts (a) and (b), to find E(F).

   (d) Show that E(F) ≥ 1.

9. Consider the multiple regression model

   Y_i = β_1 X_{1i} + β_2 X_{2i} + ... + β_p X_{pi} + ε_i,   i = 1, 2, ..., n,

   with E(ε_i) = 0 and var(ε_i) = σ². Assume the ε_i's are independent.

   (a) Derive a test statistic for testing H_0: β_(1) = 0, where β_(1) includes only the first q (< p) components of β.

   (b) How would you test the hypothesis β_j = β_{j0}?

10. Suppose (Y_i, X_i), i = 1, 2, ..., n, are n independent pairs of observations such that the conditional distribution of Y_i given X_i is normal with mean μ_1 + (ρσ_1/σ_2)(X_i − μ_2) and variance σ_1²(1 − ρ²), where σ_1², σ_2² > 0 and |ρ| < 1.

   (a) Can you write this as a linear model problem? [Hint: use transformed parameters; for example, define β_0 = μ_1 − ρσ_1μ_2/σ_2.]

   (b) Is the hypothesis μ_1 = 0 testable?

   (c) Is the hypothesis ρ = 0 linear in terms of your transformed parameters? If so, is it testable?

   (d) For parts (b) and (c), if possible, provide the F-statistic for testing the above hypotheses.