General Linear Model: Statistical Inference


Chapter 6
General Linear Model: Statistical Inference

6.1 Introduction

So far we have discussed the formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter 4), least squares estimation (Chapter 4), and generalized least squares estimation (Chapter 5). In discussing LS or GLS estimators we have made no probabilistic inference, mainly because we have not assigned any probability distribution to our linear model structure. Statistical models very often demand more than just estimating the parameters. In particular, one is usually interested in attaching a measure of uncertainty in terms of confidence levels, or in testing whether some linear function of the parameters, such as the difference between two treatment effects, is significant. As we know from statistical methods courses, interval estimation and hypothesis testing almost always require a probability model, and the inference depends on the particular model chosen. The most common probability model used in statistical inference is the normal model. We will start with the Gauss-Markov model, namely,

Y = Xβ + ɛ,   (6.1.1)

where

Assumption I. E(ɛ) = 0,
Assumption II. cov(ɛ) = σ²I,

and introduce the assumption of normality for the error components. In other words,

Assumption IV. ɛ ~ N(0, σ²I_n).

As you can see, Assumption IV incorporates Assumptions I and II. The model (6.1.1) along with Assumption IV will be referred to as the normal theory linear model.

6.2 Maximum likelihood, sufficiency and UMVUE

Under the normal theory linear model described in the previous section, the likelihood function for the parameters β and σ² can be written as

L(β, σ² | Y, X) = {1/(σ√(2π))}^n exp{−(Y − Xβ)^T(Y − Xβ)/(2σ²)}   (6.2.1)
               = {1/(σ√(2π))}^n exp{−(Y^T Y − 2β^T X^T Y + β^T X^T Xβ)/(2σ²)}.   (6.2.2)

Since X is known, (6.2.2) implies that L(β, σ² | Y, X) belongs to an exponential family with joint complete sufficient statistic (Y^T Y, X^T Y) for (β, σ²). Also, (6.2.1) shows that, for given σ², the likelihood is maximized over β when

‖Y − Xβ‖² = (Y − Xβ)^T(Y − Xβ)   (6.2.3)

is minimized. But when is (6.2.3) minimized? From Chapter 4, we know that (6.2.3) is minimized when β is a solution to the normal equations

X^T Xβ = X^T Y.   (6.2.4)

Thus the maximum likelihood estimator (MLE) of β also satisfies the normal equations. By the invariance property of maximum likelihood, the MLE of an estimable function λ^T β is given by λ^T β̂, where β̂ is a solution to the normal equations (6.2.4).
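As a quick numerical illustration (a sketch of my own, not from the notes; the design, data, and λ below are invented), one solution of the normal equations can be computed with a generalized inverse, and the estimate of an estimable λ^T β does not depend on which solution is used:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
groups = np.repeat([0, 1, 2], n // 3)
# Overparameterized one-way ANOVA design: intercept + 3 group dummies,
# so X has rank 3 < 4 and beta itself is not estimable.
X = np.column_stack([np.ones(n)] + [(groups == g).astype(float) for g in range(3)])
beta_true = np.array([1.0, 0.5, 0.0, -0.5])
Y = X @ beta_true + rng.normal(scale=1.0, size=n)

# One solution of the normal equations X'X b = X'Y via a generalized inverse.
beta_hat = np.linalg.pinv(X.T @ X) @ (X.T @ Y)

# lam' beta with lam = (1, 1, 0, 0), i.e. mu + alpha_1 (a cell mean), is
# estimable; its LS/ML estimate agrees with the group-0 sample mean.
lam = np.array([1.0, 1.0, 0.0, 0.0])
print(lam @ beta_hat, Y[groups == 0].mean())  # the two numbers agree
```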

Using the maximum likelihood estimator β̂ of β, we can express the likelihood (6.2.1) as a function of σ² only, namely,

L(σ² | Y, X) = {1/(σ√(2π))}^n exp{−(Y − Xβ̂)^T(Y − Xβ̂)/(2σ²)},   (6.2.5)

with corresponding log-likelihood

ln L(σ² | Y, X) = −(n/2) ln(2πσ²) − (Y − Xβ̂)^T(Y − Xβ̂)/(2σ²),   (6.2.6)

which is maximized over σ² at

σ̂²_MLE = (Y − Xβ̂)^T(Y − Xβ̂)/n.   (6.2.7)

Note that while the MLE of β is identical to the LS estimator, σ̂²_MLE is not the same as the LS estimator of σ². Indeed, since E[(Y − Xβ̂)^T(Y − Xβ̂)] = (n − r)σ², we have E(σ̂²_MLE) = (n − r)σ²/n < σ².

Proposition. The MLE of σ² is biased.

Proposition. The MLE λ^T β̂ is the uniformly minimum variance unbiased estimator (UMVUE) of λ^T β.

Proposition. λ^T β̂ and nσ̂²_MLE/σ² are independently distributed, respectively as N(λ^T β, σ²λ^T(X^T X)^g λ) and χ²_{n−r}.
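A small simulation makes the first proposition concrete (my own sketch; the design, true values, and replication count are arbitrary): RSS/n is biased downward by the factor (n − r)/n, while RSS/(n − r) is unbiased.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 20, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # full rank, r = 2
r = np.linalg.matrix_rank(X)
P = X @ np.linalg.pinv(X.T @ X) @ X.T                  # projection onto C(X)

mle, ls = [], []
for _ in range(20000):
    Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(sigma2), size=n)
    rss = Y @ (np.eye(n) - P) @ Y
    mle.append(rss / n)        # sigma^2 MLE, equation (6.2.7)
    ls.append(rss / (n - r))   # LS estimator of sigma^2
print(np.mean(mle), (n - r) / n * sigma2)  # both about 3.6: biased downward
print(np.mean(ls), sigma2)                 # both about 4.0: unbiased
```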

6.3 Confidence interval for an estimable function

Proposition. The quantity

(λ^T β̂ − λ^T β) / √(σ̂²_LS λ^T(X^T X)^g λ)

has a t distribution with n − r df.

This proposition paves the way for constructing a confidence interval (CI) for the estimable function λ^T β. In fact, a 100(1 − α)% CI for λ^T β is given by

λ^T β̂ ± t_{n−r, α/2} √(σ̂²_LS λ^T(X^T X)^g λ).   (6.3.1)

This confidence interval is in the familiar form Estimate ± t_{α/2} × SE.
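Formula (6.3.1) translates directly into code. A minimal sketch (the function name is mine, not from the notes) that allows a rank-deficient X by using a generalized inverse:

```python
import numpy as np
from scipy import stats

def ci_estimable(X, Y, lam, level=0.95):
    """CI (6.3.1) for an estimable lam' beta; X may be rank deficient
    (lam is assumed estimable, which the caller must verify)."""
    n, r = X.shape[0], np.linalg.matrix_rank(X)
    G = np.linalg.pinv(X.T @ X)          # one generalized inverse of X'X
    beta_hat = G @ X.T @ Y
    resid = Y - X @ beta_hat
    sigma2_ls = resid @ resid / (n - r)  # residual mean square
    se = np.sqrt(sigma2_ls * lam @ G @ lam)
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - r)
    est = lam @ beta_hat
    return est - t_crit * se, est + t_crit * se
```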

Example. Consider the simple linear regression model considered earlier, given by Equation (1.1.7),

Y_i = β₀ + β₁w_i + ɛ_i,   (6.3.2)

where Y_i and w_i respectively represent the survival time and the age at prognosis of the ith patient. We have already identified a unique LS estimator of β₀ and β₁, given by

β̂₁ = Σ(w_i − w̄)(Y_i − Ȳ) / Σ(w_i − w̄)²,   β̂₀ = Ȳ − β̂₁w̄,   (6.3.3)

provided Σ(w_i − w̄)² > 0. For a given patient who was diagnosed with leukemia at age w₀, one would be interested in predicting the length of survival. The LS estimator of the expected survival for this patient is given by

Ŷ₀ = β̂₀ + β̂₁w₀.   (6.3.4)

What is a 95% confidence interval for the expected survival estimated by Ŷ₀? Note that Ŷ₀ is of the form λ^T β̂, where β̂ = (β̂₀, β̂₁)^T and λ = (1, w₀)^T. The variance-covariance matrix of β̂ is given by

cov(β̂) = σ²(X^T X)^{−1} = σ²/(n Σ(w_i − w̄)²) [ Σw_i²   −Σw_i
                                                −Σw_i       n ].   (6.3.5)

Thus the variance of the predictor Ŷ₀ is given by

var(Ŷ₀) = var(λ^T β̂) = λ^T cov(β̂) λ
        = σ²/(n Σ(w_i − w̄)²) × (1, w₀) [ Σw_i²  −Σw_i ; −Σw_i  n ] (1, w₀)^T
        = σ²/(n Σ(w_i − w̄)²) × (Σw_i² − 2w₀Σw_i + nw₀²)
        = σ² Σ(w_i − w₀)² / (n Σ(w_i − w̄)²).   (6.3.6)

Therefore, the standard error of Ŷ₀ is obtained as

SE(Ŷ₀) = σ̂_LS √( Σ(w_i − w₀)² / (n Σ(w_i − w̄)²) ),   (6.3.7)

where σ̂²_LS = Σ_{i=1}^{n}(Y_i − β̂₀ − β̂₁w_i)²/(n − 2). Using (6.3.1), a 95% CI for the expected survival is then

Ŷ₀ ± t_{n−2, 0.025} σ̂_LS √( Σ(w_i − w₀)² / (n Σ(w_i − w̄)²) ).   (6.3.8)
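A numerical sketch of (6.3.8); the survival data and w₀ below are simulated purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, w0 = 25, 50.0
w = rng.uniform(30, 70, size=n)                 # age at prognosis (invented)
Y = 10.0 - 0.1 * w + rng.normal(scale=1.0, size=n)  # survival time (invented)

# LS estimates (6.3.3) and prediction (6.3.4)
b1 = np.sum((w - w.mean()) * (Y - Y.mean())) / np.sum((w - w.mean()) ** 2)
b0 = Y.mean() - b1 * w.mean()
Y0_hat = b0 + b1 * w0

# Standard error (6.3.7) and 95% CI (6.3.8)
sigma2_ls = np.sum((Y - b0 - b1 * w) ** 2) / (n - 2)
se = np.sqrt(sigma2_ls * np.sum((w - w0) ** 2) / (n * np.sum((w - w.mean()) ** 2)))
t_crit = stats.t.ppf(0.975, df=n - 2)
print(Y0_hat - t_crit * se, Y0_hat + t_crit * se)
```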

6.4 Test of Hypothesis

Very often we are interested in testing hypotheses related to some linear function of the parameters in a linear model. From our discussions in Chapter 4, we have learned that not all linear functions of the parameter vector can be estimated. Similarly, not all hypotheses corresponding to linear functions of the parameter vector can be tested. We will see shortly which hypotheses can be tested and which cannot. Let us first look at our favorite one-way ANOVA model. Usually, the hypotheses of interest are:

1. Equality of all a treatment effects:

H₀: α₁ = α₂ = ... = α_a.   (6.4.1)

2. Equality of two specific treatment effects, e.g.,

H₀: α₁ = α₂.   (6.4.2)

3. A linear combination, such as a contrast (to be discussed later) of treatment effects:

H₀: Σ c_i α_i = 0,   (6.4.3)

where the c_i are known constants such that Σ c_i = 0.

Note that, if we consider the normal theory Gauss-Markov linear model Y = Xβ + ɛ, then all of the above hypotheses can be written as

H₀: A^T β = b,   (6.4.4)

where A is a p × q matrix with rank(A) = q. If A is not of full column rank, we exclude the redundant columns from A to obtain a full column rank matrix. Now, how would you proceed to test the hypothesis (6.4.4)? Let us deviate a little and refresh our memory about tests of hypotheses. If Y₁, Y₂, ..., Y_n are n iid observations from a N(μ, σ²) population, how do we construct a test statistic to test the hypothesis

H₀: μ = μ₀?   (6.4.5)

We estimate μ by Ȳ, note that Ȳ ~ N(μ, σ²/n), and eliminate the unknown σ² by dividing by the sample standard deviation S, arriving at the t statistic √n(Ȳ − μ₀)/S.

Thus we took three major steps in constructing a test statistic:

- Estimated the parametric function (μ),
- Found the distribution of the estimate, and
- Eliminated the nuisance parameter.

We will basically follow the same procedure to test the hypothesis (6.4.4). First we need to estimate A^T β. That means A^T β must be estimable.

Definition. (Testable hypothesis.) A linear hypothesis H₀: A^T β = b is testable if the rows of A^T β are estimable; in other words (see Chapter 4), if there exists a matrix C such that

A = X^T C.   (6.4.6)

The assumption that A has full column rank is just a matter of convenience: given a set of equations A^T β = b, one can easily eliminate redundant equations to transform it into a system with a full column rank matrix. Since A^T β is estimable, the corresponding LS estimator of A^T β is given by

A^T β̂ = A^T(X^T X)^g X^T Y,   (6.4.7)

which is a linear function of Y.
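Estimability can be checked computationally: A^T β is estimable exactly when A^T H = A^T, where H = (X^T X)^g X^T X (the matrix that reappears in Section 6.5 and Problem 6). A small sketch (the function name is mine); the two ANOVA hypotheses below match (6.4.2) and a single non-estimable effect:

```python
import numpy as np

def is_testable(A, X, tol=1e-8):
    """True if every row of A' beta is estimable, i.e. A'H = A'
    with H = (X'X)^g X'X; A is p x q, X is n x p."""
    H = np.linalg.pinv(X.T @ X) @ (X.T @ X)
    return np.allclose(A.T @ H, A.T, atol=tol)

# One-way ANOVA, 3 groups of 4, overparameterized (p = 4, rank 3):
X = np.column_stack([np.ones(12), np.kron(np.eye(3), np.ones((4, 1)))])
print(is_testable(np.array([[0., 1., -1., 0.]]).T, X))  # alpha1 - alpha2: True
print(is_testable(np.array([[0., 1., 0., 0.]]).T, X))   # alpha1 alone: False
```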

Under Assumption IV, A^T β̂ is distributed as

A^T β̂ ~ N_q(A^T β, σ² A^T(X^T X)^g A) = N_q(A^T β, σ²B),   (6.4.8)

where we introduce the notation B = A^T(X^T X)^g A. Therefore,

(A^T β̂ − A^T β)^T B^{−1} (A^T β̂ − A^T β) / σ² ~ χ²_q,   (6.4.9)

or, under the null hypothesis H₀: A^T β = b,

(A^T β̂ − b)^T B^{−1} (A^T β̂ − b) / σ² ~ χ²_q.   (6.4.10)

Had we known σ², this statistic could have been used to test the hypothesis H₀: A^T β = b. But since σ² is unknown, the left hand side of equation (6.4.10) cannot be used as a test statistic. In order to get rid of σ² from the test statistic, we note that

RSS/σ² = Y^T(I − P)Y/σ² = (Y − Xβ̂)^T(Y − Xβ̂)/σ² ~ χ²_{n−r}.   (6.4.11)

If we can show that the statistics in (6.4.10) and (6.4.11) are independent, then we can construct an F statistic as a ratio of the two chi-squares, each divided by its degrees of freedom:

F = [(A^T β̂ − b)^T B^{−1}(A^T β̂ − b)/(σ²q)] / [RSS/(σ²(n − r))] ~ F_{q, n−r} under H₀.   (6.4.12)

Notice that the nuisance parameter σ² cancels, and we obtain the F statistic

F = (A^T β̂ − b)^T B^{−1}(A^T β̂ − b) / (q σ̂²_LS) ~ F_{q, n−r} under H₀,   (6.4.13)

where σ̂²_LS = RSS/(n − r), the residual mean square, is the LS estimator of σ². Let us denote the numerator sum of squares of the F statistic by Q = (A^T β̂ − b)^T B^{−1}(A^T β̂ − b).

Independence of Q and σ̂²_LS. First note that

A^T β̂ − b = C^T Xβ̂ − b
          = C^T X(X^T X)^g X^T Y − b
          = C^T PY − A^T A(A^T A)^{−1} b
          = C^T PY − C^T XA(A^T A)^{−1} b
          = C^T PY − C^T PXA(A^T A)^{−1} b
          = C^T P(Y − b*),   (6.4.14)

where b* = XA(A^T A)^{−1}b (here we used b = A^T A(A^T A)^{−1}b, A = X^T C, and PX = X). Now,

Q = (A^T β̂ − b)^T B^{−1}(A^T β̂ − b)
  = (Y − b*)^T PCB^{−1}C^T P(Y − b*)
  = (Y − b*)^T A*(Y − b*),   (6.4.15)

where A* = PCB^{−1}C^T P. On the other hand,

RSS = (n − r)σ̂²_LS = Y^T(I − P)Y
    = (Y − b* + b*)^T(I − P)(Y − b* + b*)
    = (Y − b*)^T(I − P)(Y − b*),   (6.4.16)

because (I − P)b* = 0 (b* = XA(A^T A)^{−1}b lies in the column space of X). Thus, both Q and RSS are quadratic forms in Y − b*, which under Assumption IV is distributed as N(Xβ − b*, σ²I_n). Therefore, RSS and Q are independently distributed if and only if A*(I − P) = 0, which follows immediately since P(I − P) = 0.

In summary, the linear testable hypothesis H₀: A^T β = b can be tested by the F statistic

F = (A^T β̂ − b)^T B^{−1}(A^T β̂ − b) / (q σ̂²_LS)   (6.4.17)

by comparing it to the critical values from an F_{q, n−r} distribution.
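The whole recipe fits in a few lines. A sketch (the function name is mine) assuming the hypothesis is testable, A has full column rank q, and B is invertible:

```python
import numpy as np
from scipy import stats

def f_test(X, Y, A, b):
    """F statistic (6.4.13) and p-value for a testable H0: A' beta = b;
    A is p x q with full column rank, A' beta assumed estimable."""
    n, q = X.shape[0], A.shape[1]
    r = np.linalg.matrix_rank(X)
    G = np.linalg.pinv(X.T @ X)        # a generalized inverse of X'X
    beta_hat = G @ X.T @ Y
    resid = Y - X @ beta_hat
    sigma2_ls = resid @ resid / (n - r)
    d = A.T @ beta_hat - b
    B = A.T @ G @ A
    Q = d @ np.linalg.solve(B, d)      # numerator sum of squares
    F = Q / (q * sigma2_ls)
    return F, stats.f.sf(F, q, n - r)  # compare to F(q, n-r)
```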

Example. (Modeling weight loss as a function of initial weight.) Suppose n individuals, m men and w women, participated in a six-week weight-loss program. At the end of the program, investigators used a linear model with the reduction in weight as the outcome and the initial weight as the explanatory variable. They started with separate intercepts and slopes, but wanted to see whether the rate of decline is similar between men and women. Consider the linear model:

R_i = α_m + β_m W_{0i} + ɛ_i, if the ith individual is male,
R_i = α_f + β_f W_{0i} + ɛ_i, if the ith individual is female.   (6.4.18)

The idea is to test the hypothesis

H₀: β_m = β_f.   (6.4.19)

If we write β = (α_m, β_m, α_f, β_f)^T, then H₀ can be written as

H₀: A^T β = b,   (6.4.20)

where

A = (0, 1, 0, −1)^T,  b = 0.   (6.4.21)

Assuming (for simplicity) that the first m individuals are male (i = 1, 2, ..., m) and the rest (i = m+1, m+2, ..., m+w = n) are female, the linear model for this problem can be written as Y = Xβ + ɛ, where Y = (R₁, R₂, ..., R_n)^T and

X = [ 1_m  W_m  0_m  0_m
      0_f  0_f  1_f  W_f ],   (6.4.22)

where W_m = (w_{01}, w_{02}, ..., w_{0m})^T and W_f = (w_{0(m+1)}, w_{0(m+2)}, ..., w_{0n})^T, and 0_f, 1_f are w × 1 vectors. Obviously,

X^T X = [ B_m  0
          0    B_f ],   (6.4.23)

where

B_m = [ m                    Σ_{i=1}^{m} w_{0i}
        Σ_{i=1}^{m} w_{0i}   Σ_{i=1}^{m} w²_{0i} ],   (6.4.24)

and

B_f = [ w                      Σ_{i=m+1}^{n} w_{0i}
        Σ_{i=m+1}^{n} w_{0i}   Σ_{i=m+1}^{n} w²_{0i} ].   (6.4.25)

Consequently,

(X^T X)^{−1} = [ B_m^{−1}  0_{2×2}
                 0_{2×2}   B_f^{−1} ],   (6.4.26)

where

B_m^{−1} = 1/(m SS_m(W)) [ Σ_{i=1}^{m} w²_{0i}    −Σ_{i=1}^{m} w_{0i}
                           −Σ_{i=1}^{m} w_{0i}     m ],   (6.4.27)

with SS_m(W) = Σ_{i=1}^{m} (w_{0i} − W̄_m)² and W̄_m = Σ_{i=1}^{m} w_{0i}/m. Similarly,

B_f^{−1} = 1/(w SS_f(W)) [ Σ_{i=m+1}^{n} w²_{0i}    −Σ_{i=m+1}^{n} w_{0i}
                           −Σ_{i=m+1}^{n} w_{0i}     w ],   (6.4.28)

with SS_f(W) = Σ_{i=m+1}^{n} (w_{0i} − W̄_f)² and W̄_f = Σ_{i=m+1}^{n} w_{0i}/w.

This leads to the usual LS estimator

β̂ = (α̂_m, β̂_m, α̂_f, β̂_f)^T = ( R̄_m − β̂_m W̄_m,  SP_m/SS_m(W),  R̄_f − β̂_f W̄_f,  SP_f/SS_f(W) )^T,   (6.4.29)

where SP_m = Σ_{i=1}^{m} (R_i − R̄_m)(w_{0i} − W̄_m) and, similarly, SP_f = Σ_{i=m+1}^{n} (R_i − R̄_f)(w_{0i} − W̄_f). Hence

A^T β̂ = SP_m/SS_m(W) − SP_f/SS_f(W) = β̂_m − β̂_f,   (6.4.30)

and

B = A^T(X^T X)^{−1}A = 1/SS_m(W) + 1/SS_f(W) = (SS_m(W) + SS_f(W)) / (SS_m(W) SS_f(W)).   (6.4.31)

Hence,

Q = (A^T β̂ − b)^T B^{−1}(A^T β̂ − b)
  = [SS_m(W) SS_f(W) / (SS_m(W) + SS_f(W))] (SP_m/SS_m(W) − SP_f/SS_f(W))²
  = [SS_m(W) SS_f(W) / (SS_m(W) + SS_f(W))] (β̂_m − β̂_f)².   (6.4.32)

The residual sum of squares for this model is

RSS = Σ_{i=1}^{m} (R_i − α̂_m − β̂_m w_{0i})² + Σ_{i=m+1}^{n} (R_i − α̂_f − β̂_f w_{0i})²
    = SS_m(R) − β̂²_m SS_m(W) + SS_f(R) − β̂²_f SS_f(W),   (6.4.33)

where SS_f(R) = Σ_{i=m+1}^{n} (R_i − R̄_f)² with R̄_f = Σ_{i=m+1}^{n} R_i/w, and similarly for SS_m(R). The corresponding F statistic for testing H₀: β_m = β_f is then given by

F = Q / (RSS/(n − 4)),   (6.4.34)

which under H₀ has an F distribution with 1 and n − 4 degrees of freedom (here q = 1 and r = 4).
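A simulated check of (6.4.32)-(6.4.34) (all sample sizes, coefficients, and data below are invented; data are generated under H₀, so F should behave like an F(1, n−4) draw):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
m, w_count = 15, 12
W0 = np.concatenate([rng.uniform(70, 110, m), rng.uniform(60, 95, w_count)])
male = np.arange(m + w_count) < m
# Common slope 0.1 (H0 true), different intercepts, noise sd 0.5.
R = np.where(male, 2.0, 3.0) + 0.1 * W0 + rng.normal(scale=0.5, size=m + w_count)

def ss(x):      # corrected sum of squares
    return np.sum((x - x.mean()) ** 2)
def sp(x, y):   # corrected sum of cross-products
    return np.sum((x - x.mean()) * (y - y.mean()))

SSm, SSf = ss(W0[male]), ss(W0[~male])
bm, bf = sp(W0[male], R[male]) / SSm, sp(W0[~male], R[~male]) / SSf
Q = SSm * SSf / (SSm + SSf) * (bm - bf) ** 2                    # (6.4.32)
RSS = ss(R[male]) - bm**2 * SSm + ss(R[~male]) - bf**2 * SSf    # (6.4.33)
n = m + w_count
F = Q / (RSS / (n - 4))                                         # (6.4.34)
print(F, stats.f.sf(F, 1, n - 4))   # statistic and p-value
```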

6.4.1 Alternative motivation and derivation of the F-test

Suppose we want to test the testable hypothesis A^T β = b in the linear model

Y = Xβ + ɛ   (6.4.35)

under Assumption IV. Under the null hypothesis, the model (6.4.35) can be treated as a restricted model, with the jointly estimable restrictions imposed by the null hypothesis. Whether this hypothesis is true can then be judged by the extra error sum of squares due to the restriction (null hypothesis). By the theory developed in Chapter 4,

RSS_H = RSS + (A^T β̂ − b)^T B^{−1}(A^T β̂ − b) = RSS + Q,   (6.4.36)

leading to RSS_H − RSS = Q. If the null hypothesis is false, one would expect this difference to be large. But how large? We compare this extra error SS to the RSS from the original model. In other words, under the alternative we expect Q/RSS to be large, or equivalently, F = (n − r)Q/(q RSS) to be large. The significance of this test can be calculated by noting that under H₀, F follows an F distribution with q and n − r d.f.
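The identity RSS_H = RSS + Q is easy to check numerically. A sketch in a full-rank regression (data invented), using the standard restricted-LS solution β̂_H = β̂ − (X^T X)^{−1}A B^{−1}(A^T β̂ − b) to fit the model under the constraint:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 0.5, 0.5]) + rng.normal(size=n)
A, b = np.array([[0.0, 1.0, -1.0]]).T, np.array([0.0])  # H0: beta1 = beta2

G = np.linalg.inv(X.T @ X)
beta_hat = G @ X.T @ Y
RSS = np.sum((Y - X @ beta_hat) ** 2)
B = A.T @ G @ A
d = A.T @ beta_hat - b
Q = float(d @ np.linalg.solve(B, d))

# Restricted LS under the (estimable) constraint A' beta = b
beta_H = beta_hat - G @ A @ np.linalg.solve(B, d)
RSS_H = np.sum((Y - X @ beta_H) ** 2)
print(RSS_H - RSS, Q)   # the two agree up to rounding, as in (6.4.36)
```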

Example (continued). Under the null hypothesis, the model can be written as

R_i = α_m + β W_{0i} + ɛ_i, if the ith individual is male,
R_i = α_f + β W_{0i} + ɛ_i, if the ith individual is female.   (6.4.37)

Notice the common slope for both subgroups. The parameter estimates for this problem are:

α̂_m = R̄_m − β̂ W̄_m,
α̂_f = R̄_f − β̂ W̄_f,
β̂ = (SP_m + SP_f)/(SS_m(W) + SS_f(W)) = (β̂_m SS_m(W) + β̂_f SS_f(W))/(SS_m(W) + SS_f(W)).   (6.4.38)

It can be shown that the residual sum of squares for this reduced model can be written as

RSS_H = SS_m(R) + (β̂² − 2β̂β̂_m) SS_m(W) + SS_f(R) + (β̂² − 2β̂β̂_f) SS_f(W),   (6.4.39)

leading to

RSS_H − RSS = (β̂_m − β̂)² SS_m(W) + (β̂_f − β̂)² SS_f(W).   (6.4.40)

The test statistic F is then given by

F = (n − 4) [ (β̂_m − β̂)² SS_m(W) + (β̂_f − β̂)² SS_f(W) ] / [ SS_m(R) − β̂²_m SS_m(W) + SS_f(R) − β̂²_f SS_f(W) ].   (6.4.41)

6.5 Non-testable Hypothesis

In the previous section, we considered only hypotheses which are testable. What happens if you start with a non-testable hypothesis? A non-testable hypothesis H₀: A^T β = b is one for which every row of the linear function A^T β is non-estimable and no linear combination of them is estimable. From our discussion in Chapter 4, we know that when we impose non-estimable restrictions on the linear model, the restricted solution becomes yet another solution to the linear model, and hence the residual sum of squares does not change. Thus, in this case, the extra error sum of squares due to the non-testable hypothesis is identically equal to zero. However, one may still calculate an F statistic using the formula (6.4.13),

F = (A^T β̂ − b)^T B^{−1}(A^T β̂ − b) / (q σ̂²_LS),   (6.5.1)

as long as B is invertible. What would this statistic test if H₀: A^T β = b is non-testable? In fact, when A^T β is not estimable, the test statistic F would actually test the hypothesis

A^T(X^T X)^g X^T Xβ = b.

[See Problem 6 in this section.]
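A sketch of the first point in the overparameterized one-way ANOVA model (design and data invented): H₀: α₁ = 0 is non-testable, and imposing it leaves the residual sum of squares unchanged, because the restricted column space equals the original one.

```python
import numpy as np

rng = np.random.default_rng(5)
ni, a = 5, 3
groups = np.repeat(np.arange(a), ni)
X = np.column_stack([np.ones(a * ni),
                     (groups[:, None] == np.arange(a)).astype(float)])
Y = 1.0 + np.array([0.5, 0.0, -0.5])[groups] + rng.normal(size=a * ni)

beta_hat = np.linalg.pinv(X.T @ X) @ X.T @ Y
RSS = np.sum((Y - X @ beta_hat) ** 2)

# Impose alpha1 = 0 by dropping the alpha1 column (restricting it to zero):
X_H = X[:, [0, 2, 3]]
beta_H = np.linalg.pinv(X_H.T @ X_H) @ X_H.T @ Y
RSS_H = np.sum((Y - X_H @ beta_H) ** 2)
print(RSS, RSS_H)   # equal: the non-estimable restriction costs no fit
```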

Partially Testable Hypothesis

Suppose you want to test the hypothesis

A^T β = b,  where A = (A₁  A₂) with A₁ of order p × q₁, A₂ of order p × q₂, q₁ + q₂ = q, and b = (b₁^T, b₂^T)^T,   (6.5.2)

such that A has rank q, and A₁^T β is estimable while A₂^T β is not. From our discussion of testable and non-testable hypotheses, we can easily recognize that if we use the F statistic based on RSS_H − RSS, we will end up with a test statistic which tests the hypothesis H₀: A₁^T β = b₁. However, if we use the test statistic (6.4.13), we will be testing the hypothesis

Ã^T β = b,   (6.5.3)

where Ã = (A₁  H^T A₂) with H = (X^T X)^g X^T X.

6.6 Problems

1. Consider the simple regression model:

Y_i = μ + βx_i + ɛ_i, i = 1, 2, ..., n,   (6.6.1)

where ɛ_i, i = 1, 2, ..., n, are iid normal random variables with mean zero and variance σ².

(a) For a given value x₀ of x, obtain the least squares estimator of θ(x₀) = μ + βx₀.
(b) Derive a test statistic for testing the hypothesis H₀: θ(x₀) = θ₀, where θ₀ is a known constant.

2. Suppose θ = a^T β is estimable in the model Y = Xβ + ɛ, where a and β are p × 1 vectors, Y is an n × 1 random vector, X is an n × p matrix with rank(X) = r ≤ p, and ɛ ~ N(0, σ²I_n). Assume β̂ and S² to be the least squares estimators of β and σ², respectively.

(a) What is the least squares estimator of a^T β? Call it θ̂.
(b) What is the distribution of θ̂?
(c) What is the distribution of (n − r)S²/σ²?
(d) Show that θ̂ and (n − r)S² are independently distributed.
(e) Show how one can construct a 95% confidence interval for θ.

(While answering parts (b) and (c), do not forget to mention the specific parameters of the corresponding distributions.)

3. Suppose Y₁₁, Y₁₂, ..., Y_{1n₁} are n₁ independent observations from a N(μ + α₁, σ²) distribution and Y₂₁, Y₂₂, ..., Y_{2n₂} are n₂ independent observations from a N(μ − α₂, σ²) distribution. Notice that the two populations have different means but the same standard deviation. Assume that Y_{1j} and Y_{2j} are independent for all j. Define n₁ + n₂ = n, and Y = (Y₁₁, Y₁₂, ..., Y_{1n₁}, Y₂₁, Y₂₂, ..., Y_{2n₂})^T as the n × 1 vector consisting of all n observations. We write Y as

Y = Xβ + ɛ,   (6.6.2)

where β = (μ, α₁, α₂)^T,

X = [ 1_{n₁}  1_{n₁}    0
      1_{n₂}  0       −1_{n₂} ],

with 1_{n_i} representing an n_i × 1 vector of 1's, and ɛ an n × 1 vector distributed as N(0, σ²I_n).

(a) A particular solution to the normal equations for estimating β is given by β̂ = GX^T Y, where G = diag(0, 1/n₁, 1/n₂). Show that the error sum of squares for this model is given by

SSE = (Y − Xβ̂)^T(Y − Xβ̂) = Σ_{i=1}^{2} Σ_{j=1}^{n_i} (Y_{ij} − Ȳ_{i·})².

(b) Suppose one wants to test the hypothesis H₀: α₁ = α₂. The restricted model under the null hypothesis is given by

Y = X_H β_H + ɛ,   (6.6.3)

where

X_H = [ 1_{n₁}   1_{n₁}
        1_{n₂}  −1_{n₂} ],

and β_H = (μ, α₁)^T. The least squares estimator of β_H is given by

β̂_H = (1/2) ( Ȳ₁· + Ȳ₂·,  Ȳ₁· − Ȳ₂· )^T.

Find the error sum of squares for the restricted model. Call it SSE_H. Compare SSE and SSE_H. Are they equal? Why or why not?

(c) Is the hypothesis H₁: α₁ + α₂ = 0 testable? If so, give an F statistic to test this hypothesis. If not, state the reason why it is not testable.

4. Suppose two objects, say A₁ and A₂, with unknown weights β₁ and β₂ respectively, are weighed on a balance using the following scheme, each of these actions being repeated twice:

- both objects on the balance, resulting in weights Y₁₁ and Y₁₂,
- only A₁ on the balance, resulting in weights Y₂₁ and Y₂₂, and
- only A₂ on the balance, resulting in weights Y₃₁ and Y₃₂.

Assume the Y_{ij}'s are independent, normally distributed variables with common variance σ². Also assume that the balance has an unknown systematic (fixed) error β₀. The model can then be written as:

Y_{1j} = β₀ + β₁ + β₂ + ɛ_{1j},
Y_{2j} = β₀ + β₁ + ɛ_{2j},
Y_{3j} = β₀ + β₂ + ɛ_{3j},  j = 1, 2,

where the ɛ_{ij}'s are i.i.d. N(0, σ²) random variables. Let Ȳ_{i·} = (Y_{i1} + Y_{i2})/2.

(a) Express the least squares estimate β̂ of β = (β₀, β₁, β₂)^T in terms of Ȳ_{i·}, i = 1, 2, 3.
(b) Find Var(β̂). Argue that β̂ has a normal distribution.
(c) Suppose σ̂² = Σ_{i=1}^{3} Σ_{j=1}^{2} (Y_{ij} − Ȳ_{i·})²/3. What is the distribution of σ̂²?
(d) Write the hypothesis that both objects have the same weight in the form H₀: A^T β = b. Identify A and b.
(e) Propose an F statistic for testing the above hypothesis.

5. A clinical study recruited 150 untreated hepatitis C patients along with 75 hepatitis C patients who had been treated previously with Interferon-α monotherapy. There are two new drugs under study, Pegintron and Peginterferon. Investigators randomly allocated the two treatments to all 225 patients. The goal is to compare the effects of Pegintron and Peginterferon in reducing hepatitis C. The response variable Y is the difference in viral levels between baseline and 24 weeks post treatment. Denote by α_P and α_{PI} the effects of Pegintron and Peginterferon, respectively.

(a) Write this problem as a general linear model, assuming that the new treatments may interact with whether the patient was previously treated or not. Use your own notation for effects, subscripts and error terms not defined here.

(b) What is the design matrix for the model you specified in part (a)?
(c) Is α_{PI} − α_P estimable in the linear model you have proposed in part (a)?
  i. If yes, give its BLUE and a 100(1 − α)% confidence interval for the difference between α_{PI} and α_P.
  ii. If not, give an estimable function that would represent the difference between the two new treatments and give a 100(1 − α)% confidence interval for this estimable function.

6. Consider the linear model Y = Xβ + ɛ, where ɛ ~ N_n(0, σ²I_n), X is an n × p matrix of rank r (< p), and β is of order p × 1.

(a) Let H = (X^T X)^g X^T X. Show that A^T Hβ is estimable for any p × q matrix A.
(b) For a matrix A of order p × q of rank q, the testable hypothesis H₀: A^T β = b can be tested using the test statistic

F = ((n − r)/q) (Q/SSE),

where

Q = (A^T β̂ − b)^T [A^T(X^T X)^g A]^{−1} (A^T β̂ − b),   (6.6.4)

which is identical to SSE_H − SSE. Here, β̂ = (X^T X)^g X^T Y is a solution to the normal equations X^T Xβ = X^T Y. When A^T β = b is non-testable, SSE_H − SSE is identically equal to zero. However, it is still possible to calculate Q using expression (6.6.4) if A^T(X^T X)^g A is invertible. Now suppose that A^T β = b is non-testable but A^T(X^T X)^g A is invertible. Derive Q for testing the hypothesis H₀: A^T Hβ = b and show that it is algebraically identical to the test statistic Q that would be calculated for testing the non-testable hypothesis H₀: A^T β = b using (6.6.4).

7. Consider the one-way ANOVA model Y_{ij} = μ + α_i + ɛ_{ij}; i = 1, 2, 3; j = 1, 2, ..., n_i.

(a) Develop a test procedure for testing the hypothesis H₀: A^T β = 0, where A =

(b) Develop a test procedure for testing the hypothesis H₀: A^T β = 0, where A =

(c) Compare the test statistics from parts (a) and (b). Are they the same? If so, why?
(d) If possible, derive a test procedure for testing the hypothesis H₀: α₁ = α₂ = 5.

8. For testing the hypothesis H₀: A^T β = b, the test statistic derived in class was

F = (Q/q) / (SSE/(n − r)) = ((n − r)/q) (Q/SSE).

(a) If U ~ χ²_n(λ), what is E(U)?
(b) If V ~ χ²_n, what is E(1/V)?
(c) Use the independence of Q and SSE, and your results from parts (a) and (b), to find E(F).
(d) Show that E(F) ≥ 1.

9. Consider the multiple regression model

Y_i = β₁X_{1i} + β₂X_{2i} + ... + β_pX_{pi} + ɛ_i, i = 1, 2, ..., n,

with E(ɛ_i) = 0 and var(ɛ_i) = σ². Assume the ɛ_i's are independent.

(a) Derive a test statistic for testing H₀: β_{(1)} = 0, where β_{(1)} includes only the first q (< p) components of β.
(b) How would you test the hypothesis β_j = β_{j0}?

10. Suppose (Y_i, X_i), i = 1, 2, ..., n, are n independent pairs of observations such that the conditional distribution of Y_i given X_i is normal with mean μ₁ + (ρσ₁/σ₂)(X_i − μ₂) and variance σ₁²(1 − ρ²) (σ₁², σ₂² > 0, |ρ| < 1).

(a) Can you write the problem as a linear model problem? [Hint: use transformed parameters; for example, define β₀ = μ₁ − (ρσ₁/σ₂)μ₂.]
(b) Is the hypothesis μ₁ = 0 testable?
(c) Is the hypothesis ρ = 0 linear in terms of your transformed parameters? If so, is it testable?
(d) For parts (b) and (c), if possible, provide the F statistic for testing the above hypotheses.

BIOS 2083 Linear Models © Abdus S. Wahed