Lecture 15. Hypothesis testing in the linear model


15.1. Preliminary lemma

Lemma 15.1. Suppose Z ~ N_n(0, σ²I_n) and A_1, A_2 are symmetric, idempotent n × n matrices with A_1 A_2 = 0. Then Z^T A_1 Z and Z^T A_2 Z are independent.

Proof. Let W_i = A_i Z, i = 1, 2, and stack them into the 2n-vector W = (W_1; W_2) = AZ, where A = (A_1; A_2) is the 2n × n matrix formed by stacking A_1 on top of A_2. By Proposition 11.1(i),

W ~ N_2n( (0; 0), σ² (A_1 0; 0 A_2) ),

since Var(W) = σ² A A^T, whose blocks are A_i A_j^T: these equal A_i when i = j (by symmetry and idempotence) and 0 when i ≠ j (since A_1 A_2 = 0 and hence A_2 A_1 = (A_1 A_2)^T = 0). The zero off-diagonal blocks mean W_1 and W_2 are independent, which implies that W_1^T W_1 = Z^T A_1 Z and W_2^T W_2 = Z^T A_2 Z are independent (using A_i^T A_i = A_i). □
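As an aside (not on the original slide), Lemma 15.1 can be sanity-checked by simulation in R. Below, A_2 is the projection onto the column space of a hypothetical design matrix and A_1 is the projection onto its orthogonal complement, so A_1 A_2 = 0 holds by construction:

# Simulation check of Lemma 15.1 (hypothetical design matrix X1).
set.seed(1)
n <- 10
X1 <- cbind(1, 1:n)                            # any full-rank n x 2 matrix will do
A2 <- X1 %*% solve(t(X1) %*% X1) %*% t(X1)     # symmetric, idempotent
A1 <- diag(n) - A2                             # symmetric, idempotent, A1 A2 = 0
sims <- replicate(10000, {
  z <- rnorm(n)                                # Z ~ N_n(0, I_n)
  c(t(z) %*% A1 %*% z, t(z) %*% A2 %*% z)      # the two quadratic forms
})
cor(sims[1, ], sims[2, ])                      # close to 0

Uncorrelatedness is of course weaker than independence, so this is only a consistency check, not a proof.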

15.2. Hypothesis testing

Suppose X = (X_0 X_1) and β = (β_0; β_1), where X_0 is n × p_0, X_1 is n × (p - p_0), and β_0 and β_1 have p_0 and p - p_0 components respectively, with rank(X) = p and rank(X_0) = p_0. We want to test H_0: β_1 = 0 against H_1: β_1 ≠ 0.

Under H_0, Y = X_0 β_0 + ε, and the MLEs of β_0 and σ² are

β̂_0 = (X_0^T X_0)^{-1} X_0^T Y,
σ̂_0² = RSS_0/n = (1/n)(Y - X_0 β̂_0)^T (Y - X_0 β̂_0),

and these are independent, by Theorem 13.3. So the fitted values under H_0 are

Ŷ = X_0 (X_0^T X_0)^{-1} X_0^T Y = P_0 Y, where P_0 = X_0 (X_0^T X_0)^{-1} X_0^T.
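A minimal R sketch of the projection P_0 (simulated data and an intercept-only X_0, both hypothetical rather than from the lecture):

# Fitted values under H0 via the projection matrix P0.
set.seed(2)
n <- 20
y <- 1 + rnorm(n)                              # data generated under H0
X0 <- matrix(1, n, 1)                          # intercept-only null design, p0 = 1
P0 <- X0 %*% solve(t(X0) %*% X0) %*% t(X0)
yhat0 <- P0 %*% y                              # fitted values under H0
all.equal(as.vector(yhat0), rep(mean(y), n))   # TRUE: here P0 projects onto the mean
RSS0 <- sum((y - yhat0)^2)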

15.3. Geometric interpretation

[Figure not reproduced in the transcription.]

15.4. Generalised likelihood ratio test

The generalised likelihood ratio test of H_0 against H_1 is based on

Λ_Y(H_0, H_1) = [ (1/(2πσ̂²))^{n/2} exp( -(Y - Xβ̂)^T (Y - Xβ̂)/(2σ̂²) ) ] / [ (1/(2πσ̂_0²))^{n/2} exp( -(Y - X_0 β̂_0)^T (Y - X_0 β̂_0)/(2σ̂_0²) ) ]
= (σ̂_0²/σ̂²)^{n/2}
= (RSS_0/RSS)^{n/2}
= (1 + (RSS_0 - RSS)/RSS)^{n/2},

where the second equality holds because each exponent equals -n/2 at the corresponding MLEs (σ̂² = RSS/n, σ̂_0² = RSS_0/n). We reject H_0 when 2 log Λ is large, equivalently when (RSS_0 - RSS)/RSS is large. Using the results in Lecture 8, under H_0,

2 log Λ = n log(1 + (RSS_0 - RSS)/RSS)

is approximately a χ²_{p-p_0} random variable. But we can get an exact null distribution.
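A short R check (on hypothetical simulated data) that 2 log Λ computed from the two residual sums of squares matches twice the difference in maximised log-likelihoods:

# 2 log Lambda and its chi-squared approximation.
set.seed(3)
n <- 30
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
fit0 <- lm(y ~ 1)                              # model under H0 (p0 = 1)
fit  <- lm(y ~ x)                              # full model (p = 2)
RSS0 <- sum(resid(fit0)^2)
RSS  <- sum(resid(fit)^2)
twologLam <- n * log(1 + (RSS0 - RSS) / RSS)
as.numeric(2 * (logLik(fit) - logLik(fit0)))   # equals twologLam
pchisq(twologLam, df = 1, lower.tail = FALSE)  # approximate p-value, df = p - p0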

15.5. Null distribution of test statistic

We have RSS = Y^T (I_n - P) Y (see the proof of Theorem 13.3(ii)), and so

RSS_0 - RSS = Y^T (I_n - P_0) Y - Y^T (I_n - P) Y = Y^T (P - P_0) Y.

Now I_n - P and P - P_0 are symmetric and idempotent, and therefore

rank(I_n - P) = n - p,
rank(P - P_0) = tr(P - P_0) = tr(P) - tr(P_0) = rank(P) - rank(P_0) = p - p_0.

Also

(I_n - P)(P - P_0) = (I_n - P)P - (I_n - P)P_0 = 0,

since (I_n - P)P = P - P² = 0 and P P_0 = P_0 (the columns of X_0 lie in the column space of X), so (I_n - P)P_0 = P_0 - P_0 = 0. Finally,

Y^T (I_n - P) Y = (Y - X_0 β_0)^T (I_n - P)(Y - X_0 β_0), since (I_n - P) X_0 = 0,
Y^T (P - P_0) Y = (Y - X_0 β_0)^T (P - P_0)(Y - X_0 β_0), since (P - P_0) X_0 = 0.

Applying Lemma 13.2 (Z^T A_i Z ~ σ²χ²_r, where r = rank(A_i)) and Lemma 15.1 to Z = Y - X_0 β_0, A_1 = I_n - P and A_2 = P - P_0, we get that under H_0,

RSS = Y^T (I_n - P) Y ~ σ²χ²_{n-p},
RSS_0 - RSS = Y^T (P - P_0) Y ~ σ²χ²_{p-p_0},

and these random variables are independent. So under H_0,

F = [Y^T (P - P_0) Y/(p - p_0)] / [Y^T (I_n - P) Y/(n - p)] = [(RSS_0 - RSS)/(p - p_0)] / [RSS/(n - p)] ~ F_{p-p_0, n-p}.

Hence we reject H_0 if F > F_{p-p_0, n-p}(α). RSS_0 - RSS is the reduction in the residual sum of squares due to fitting β_1.
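On the same kind of toy data, the exact F statistic can be computed by hand and checked against R's anova() comparison of the two nested fits (hypothetical simulated data again):

# Exact F test for nested linear models.
set.seed(3)
n <- 30
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
fit0 <- lm(y ~ 1)                              # p0 = 1
fit  <- lm(y ~ x)                              # p = 2
RSS0 <- sum(resid(fit0)^2)
RSS  <- sum(resid(fit)^2)
Fstat <- ((RSS0 - RSS) / (2 - 1)) / (RSS / (n - 2))
pf(Fstat, 1, n - 2, lower.tail = FALSE)        # exact p-value under H0
anova(fit0, fit)                               # reports the same F and p-value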

15.6. Arrangement as an analysis of variance table

Source of variation | degrees of freedom (df) | sum of squares | mean square | F statistic
Fitted model | p - p_0 | RSS_0 - RSS | (RSS_0 - RSS)/(p - p_0) | [(RSS_0 - RSS)/(p - p_0)] / [RSS/(n - p)]
Residual | n - p | RSS | RSS/(n - p) |
Total | n - p_0 | RSS_0 | |

The ratio (RSS_0 - RSS)/RSS_0 is sometimes known as the proportion of variance explained by β_1, and denoted R².

15.7. Simple linear regression

We assume that

Y_i = a + b(x_i - x̄) + ε_i, i = 1, ..., n,

where x̄ = Σ_i x_i/n, and the ε_i, i = 1, ..., n, are iid N(0, σ²). Suppose we want to test the hypothesis H_0: b = 0, i.e. no linear relationship. From Lecture 14 we have seen how to construct a confidence interval for b, and so we could simply check whether it includes 0.

Alternatively, under H_0 the model is Y_i ~ N(a, σ²), so â = Ȳ and the fitted values are Ŷ_i = Ȳ. The observed RSS_0 is therefore

RSS_0 = Σ_i (y_i - ȳ)² = S_yy.

The fitted sum of squares is therefore

RSS_0 - RSS = Σ_i [ (y_i - ȳ)² - (y_i - ȳ - b̂(x_i - x̄))² ] = b̂² Σ_i (x_i - x̄)² = b̂² S_xx,

using b̂ = S_xy/S_xx.
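A quick numerical confirmation of RSS_0 - RSS = b̂²S_xx in R (simulated data, hypothetical):

# Check the fitted-sum-of-squares identity for simple linear regression.
set.seed(4)
n <- 25
x <- rnorm(n)
y <- 2 + 0.7 * (x - mean(x)) + rnorm(n)
fit  <- lm(y ~ I(x - mean(x)))                 # regression on centred x
bhat <- coef(fit)[2]
Sxx  <- sum((x - mean(x))^2)
RSS0 <- sum((y - mean(y))^2)                   # = Syy
RSS  <- sum(resid(fit)^2)
all.equal(as.numeric(RSS0 - RSS), as.numeric(bhat^2 * Sxx))  # TRUE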

Source of variation | d.f. | sum of squares | mean square | F statistic
Fitted model | 1 | RSS_0 - RSS = b̂²S_xx | b̂²S_xx | F = b̂²S_xx/σ̃²
Residual | n - 2 | RSS = Σ_i (y_i - ŷ_i)² | σ̃² = RSS/(n - 2) |
Total | n - 1 | RSS_0 = Σ_i (y_i - ȳ)² | |

Note that the proportion of variance explained is b̂²S_xx/S_yy = S_xy²/(S_xx S_yy) = r², where r is Pearson's product-moment correlation coefficient, r = S_xy/√(S_xx S_yy).

From Lecture 14, slide 5, we see that under H_0, s.e.(b̂) = σ̃/√S_xx, so

t = b̂/s.e.(b̂) = b̂√S_xx/σ̃ ~ t_{n-2}.

Checking whether |t| > t_{n-2}(α/2) is precisely the same as checking whether t² = F > F_{1,n-2}(α), since an F_{1,n-2} random variable is the square of a t_{n-2}. Hence the same conclusion is reached, whether based on the t distribution or on the F statistic derived from the analysis-of-variance table.
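In R the equivalence can be seen directly: the t value reported by summary() squares to the F value reported by anova() (hypothetical simulated data):

# t^2 = F in simple linear regression.
set.seed(5)
x <- rnorm(20)
y <- 1 + 0.3 * x + rnorm(20)
fit  <- lm(y ~ x)
tval <- summary(fit)$coefficients["x", "t value"]
Fval <- anova(fit)["x", "F value"]
all.equal(tval^2, Fval)                        # TRUE: F_{1,n-2} is the square of t_{n-2}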

Example 12.1 continued. As R code:

> fit = lm(time ~ oxy.s)
> summary.aov(fit)
            Df Sum Sq Mean Sq F value   Pr(>F)
oxy.s        1 129690  129690   41.98 1.62e-06 ***
Residuals   22  67968    3089
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that the F statistic, 41.98, is 6.48², the square of the t statistic on slide 5 of Lecture 14.

15.8. One-way analysis of variance with equal numbers in each group

Assume J measurements are taken in each of I groups, so that n = IJ, and that

Y_{ij} = μ_i + ε_{ij},

where the ε_{ij} are independent N(0, σ²) random variables and the μ_i are unknown constants. Fitting this model gives

RSS = Σ_{i=1}^{I} Σ_{j=1}^{J} (Y_{ij} - μ̂_i)² = Σ_i Σ_j (Y_{ij} - Ȳ_{i·})²

on n - I degrees of freedom.

Suppose we want to test the hypothesis H_0: μ_i = μ for all i, i.e. no difference between groups. Under H_0, the model is Y_{ij} ~ N(μ, σ²), so μ̂ = Ȳ_{··} and the fitted values are Ŷ_{ij} = Ȳ_{··}. The observed RSS_0 is therefore

RSS_0 = Σ_i Σ_j (y_{ij} - ȳ_{··})².

The fitted sum of squares is therefore

RSS_0 - RSS = Σ_i Σ_j [ (y_{ij} - ȳ_{··})² - (y_{ij} - ȳ_{i·})² ] = J Σ_i (ȳ_{i·} - ȳ_{··})².

Source of variation | d.f. | sum of squares | mean square | F statistic
Fitted model | I - 1 | J Σ_i (ȳ_{i·} - ȳ_{··})² | J Σ_i (ȳ_{i·} - ȳ_{··})²/(I - 1) | F = [J Σ_i (ȳ_{i·} - ȳ_{··})²/(I - 1)]/σ̃²
Residual | n - I | Σ_i Σ_j (y_{ij} - ȳ_{i·})² | σ̃² = RSS/(n - I) |
Total | n - 1 | Σ_i Σ_j (y_{ij} - ȳ_{··})² | |
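A sketch of the same computation in R on simulated balanced data (hypothetical; the variable Igrp stands in for I, which is a base R function name):

# One-way ANOVA F statistic from group means, checked against aov().
set.seed(6)
Igrp <- 4; J <- 6; n <- Igrp * J
group <- factor(rep(1:Igrp, each = J))
y <- rnorm(n, mean = rep(c(10, 10, 12, 11), each = J))   # arbitrary group means
SSfit <- J * sum((tapply(y, group, mean) - mean(y))^2)   # RSS0 - RSS
RSS   <- sum((y - ave(y, group))^2)                      # within-group sum of squares
Fstat <- (SSfit / (Igrp - 1)) / (RSS / (n - Igrp))
summary(aov(y ~ group))                                  # reports the same F statistic
Fstat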

Example 13.1. As R code:

> summary.aov(fit)
            Df Sum Sq Mean Sq F value Pr(>F)
x            4  507.9   127.0    1.17  0.354
Residuals   20 2170.1   108.5

The p-value is 0.35, and so there is no evidence of a difference between the instruments.