Part IB Statistics
Theorems with proof

Based on lectures by D. Spiegelhalter
Notes taken by Dexter Chua
Lent 2015

These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures. They are nowhere near accurate representations of what was actually lectured, and in particular, all errors are almost surely mine.

Estimation
Review of distribution and density functions, parametric families. Examples: binomial, Poisson, gamma. Sufficiency, minimal sufficiency, the Rao-Blackwell theorem. Maximum likelihood estimation. Confidence intervals. Use of prior distributions and Bayesian inference. [5]

Hypothesis testing
Simple examples of hypothesis testing, null and alternative hypothesis, critical region, size, power, type I and type II errors, Neyman-Pearson lemma. Significance level of outcome. Uniformly most powerful tests. Likelihood ratio, and use of generalised likelihood ratio to construct test statistics for composite hypotheses. Examples, including t-tests and F-tests. Relationship with confidence intervals. Goodness-of-fit tests and contingency tables. [4]

Linear models
Derivation and joint distribution of maximum likelihood estimators, least squares, Gauss-Markov theorem. Testing hypotheses, geometric interpretation. Examples, including simple linear regression and one-way analysis of variance. Use of software. [7]

Contents

0 Introduction
1 Estimation
  1.1 Estimators
  1.2 Mean squared error
  1.3 Sufficiency
  1.4 Likelihood
  1.5 Confidence intervals
  1.6 Bayesian estimation
2 Hypothesis testing
  2.1 Simple hypotheses
  2.2 Composite hypotheses
  2.3 Tests of goodness-of-fit and independence
    2.3.1 Goodness-of-fit of a fully-specified null distribution
    2.3.2 Pearson's chi-squared test
    2.3.3 Testing independence in contingency tables
  2.4 Tests of homogeneity, and connections to confidence intervals
    2.4.1 Tests of homogeneity
    2.4.2 Confidence intervals and hypothesis tests
  2.5 Multivariate normal theory
    2.5.1 Multivariate normal distribution
    2.5.2 Normal random samples
  2.6 Student's t-distribution
3 Linear models
  3.1 Linear models
  3.2 Simple linear regression
  3.3 Linear models with normal assumptions
  3.4 The F distribution
  3.5 Inference for beta
  3.6 Simple linear regression
  3.7 Expected response at x
  3.8 Hypothesis testing
    3.8.1 Hypothesis testing
    3.8.2 Simple linear regression
    3.8.3 One way analysis of variance with equal numbers in each group

0 Introduction

1 Estimation

1.1 Estimators

1.2 Mean squared error

1.3 Sufficiency

Theorem (The factorization criterion). $T$ is sufficient for $\theta$ if and only if
\[ f_X(x \mid \theta) = g(T(x), \theta)\, h(x) \]
for some functions $g$ and $h$.

Proof. We first prove the discrete case. Suppose $f_X(x \mid \theta) = g(T(x), \theta) h(x)$. If $T(x) = t$, then
\[ f_{X \mid T = t}(x) = \frac{P_\theta(X = x,\, T(X) = t)}{P_\theta(T = t)} = \frac{g(T(x), \theta) h(x)}{\sum_{\{y : T(y) = t\}} g(T(y), \theta) h(y)} = \frac{g(t, \theta) h(x)}{g(t, \theta) \sum h(y)} = \frac{h(x)}{\sum h(y)}, \]
which does not depend on $\theta$. So $T$ is sufficient.

The continuous case is similar. If $f_X(x \mid \theta) = g(T(x), \theta) h(x)$ and $T(x) = t$, then
\[ f_{X \mid T = t}(x) = \frac{g(T(x), \theta) h(x)}{\int_{y : T(y) = t} g(T(y), \theta) h(y)\, \mathrm{d}y} = \frac{g(t, \theta) h(x)}{g(t, \theta) \int h(y)\, \mathrm{d}y} = \frac{h(x)}{\int h(y)\, \mathrm{d}y}, \]
which does not depend on $\theta$.

Now suppose $T$ is sufficient, so that the conditional distribution of $X \mid T = t$ does not depend on $\theta$. Then
\[ P_\theta(X = x) = P_\theta(X = x,\, T = T(x)) = P_\theta(X = x \mid T = T(x))\, P_\theta(T = T(x)). \]
The first factor does not depend on $\theta$ by assumption; call it $h(x)$. Let the second factor be $g(t, \theta)$, and so we have the required factorisation.

Theorem. Suppose $T = T(X)$ is a statistic that satisfies
\[ \frac{f_X(x; \theta)}{f_X(y; \theta)} \text{ does not depend on } \theta \text{ if and only if } T(x) = T(y). \]
Then $T$ is minimal sufficient for $\theta$.
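As a quick illustration of how the criterion is used, consider $X_1, \ldots, X_n$ iid $\mathrm{Poisson}(\lambda)$. The joint pmf factorises directly:
\[ f_X(x \mid \lambda) = \prod_{i=1}^n \frac{e^{-\lambda} \lambda^{x_i}}{x_i!} = \underbrace{e^{-n\lambda} \lambda^{\sum_i x_i}}_{g(T(x),\, \lambda)} \cdot \underbrace{\left( \prod_{i=1}^n x_i! \right)^{-1}}_{h(x)}, \]
so $T(X) = \sum_i X_i$ is sufficient for $\lambda$.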

Proof. First we have to show sufficiency; we use the factorization criterion. For each possible value $t$, pick a favourite $x_t$ such that $T(x_t) = t$. Now let $x \in \mathcal{X}^n$ and let $T(x) = t$, so that $T(x) = T(x_t)$. By the hypothesis, $f_X(x; \theta)/f_X(x_t; \theta)$ does not depend on $\theta$; let this be $h(x)$. Let $g(t, \theta) = f_X(x_t; \theta)$. Then
\[ f_X(x; \theta) = f_X(x_t; \theta)\, \frac{f_X(x; \theta)}{f_X(x_t; \theta)} = g(t, \theta) h(x). \]
So $T$ is sufficient for $\theta$.

To show that this is minimal, suppose that $S(X)$ is also sufficient. By the factorization criterion, there exist functions $g_S$ and $h_S$ such that
\[ f_X(x; \theta) = g_S(S(x), \theta)\, h_S(x). \]
Now suppose that $S(x) = S(y)$. Then
\[ \frac{f_X(x; \theta)}{f_X(y; \theta)} = \frac{g_S(S(x), \theta)\, h_S(x)}{g_S(S(y), \theta)\, h_S(y)} = \frac{h_S(x)}{h_S(y)}. \]
This means that the ratio $f_X(x; \theta)/f_X(y; \theta)$ does not depend on $\theta$. By the hypothesis, this implies that $T(x) = T(y)$. So $S(x) = S(y)$ implies $T(x) = T(y)$, i.e. $T$ is a function of $S$. So $T$ is minimal sufficient.

Theorem (Rao-Blackwell theorem). Let $T$ be a sufficient statistic for $\theta$ and let $\tilde\theta$ be an estimator for $\theta$ with $E(\tilde\theta^2) < \infty$ for all $\theta$. Let $\hat\theta(x) = E[\tilde\theta(X) \mid T(X) = T(x)]$. Then for all $\theta$,
\[ E[(\hat\theta - \theta)^2] \le E[(\tilde\theta - \theta)^2]. \]
The inequality is strict unless $\tilde\theta$ is a function of $T$.

Proof. By the conditional expectation formula, we have $E(\hat\theta) = E[E(\tilde\theta \mid T)] = E(\tilde\theta)$. So they have the same bias. By the conditional variance formula,
\[ \operatorname{var}(\tilde\theta) = E[\operatorname{var}(\tilde\theta \mid T)] + \operatorname{var}[E(\tilde\theta \mid T)] = E[\operatorname{var}(\tilde\theta \mid T)] + \operatorname{var}(\hat\theta). \]
Hence $\operatorname{var}(\tilde\theta) \ge \operatorname{var}(\hat\theta)$. So $\operatorname{mse}(\tilde\theta) \ge \operatorname{mse}(\hat\theta)$, with equality only if $\operatorname{var}(\tilde\theta \mid T) = 0$.

1.4 Likelihood

1.5 Confidence intervals

1.6 Bayesian estimation
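The improvement from conditioning on a sufficient statistic can be checked numerically. Below is a minimal simulation sketch (assuming numpy is available; the parameter values are arbitrary): for an iid $\mathrm{Poisson}(\lambda)$ sample we estimate $p = P(X_1 = 0) = e^{-\lambda}$, starting from the crude unbiased estimator $\tilde\theta = 1\{X_1 = 0\}$ and conditioning on the sufficient statistic $T = \sum X_i$, which gives $\hat\theta = E[\tilde\theta \mid T] = (1 - 1/n)^T$.

```python
# Minimal Rao-Blackwell simulation sketch (illustrative values, assumes numpy).
# Crude unbiased estimator: 1{X_1 = 0}.  Rao-Blackwellised: (1 - 1/n)^T with T = sum(X_i).
import numpy as np

rng = np.random.default_rng(0)
lam, n, reps = 1.3, 10, 200_000
p_true = np.exp(-lam)

X = rng.poisson(lam, size=(reps, n))
theta_tilde = (X[:, 0] == 0).astype(float)   # crude unbiased estimator
T = X.sum(axis=1)                            # sufficient statistic
theta_hat = (1 - 1/n) ** T                   # E[theta_tilde | T]

print("MSE of crude estimator:          ", np.mean((theta_tilde - p_true) ** 2))
print("MSE of Rao-Blackwellised version:", np.mean((theta_hat - p_true) ** 2))
```

The second printed mean squared error should be noticeably smaller, consistent with the theorem.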

2 Hypothesis testing

2.1 Simple hypotheses

Lemma (Neyman-Pearson lemma). Suppose $H_0 : f = f_0$, $H_1 : f = f_1$, where $f_0$ and $f_1$ are continuous densities that are nonzero on the same regions. Then among all tests of size less than or equal to $\alpha$, the test with the largest power is the likelihood ratio test of size $\alpha$.

Proof. Under the likelihood ratio test, our critical region is
\[ C = \left\{ x : \frac{f_1(x)}{f_0(x)} > k \right\}, \]
where $k$ is chosen such that $\alpha = P(\text{reject } H_0 \mid H_0) = P(X \in C \mid H_0) = \int_C f_0(x)\, \mathrm{d}x$. The probability of a Type II error is given by
\[ \beta = P(X \notin C \mid f_1) = \int_{\bar C} f_1(x)\, \mathrm{d}x. \]
Let $C^*$ be the critical region of any other test with size less than or equal to $\alpha$. Let $\alpha^* = P(X \in C^* \mid f_0)$ and $\beta^* = P(X \notin C^* \mid f_1)$. We want to show $\beta \le \beta^*$.

We know $\alpha^* \le \alpha$, i.e. $\int_{C^*} f_0(x)\, \mathrm{d}x \le \int_C f_0(x)\, \mathrm{d}x$. Also, on $C$ we have $f_1(x) > k f_0(x)$, while on $\bar C$ we have $f_1(x) \le k f_0(x)$. So
\[ \int_{C \cap \bar C^*} f_1(x)\, \mathrm{d}x \ge k \int_{C \cap \bar C^*} f_0(x)\, \mathrm{d}x, \qquad \int_{\bar C \cap C^*} f_1(x)\, \mathrm{d}x \le k \int_{\bar C \cap C^*} f_0(x)\, \mathrm{d}x. \]
Hence
\[
\begin{aligned}
\beta^* - \beta &= \int_{\bar C^*} f_1(x)\, \mathrm{d}x - \int_{\bar C} f_1(x)\, \mathrm{d}x \\
&= \left( \int_{\bar C^* \cap C} f_1\, \mathrm{d}x + \int_{\bar C^* \cap \bar C} f_1\, \mathrm{d}x \right) - \left( \int_{\bar C \cap C^*} f_1\, \mathrm{d}x + \int_{\bar C \cap \bar C^*} f_1\, \mathrm{d}x \right) \\
&= \int_{C \cap \bar C^*} f_1(x)\, \mathrm{d}x - \int_{\bar C \cap C^*} f_1(x)\, \mathrm{d}x \\
&\ge k \int_{C \cap \bar C^*} f_0(x)\, \mathrm{d}x - k \int_{\bar C \cap C^*} f_0(x)\, \mathrm{d}x \\
&= k \left( \int_{C \cap \bar C^*} f_0\, \mathrm{d}x + \int_{C \cap C^*} f_0\, \mathrm{d}x \right) - k \left( \int_{\bar C \cap C^*} f_0\, \mathrm{d}x + \int_{C \cap C^*} f_0\, \mathrm{d}x \right) \\
&= k(\alpha - \alpha^*) \ge 0.
\end{aligned}
\]
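For two simple hypotheses the lemma can be seen in action numerically. Below is a minimal sketch (assuming numpy and scipy; the hypotheses and sample size are arbitrary choices): for $H_0 : X_i \sim N(0, 1)$ against $H_1 : X_i \sim N(1, 1)$ with $n$ iid observations, the likelihood ratio is increasing in the sample mean, so the size-$\alpha$ likelihood ratio test rejects when $\bar X > z_\alpha / \sqrt{n}$.

```python
# Minimal likelihood ratio test sketch for two simple normal hypotheses
# (illustrative values, assumes numpy and scipy).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, alpha, reps = 20, 0.05, 100_000
c = stats.norm.ppf(1 - alpha) / np.sqrt(n)   # critical value for the sample mean

xbar_H0 = rng.normal(0.0, 1.0, size=(reps, n)).mean(axis=1)
xbar_H1 = rng.normal(1.0, 1.0, size=(reps, n)).mean(axis=1)

print("empirical size:   ", np.mean(xbar_H0 > c))   # should be close to alpha
print("empirical power:  ", np.mean(xbar_H1 > c))
print("theoretical power:", 1 - stats.norm.cdf(stats.norm.ppf(1 - alpha) - np.sqrt(n)))
```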

[Diagram: the sample space partitioned into the regions $C \cap C^*$, $C \cap \bar C^*$, $\bar C \cap C^*$ and $\bar C \cap \bar C^*$, annotated with the sign of $f_1 - k f_0$ on each region and the error probabilities $\alpha, \alpha^*$ (under $H_0$) and $\beta, \beta^*$ (under $H_1$).]

2.2 Composite hypotheses

Theorem (Generalized likelihood ratio theorem). Suppose $\Theta_0 \subseteq \Theta_1$ and $\dim \Theta_1 - \dim \Theta_0 = p$. Let $X = (X_1, \ldots, X_n)$ with all $X_i$ iid. Then if $H_0$ is true, as $n \to \infty$,
\[ 2 \log \Lambda_X(H_0 : H_1) \sim \chi^2_p. \]
If $H_0$ is not true, then $2 \log \Lambda$ tends to be larger. We reject $H_0$ if $2 \log \Lambda > c$, where $c = \chi^2_p(\alpha)$ for a test of approximately size $\alpha$.

2.3 Tests of goodness-of-fit and independence

2.3.1 Goodness-of-fit of a fully-specified null distribution

2.3.2 Pearson's chi-squared test

2.3.3 Testing independence in contingency tables

2.4 Tests of homogeneity, and connections to confidence intervals

2.4.1 Tests of homogeneity

2.4.2 Confidence intervals and hypothesis tests

Theorem.
(i) Suppose that for every $\theta_0 \in \Theta$ there is a size $\alpha$ test of $H_0 : \theta = \theta_0$. Denote the acceptance region by $A(\theta_0)$. Then the set $I(X) = \{\theta : X \in A(\theta)\}$ is a $100(1 - \alpha)\%$ confidence set for $\theta$.
(ii) Suppose $I(X)$ is a $100(1 - \alpha)\%$ confidence set for $\theta$. Then $A(\theta_0) = \{X : \theta_0 \in I(X)\}$ is an acceptance region for a size $\alpha$ test of $H_0 : \theta = \theta_0$.

Proof. First note that $\theta_0 \in I(X)$ iff $X \in A(\theta_0)$.

For (i), since the test is size $\alpha$, we have
\[ P(\text{accept } H_0 \mid H_0 \text{ is true}) = P(X \in A(\theta_0) \mid \theta = \theta_0) = 1 - \alpha. \]

And so
\[ P(\theta_0 \in I(X) \mid \theta = \theta_0) = P(X \in A(\theta_0) \mid \theta = \theta_0) = 1 - \alpha. \]
For (ii), since $I(X)$ is a $100(1 - \alpha)\%$ confidence set, we have $P(\theta_0 \in I(X) \mid \theta = \theta_0) = 1 - \alpha$. So
\[ P(X \in A(\theta_0) \mid \theta = \theta_0) = P(\theta_0 \in I(X) \mid \theta = \theta_0) = 1 - \alpha. \]

2.5 Multivariate normal theory

2.5.1 Multivariate normal distribution

Proposition.
(i) If $X \sim N_n(\mu, \Sigma)$ and $A$ is an $m \times n$ matrix, then $AX \sim N_m(A\mu, A\Sigma A^T)$.
(ii) If $X \sim N_n(0, \sigma^2 I)$, then
\[ \frac{\|X\|^2}{\sigma^2} = \frac{X^T X}{\sigma^2} = \frac{\sum X_i^2}{\sigma^2} \sim \chi^2_n. \]
Instead of writing $\|X\|^2 / \sigma^2 \sim \chi^2_n$, we often just say $\|X\|^2 \sim \sigma^2 \chi^2_n$.

Proof.
(i) See example sheet 3.
(ii) Immediate from the definition of $\chi^2_n$.

Proposition. Let $X \sim N_n(\mu, \Sigma)$. We split $X$ up into two parts, $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$, where $X_i$ is an $n_i \times 1$ column vector and $n_1 + n_2 = n$. Similarly write
\[ \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \]
where $\Sigma_{ij}$ is an $n_i \times n_j$ matrix. Then
(i) $X_i \sim N_{n_i}(\mu_i, \Sigma_{ii})$;
(ii) $X_1$ and $X_2$ are independent iff $\Sigma_{12} = 0$.

Proof.
(i) See example sheet 3.
(ii) Note that by symmetry of $\Sigma$, $\Sigma_{12} = 0$ if and only if $\Sigma_{21} = 0$. The mgf of $X$ is $M_X(t) = \exp(t^T \mu + \frac{1}{2} t^T \Sigma t)$ for each $t \in \mathbb{R}^n$. Writing $t = \begin{pmatrix} t_1 \\ t_2 \end{pmatrix}$, the mgf is equal to
\[ M_X(t) = \exp\left( t_1^T \mu_1 + t_2^T \mu_2 + \tfrac{1}{2} t_1^T \Sigma_{11} t_1 + \tfrac{1}{2} t_2^T \Sigma_{22} t_2 + \tfrac{1}{2} t_1^T \Sigma_{12} t_2 + \tfrac{1}{2} t_2^T \Sigma_{21} t_1 \right). \]
From (i), we know that $M_{X_i}(t_i) = \exp(t_i^T \mu_i + \frac{1}{2} t_i^T \Sigma_{ii} t_i)$. So $M_X(t) = M_{X_1}(t_1) M_{X_2}(t_2)$ for all $t$ if and only if $\Sigma_{12} = 0$.
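A quick numerical sanity check of part (i) of the first proposition (a sketch, assuming numpy; the particular $\mu$, $\Sigma$ and $A$ below are arbitrary): draw samples of $X \sim N_n(\mu, \Sigma)$ and compare the empirical mean and covariance of $AX$ with $A\mu$ and $A\Sigma A^T$.

```python
# Minimal check that a linear map of a multivariate normal has the stated
# mean and covariance (illustrative values, assumes numpy).
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.4],
                  [0.0, 0.4, 1.5]])
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 2.0, -1.0]])          # an arbitrary 2 x 3 matrix

X = rng.multivariate_normal(mu, Sigma, size=500_000)
Y = X @ A.T                               # rows are samples of AX

print("empirical mean of AX:", Y.mean(axis=0), " vs  A mu:", A @ mu)
print("empirical cov of AX:\n", np.cov(Y, rowvar=False))
print("A Sigma A^T:\n", A @ Sigma @ A.T)
```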

Proposition. When $\Sigma$ is positive definite, $X$ has pdf
\[ f_X(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left[ -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right]. \]

2.5.2 Normal random samples

Theorem (Joint distribution of $\bar X$ and $S_{XX}$). Suppose $X_1, \ldots, X_n$ are iid $N(\mu, \sigma^2)$, $\bar X = \frac{1}{n} \sum X_i$, and $S_{XX} = \sum (X_i - \bar X)^2$. Then
(i) $\bar X \sim N(\mu, \sigma^2/n)$;
(ii) $S_{XX}/\sigma^2 \sim \chi^2_{n-1}$;
(iii) $\bar X$ and $S_{XX}$ are independent.

Proof. We can write the joint density as $X \sim N_n(\boldsymbol\mu, \sigma^2 I)$, where $\boldsymbol\mu = (\mu, \mu, \ldots, \mu)$. Let $A$ be an $n \times n$ orthogonal matrix with the first row all $1/\sqrt{n}$ (the other rows are not important). One possible such matrix is
\[ A = \begin{pmatrix}
\frac{1}{\sqrt{n}} & \frac{1}{\sqrt{n}} & \frac{1}{\sqrt{n}} & \cdots & \frac{1}{\sqrt{n}} \\
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & \cdots & 0 \\
\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{1}{\sqrt{n(n-1)}} & \frac{1}{\sqrt{n(n-1)}} & \frac{1}{\sqrt{n(n-1)}} & \cdots & -\frac{n-1}{\sqrt{n(n-1)}}
\end{pmatrix}. \]
Now define $Y = AX$. Then
\[ Y \sim N_n(A\boldsymbol\mu, A \sigma^2 I A^T) = N_n(A\boldsymbol\mu, \sigma^2 I). \]
We have
\[ A\boldsymbol\mu = (\sqrt{n}\,\mu, 0, \ldots, 0)^T. \]
So $Y_1 \sim N(\sqrt{n}\,\mu, \sigma^2)$ and $Y_i \sim N(0, \sigma^2)$ for $i = 2, \ldots, n$. Also, $Y_1, \ldots, Y_n$ are independent, since every off-diagonal term of the covariance matrix is $0$.

But from the definition of $A$, we have
\[ Y_1 = \frac{1}{\sqrt{n}} \sum_{i=1}^n X_i = \sqrt{n}\, \bar X. \]
So $\sqrt{n}\, \bar X \sim N(\sqrt{n}\,\mu, \sigma^2)$, or $\bar X \sim N(\mu, \sigma^2/n)$. Also,
\[ Y_2^2 + \cdots + Y_n^2 = Y^T Y - Y_1^2 = X^T A^T A X - Y_1^2 = X^T X - n \bar X^2 = \sum_{i=1}^n X_i^2 - n \bar X^2 = \sum_{i=1}^n (X_i - \bar X)^2 = S_{XX}. \]
So
\[ S_{XX} = Y_2^2 + \cdots + Y_n^2 \sim \sigma^2 \chi^2_{n-1}. \]
Finally, since $Y_1$ and $Y_2, \ldots, Y_n$ are independent, so are $\bar X$ and $S_{XX}$.
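The conclusions of this theorem are easy to check by simulation. A minimal sketch (assuming numpy; the parameter values are arbitrary): simulate many iid normal samples and verify that $S_{XX}/\sigma^2$ has mean about $n - 1$, that $\operatorname{var}(\bar X) \approx \sigma^2/n$, and that $\bar X$ and $S_{XX}$ are empirically uncorrelated, as the independence in (iii) requires.

```python
# Minimal simulation check of the joint distribution of Xbar and S_XX
# (illustrative values, assumes numpy).
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 2.0, 1.5, 8, 200_000

X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
sxx = ((X - xbar[:, None]) ** 2).sum(axis=1)

print("mean of S_XX / sigma^2:", np.mean(sxx / sigma**2), "(expect", n - 1, ")")
print("var of Xbar:           ", np.var(xbar), "(expect", sigma**2 / n, ")")
print("corr(Xbar, S_XX):      ", np.corrcoef(xbar, sxx)[0, 1], "(expect ~ 0)")
```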

2.6 Student's t-distribution

Proposition. If $k > 1$, then $E_k(T) = 0$. If $k > 2$, then $\operatorname{var}_k(T) = \frac{k}{k-2}$. If $k = 2$, then $\operatorname{var}_k(T) = \infty$. In all other cases, the values are undefined. In particular, the $k = 1$ case is known as the Cauchy distribution, and has undefined mean and variance.

3 Linear models

3.1 Linear models

Proposition. The least squares estimator satisfies
\[ X^T X \hat\beta = X^T Y. \]

3.2 Simple linear regression

Theorem (Gauss-Markov theorem). In a full rank linear model, let $\hat\beta$ be the least squares estimator of $\beta$ and let $\beta^*$ be any other unbiased estimator for $\beta$ which is linear in the $Y_i$'s. Then
\[ \operatorname{var}(t^T \hat\beta) \le \operatorname{var}(t^T \beta^*) \]
for all $t \in \mathbb{R}^p$. We say that $\hat\beta$ is the best linear unbiased estimator of $\beta$ (BLUE).

Proof. Since $\beta^*$ is linear in the $Y_i$'s, $\beta^* = AY$ for some $p \times n$ matrix $A$. Since $\beta^*$ is an unbiased estimator, we must have $E[\beta^*] = \beta$. However, since $\beta^* = AY$, $E[\beta^*] = A E[Y] = AX\beta$. So we must have $\beta = AX\beta$. Since this holds for any $\beta$, we must have $AX = I_p$. Now
\[ \operatorname{cov}(\beta^*) = E[(\beta^* - \beta)(\beta^* - \beta)^T]. \]
Since $AX\beta = \beta$, this is equal to
\[ E[(AY - \beta)(AY - \beta)^T] = E[(AX\beta + A\varepsilon - \beta)(AX\beta + A\varepsilon - \beta)^T] = E[A\varepsilon(A\varepsilon)^T] = A(\sigma^2 I)A^T = \sigma^2 A A^T. \]
Now let $B = A - (X^T X)^{-1} X^T$, so that $\beta^* - \hat\beta = (A - (X^T X)^{-1} X^T) Y = BY$. Then
\[ BX = AX - (X^T X)^{-1} X^T X = I_p - I_p = 0. \]
By definition, we have $AY = BY + (X^T X)^{-1} X^T Y$, and this is true for all $Y$. So $A = B + (X^T X)^{-1} X^T$. Hence
\[ \operatorname{cov}(\beta^*) = \sigma^2 A A^T = \sigma^2 (B + (X^T X)^{-1} X^T)(B + (X^T X)^{-1} X^T)^T = \sigma^2 (BB^T + (X^T X)^{-1}) = \sigma^2 BB^T + \operatorname{cov}(\hat\beta), \]
where the cross terms disappear since $BX = 0$. So for any $t \in \mathbb{R}^p$, we have
\[ \operatorname{var}(t^T \beta^*) = t^T \operatorname{cov}(\beta^*) t = t^T \operatorname{cov}(\hat\beta) t + \sigma^2 t^T BB^T t = \operatorname{var}(t^T \hat\beta) + \sigma^2 \|B^T t\|^2 \ge \operatorname{var}(t^T \hat\beta). \]
Taking $t = (0, \ldots, 0, 1, 0, \ldots, 0)^T$ with a $1$ in the $i$th position, we have $\operatorname{var}(\hat\beta_i) \le \operatorname{var}(\beta^*_i)$.
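In practice $\hat\beta$ is computed by solving the normal equations above (or, more stably, by a QR-based least squares routine). A minimal sketch (assuming numpy; the simulated design and coefficients are arbitrary):

```python
# Minimal least squares sketch: solve the normal equations and compare with
# numpy's built-in solver (illustrative values, assumes numpy).
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # design with intercept
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)       # normal equations
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None) # QR-based least squares

print("beta_hat (normal equations):", beta_hat)
print("beta_hat (np.linalg.lstsq): ", beta_lstsq)
```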

3.3 Linear models with normal assumptions

Proposition. Under the normal assumptions, the maximum likelihood estimator for a linear model is
\[ \hat\beta = (X^T X)^{-1} X^T Y, \]
which is the same as the least squares estimator.

Lemma.
(i) If $Z \sim N_n(0, \sigma^2 I)$ and $A$ is an $n \times n$ symmetric, idempotent matrix with rank $r$, then $Z^T A Z \sim \sigma^2 \chi^2_r$.
(ii) For a symmetric idempotent matrix $A$, $\operatorname{rank}(A) = \operatorname{tr}(A)$.

Proof.
(i) Since $A$ is idempotent, $A^2 = A$ by definition, so the eigenvalues of $A$ are either $0$ or $1$ (since $\lambda x = Ax = A^2 x = \lambda^2 x$). Since $A$ is also symmetric, it is diagonalizable: there exists an orthogonal $Q$ such that
\[ \Lambda = Q^T A Q = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) = \operatorname{diag}(1, \ldots, 1, 0, \ldots, 0) \]
with $r$ copies of $1$ and $n - r$ copies of $0$. Let $W = Q^T Z$, so $Z = QW$. Then $W \sim N_n(0, \sigma^2 I)$, since $\operatorname{cov}(W) = Q^T \sigma^2 I Q = \sigma^2 I$. Then
\[ Z^T A Z = W^T Q^T A Q W = W^T \Lambda W = \sum_{i=1}^r W_i^2 \sim \sigma^2 \chi^2_r. \]
(ii) $\operatorname{rank}(A) = \operatorname{rank}(\Lambda) = \operatorname{tr}(\Lambda) = \operatorname{tr}(Q^T A Q) = \operatorname{tr}(A Q Q^T) = \operatorname{tr} A$.

Theorem. For the normal linear model $Y \sim N_n(X\beta, \sigma^2 I)$,
(i) $\hat\beta \sim N_p(\beta, \sigma^2 (X^T X)^{-1})$;
(ii) $\operatorname{RSS} \sim \sigma^2 \chi^2_{n-p}$, and so $\hat\sigma^2 \sim \frac{\sigma^2}{n} \chi^2_{n-p}$;
(iii) $\hat\beta$ and $\hat\sigma^2$ are independent.

Proof.

We have $\hat\beta = (X^T X)^{-1} X^T Y$; call this $CY$ for later use. Then $\hat\beta$ has a normal distribution with mean
\[ (X^T X)^{-1} X^T (X\beta) = \beta \]
and covariance
\[ (X^T X)^{-1} X^T (\sigma^2 I) [(X^T X)^{-1} X^T]^T = \sigma^2 (X^T X)^{-1}. \]
So
\[ \hat\beta \sim N_p(\beta, \sigma^2 (X^T X)^{-1}). \]
Our previous lemma says that $Z^T A Z \sim \sigma^2 \chi^2_r$. So we pick our $Z$ and $A$ so that $Z^T A Z = \operatorname{RSS}$ and $r$, the rank of $A$, is $n - p$. Let $Z = Y - X\beta$ and $A = I_n - P$, where $P = X(X^T X)^{-1} X^T$. We first check that the conditions of the lemma hold. Since $Y \sim N_n(X\beta, \sigma^2 I)$, we have $Z = Y - X\beta \sim N_n(0, \sigma^2 I)$. Since $P$ is idempotent, $I_n - P$ also is (check!), and we have $\operatorname{rank}(I_n - P) = \operatorname{tr}(I_n - P) = n - p$. Therefore the conditions of the lemma hold.

To get the final useful result, we want to show that the RSS is indeed $Z^T A Z$. We simplify the expressions of RSS and $Z^T A Z$ and show that they are equal:
\[ Z^T A Z = (Y - X\beta)^T (I_n - P)(Y - X\beta) = Y^T (I_n - P) Y, \]
using the fact that $(I_n - P)X = 0$. Writing $R = Y - \hat Y = (I_n - P)Y$, we have
\[ \operatorname{RSS} = R^T R = Y^T (I_n - P) Y, \]
using the symmetry and idempotence of $I_n - P$. Hence
\[ \operatorname{RSS} = Z^T A Z \sim \sigma^2 \chi^2_{n-p}. \]
Then
\[ \hat\sigma^2 = \frac{\operatorname{RSS}}{n} \sim \frac{\sigma^2}{n} \chi^2_{n-p}. \]
Let
\[ V = \begin{pmatrix} \hat\beta \\ R \end{pmatrix} = DY, \quad \text{where } D = \begin{pmatrix} C \\ I_n - P \end{pmatrix} \text{ is a } (p + n) \times n \text{ matrix}. \]
Since $Y$ is multivariate normal, $V$ is multivariate normal with
\[ \operatorname{cov}(V) = D \sigma^2 I D^T = \sigma^2 \begin{pmatrix} CC^T & C(I_n - P)^T \\ (I_n - P)C^T & (I_n - P)(I_n - P)^T \end{pmatrix} = \sigma^2 \begin{pmatrix} CC^T & C(I_n - P) \\ (I_n - P)C^T & I_n - P \end{pmatrix} = \sigma^2 \begin{pmatrix} CC^T & 0 \\ 0 & I_n - P \end{pmatrix}, \]
using $C(I_n - P) = 0$, since $(X^T X)^{-1} X^T (I_n - P) = 0$ because $(I_n - P)X = 0$ (check!). Hence $\hat\beta$ and $R$ are independent, since the off-diagonal covariance blocks are $0$. So $\hat\beta$ and $\operatorname{RSS} = R^T R$ are independent, and therefore $\hat\beta$ and $\hat\sigma^2$ are independent.
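The distributional claims of this theorem can be checked by simulation. A minimal sketch (assuming numpy; the design, coefficients and $\sigma$ are arbitrary): across many replications of $Y = X\beta + \varepsilon$, the mean of RSS should be about $\sigma^2(n - p)$, and the sampling covariance of $\hat\beta$ should be close to $\sigma^2 (X^T X)^{-1}$.

```python
# Minimal simulation check of the normal linear model theorem
# (illustrative values, assumes numpy).
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma, reps = 30, 3, 0.7, 50_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, -1.0, 0.5])
XtX_inv = np.linalg.inv(X.T @ X)

Y = X @ beta + rng.normal(scale=sigma, size=(reps, n))
beta_hat = Y @ X @ XtX_inv          # each row is beta_hat for one replication
resid = Y - beta_hat @ X.T
rss = (resid ** 2).sum(axis=1)

print("mean RSS / sigma^2:", rss.mean() / sigma**2, "(expect", n - p, ")")
print("empirical cov(beta_hat):\n", np.cov(beta_hat, rowvar=False))
print("sigma^2 (X^T X)^{-1}:\n", sigma**2 * XtX_inv)
```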

3.4 The F distribution

Proposition. If $X \sim F_{m,n}$, then $1/X \sim F_{n,m}$.

3.5 Inference for $\beta$

3.6 Simple linear regression

3.7 Expected response at $x$

3.8 Hypothesis testing

3.8.1 Hypothesis testing

Lemma. Suppose $Z \sim N_n(0, \sigma^2 I_n)$, and $A_1$ and $A_2$ are symmetric, idempotent $n \times n$ matrices with $A_1 A_2 = 0$ (i.e. they are orthogonal). Then $Z^T A_1 Z$ and $Z^T A_2 Z$ are independent.

Proof. Let $W_i = A_i Z$ for $i = 1, 2$, and write
\[ W = \begin{pmatrix} W_1 \\ W_2 \end{pmatrix} = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix} Z. \]
Then
\[ W \sim N_{2n}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \sigma^2 \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix} \right), \]
since the off-diagonal blocks are $\sigma^2 A_1^T A_2 = \sigma^2 A_1 A_2 = 0$. So $W_1$ and $W_2$ are independent, which implies that
\[ W_1^T W_1 = Z^T A_1^T A_1 Z = Z^T A_1 A_1 Z = Z^T A_1 Z \quad \text{and} \quad W_2^T W_2 = Z^T A_2^T A_2 Z = Z^T A_2 A_2 Z = Z^T A_2 Z \]
are independent.

3.8.2 Simple linear regression

3.8.3 One way analysis of variance with equal numbers in each group
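This lemma underlies the F-test in the linear model: the numerator and denominator sums of squares are quadratic forms in orthogonal idempotent matrices, hence independent, and their suitably scaled ratio has an F distribution. A minimal sketch of such a test (assuming numpy and scipy; the simulated model and the hypothesis that the last $q$ coefficients vanish are illustrative choices):

```python
# Minimal F-test sketch for nested linear models (illustrative values,
# assumes numpy and scipy). Under H0, F ~ F_{q, n-p}.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p, q, sigma = 40, 4, 2, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 0.8, 0.0, 0.0])     # H0 (last q = 2 coefficients zero) is true here
Y = X @ beta + rng.normal(scale=sigma, size=n)

def rss(design, y):
    # residual sum of squares after least squares fit
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    r = y - design @ coef
    return r @ r

rss_full = rss(X, Y)
rss_reduced = rss(X[:, : p - q], Y)
F = ((rss_reduced - rss_full) / q) / (rss_full / (n - p))
p_value = stats.f.sf(F, q, n - p)
print(f"F = {F:.3f}, p-value = {p_value:.3f}")
```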