Inference in Regression Analysis
Dr. Frank Wood, fwood@stat.columbia.edu
Linear Regression Models, Lecture 4
Today: Normal Error Regression Model

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$

- $Y_i$ is the value of the response variable in the $i$th trial
- $\beta_0$ and $\beta_1$ are parameters
- $X_i$ is a known constant, the value of the predictor variable in the $i$th trial
- $\epsilon_i \sim$ iid $N(0, \sigma^2)$, $i = 1, \ldots, n$
Inferences Concerning $\beta_1$

Tests concerning $\beta_1$ (the slope) are often of interest, particularly

$$H_0 : \beta_1 = 0 \qquad H_a : \beta_1 \neq 0$$

The null hypothesis model $Y_i = \beta_0 + (0)X_i + \epsilon_i$ implies that there is no (linear) relationship between $Y$ and $X$.
Review: Hypothesis Testing

Elements of a statistical test:
- Null hypothesis, $H_0$
- Alternative hypothesis, $H_a$
- Test statistic
- Rejection region
Review: Hypothesis Testing - Errors

- A type I error is made if $H_0$ is rejected when $H_0$ is true. The probability of a type I error is denoted by $\alpha$; the value of $\alpha$ is called the level of the test.
- A type II error is made if $H_0$ is accepted when $H_a$ is true. The probability of a type II error is denoted by $\beta$.
P-value

The p-value, or attained significance level, is the smallest level of significance $\alpha$ for which the observed data indicate that the null hypothesis should be rejected.
Null Hypothesis

If $\beta_1 = 0$ then, with 95% confidence, $b_1$ would fall in some range around zero.

[Figure: data simulated from the true line $y = 2x + 9$ (MSE 4.22), with the null-hypothesis guess $y = 0x + 21.2$ (MSE 37.1) overlaid; Predictor/Input on the horizontal axis, Response/Output on the vertical axis.]
Alternative Hypothesis: Least Squares Fit

[Figure: the same data with the least squares estimate $y = 2.09x + 8.36$ (MSE 4.15) plotted against the true line $y = 2x + 9$ (MSE 4.22); Predictor/Input vs. Response/Output.]

$b_1$, rescaled, is the test statistic.
Testing This Hypothesis

- We only have a finite sample.
- Different finite samples (from the same population/source) will almost always produce different estimates of $\beta_0$ and $\beta_1$ (namely $b_0$, $b_1$), even under the same estimation procedure.
- $b_0$ and $b_1$ are therefore random variables whose sampling distributions can be statistically characterized.
- Hypothesis tests can be constructed using these distributions.
Example: Sampling Distribution of $b_1$

The point estimator of $\beta_1$ is

$$b_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$$

The sampling distribution of $b_1$ is the distribution of values $b_1$ takes when the predictor values $X_i$ are held fixed and the observed outputs are repeatedly sampled.
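To make "repeated sampling with the $X_i$ held fixed" concrete, here is a minimal simulation sketch in Python (not part of the original slides; the true parameter values are borrowed from the figures above, and the variable names are mine):

```python
# Hold the X_i fixed, repeatedly draw new normal errors,
# and look at the spread of the resulting b_1 estimates.
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(1.0, 12.0)            # fixed predictor values, as in the figures
beta0, beta1, sigma = 9.0, 2.0, 2.0  # assumed true parameters

def fit_b1(X, Y):
    # least squares slope: sum (X_i - Xbar)(Y_i - Ybar) / sum (X_i - Xbar)^2
    return np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)

b1_samples = []
for _ in range(10000):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=X.size)
    b1_samples.append(fit_b1(X, Y))

print(np.mean(b1_samples))                    # ~ beta1 = 2.0
print(np.var(b1_samples))                     # ~ sigma^2 / sum (X_i - Xbar)^2
print(sigma**2 / np.sum((X - X.mean())**2))   # theoretical variance (next slide)
```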
Sampling Distribution of $b_1$ in the Normal Regression Model

For a normal error regression model the sampling distribution of $b_1$ is normal, with mean and variance given by

$$E(b_1) = \beta_1 \qquad V(b_1) = \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2}$$

To show this we need to go through a number of algebraic steps.
First Step

To show $\sum_i (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_i (X_i - \bar{X}) Y_i$ we observe

$$\begin{aligned}
\sum_i (X_i - \bar{X})(Y_i - \bar{Y}) &= \sum_i (X_i - \bar{X}) Y_i - \sum_i (X_i - \bar{X}) \bar{Y} \\
&= \sum_i (X_i - \bar{X}) Y_i - \bar{Y} \sum_i (X_i - \bar{X}) \\
&= \sum_i (X_i - \bar{X}) Y_i - \bar{Y} \sum_i X_i + \bar{Y}\, n \frac{\sum_i X_i}{n} \\
&= \sum_i (X_i - \bar{X}) Y_i
\end{aligned}$$
Slope as a Linear Combination of the Outputs

$b_1$ can be expressed as a linear combination of the $Y_i$'s:

$$b_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} = \frac{\sum_i (X_i - \bar{X}) Y_i}{\sum_i (X_i - \bar{X})^2} = \sum_i k_i Y_i$$

where

$$k_i = \frac{X_i - \bar{X}}{\sum_j (X_j - \bar{X})^2}$$
Properties of the $k_i$'s

It can be shown (possible homework) that

$$\sum_i k_i = 0 \qquad \sum_i k_i X_i = 1 \qquad \sum_i k_i^2 = \frac{1}{\sum_i (X_i - \bar{X})^2}$$

We will use these properties to prove various properties of the sampling distributions of $b_1$ and $b_0$. (write on board)
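As a quick sanity check rather than a proof, the three properties (and the identity $b_1 = \sum_i k_i Y_i$) can be verified numerically; a sketch, with an arbitrary choice of $X$:

```python
# Numerically check the three k_i properties for a fixed X vector.
import numpy as np

X = np.arange(1.0, 12.0)
k = (X - X.mean()) / np.sum((X - X.mean())**2)

print(np.sum(k))                                     # ~ 0
print(np.sum(k * X))                                 # ~ 1
print(np.sum(k**2), 1 / np.sum((X - X.mean())**2))   # equal

# and b_1 = sum_i k_i Y_i reproduces the usual least squares slope
rng = np.random.default_rng(1)
Y = 9.0 + 2.0 * X + rng.normal(0.0, 2.0, size=X.size)
b1_linear = np.sum(k * Y)
b1_ratio = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
print(b1_linear, b1_ratio)                           # identical up to rounding
```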
Normality of $b_1$'s Sampling Distribution

Useful fact: a linear combination of independent normal random variables is normally distributed.

More formally: when $Y_1, \ldots, Y_n$ are independent normal random variables, the linear combination $a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n$ is normally distributed, with mean $\sum_i a_i E(Y_i)$ and variance $\sum_i a_i^2 V(Y_i)$.
Normality of $b_1$'s Sampling Distribution

Since $b_1$ is a linear combination of the $Y_i$'s, and each $Y_i$ is an independent normal random variable, $b_1$ is distributed normally as well:

$$b_1 = \sum_i k_i Y_i, \qquad k_i = \frac{X_i - \bar{X}}{\sum_j (X_j - \bar{X})^2}$$

(write on board)
$b_1$ Is an Unbiased Estimator

This can be seen using two of the $k_i$ properties:

$$\begin{aligned}
E(b_1) &= E\Big(\sum_i k_i Y_i\Big) = \sum_i k_i E(Y_i) = \sum_i k_i (\beta_0 + \beta_1 X_i) \\
&= \beta_0 \sum_i k_i + \beta_1 \sum_i k_i X_i = \beta_0 (0) + \beta_1 (1) = \beta_1
\end{aligned}$$
Variance of $b_1$

Since the $Y_i$ are independent random variables with variance $\sigma^2$, and the $k_i$'s are constants, we get

$$V(b_1) = V\Big(\sum_i k_i Y_i\Big) = \sum_i k_i^2 V(Y_i) = \sum_i k_i^2 \sigma^2 = \sigma^2 \sum_i k_i^2 = \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2}$$

Note that this assumes that we know $\sigma^2$. Can we?
Estimated Variance of $b_1$

If we don't know $\sigma^2$ then we can replace it with the MSE estimate. Remember

$$s^2 = \text{MSE} = \frac{\text{SSE}}{n-2} = \frac{\sum_i (Y_i - \hat{Y}_i)^2}{n-2} = \frac{\sum_i e_i^2}{n-2}$$

Plugging in, we get

$$V(b_1) = \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2} \qquad \hat{V}(b_1) = \frac{s^2}{\sum_i (X_i - \bar{X})^2}$$
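A short sketch of this plug-in computation (all names are mine; the data are simulated under assumed parameter values):

```python
# Compute s^2 = MSE and the plug-in estimate Vhat(b_1) = s^2 / Sxx.
import numpy as np

rng = np.random.default_rng(2)
X = np.arange(1.0, 12.0)
Y = 9.0 + 2.0 * X + rng.normal(0.0, 2.0, size=X.size)
n = X.size

Sxx = np.sum((X - X.mean())**2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()
resid = Y - (b0 + b1 * X)             # e_i = Y_i - Yhat_i

s2 = np.sum(resid**2) / (n - 2)       # MSE = SSE / (n - 2)
V_hat_b1 = s2 / Sxx
print(s2, V_hat_b1, np.sqrt(V_hat_b1))  # s^2, Vhat(b1), Shat(b1)
```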
Digression: Gauss-Markov Theorem

In a regression model where $E(\epsilon_i) = 0$, $V(\epsilon_i) = \sigma^2 < \infty$, and $\epsilon_i$ and $\epsilon_j$ are uncorrelated for all $i \neq j$, the least squares estimators $b_0$ and $b_1$ are unbiased and have minimum variance among all unbiased linear estimators.

Remember

$$b_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} \qquad b_0 = \bar{Y} - b_1 \bar{X}$$
Proof

The theorem states that $b_1$ has minimum variance among all unbiased linear estimators of the form

$$\hat{\beta}_1 = \sum_i c_i Y_i$$

As this estimator must be unbiased we have

$$E(\hat{\beta}_1) = \sum_i c_i E(Y_i) = \sum_i c_i (\beta_0 + \beta_1 X_i) = \beta_0 \sum_i c_i + \beta_1 \sum_i c_i X_i = \beta_1$$
Proof cont.

Given the constraint

$$\beta_0 \sum_i c_i + \beta_1 \sum_i c_i X_i = \beta_1$$

it clearly must be the case that $\sum_i c_i = 0$ and $\sum_i c_i X_i = 1$. (write these on board as conditions of unbiasedness)

The variance of this estimator is

$$V(\hat{\beta}_1) = \sum_i c_i^2 V(Y_i) = \sigma^2 \sum_i c_i^2$$
Proof cont.

Now define $c_i = k_i + d_i$, where the $k_i$ are the constants we already defined and the $d_i$ are arbitrary constants. Let's look at the variance of the estimator:

$$V(\hat{\beta}_1) = \sum_i c_i^2 V(Y_i) = \sigma^2 \sum_i (k_i + d_i)^2 = \sigma^2 \Big( \sum_i k_i^2 + \sum_i d_i^2 + 2 \sum_i k_i d_i \Big)$$

Note we just demonstrated that $\sigma^2 \sum_i k_i^2 = V(b_1)$.
Proof cont.

Now, by showing that $\sum_i k_i d_i = 0$ we're almost done:

$$\begin{aligned}
\sum_i k_i d_i &= \sum_i k_i (c_i - k_i) = \sum_i k_i c_i - \sum_i k_i^2 \\
&= \sum_i c_i \left( \frac{X_i - \bar{X}}{\sum_j (X_j - \bar{X})^2} \right) - \frac{1}{\sum_i (X_i - \bar{X})^2} \\
&= \frac{\sum_i c_i X_i - \bar{X} \sum_i c_i}{\sum_i (X_i - \bar{X})^2} - \frac{1}{\sum_i (X_i - \bar{X})^2} = 0
\end{aligned}$$

from the conditions of unbiasedness.
Proof end

So we are left with

$$V(\hat{\beta}_1) = \sigma^2 \Big( \sum_i k_i^2 + \sum_i d_i^2 \Big) = V(b_1) + \sigma^2 \sum_i d_i^2$$

which is minimized when all the $d_i = 0$. This means that the least squares estimator $b_1$ has minimum variance among all unbiased linear estimators.
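The theorem can also be illustrated by simulation. Below is a sketch (my own illustration, not from the slides) comparing $b_1$ with another unbiased linear estimator, the two-endpoint slope $(Y_n - Y_1)/(X_n - X_1)$, whose weights $c_i$ also satisfy $\sum_i c_i = 0$ and $\sum_i c_i X_i = 1$:

```python
# Both estimators are unbiased and linear in Y; least squares
# should show the smaller variance, as the theorem guarantees.
import numpy as np

rng = np.random.default_rng(3)
X = np.arange(1.0, 12.0)
beta0, beta1, sigma = 9.0, 2.0, 2.0   # assumed true parameters

ls, endpoints = [], []
for _ in range(10000):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=X.size)
    ls.append(np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2))
    endpoints.append((Y[-1] - Y[0]) / (X[-1] - X[0]))

print(np.mean(ls), np.mean(endpoints))  # both ~ beta1 (unbiased)
print(np.var(ls), np.var(endpoints))    # least squares variance is smaller
```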
Sampling Distribution of $(b_1 - \beta_1)/\hat{S}(b_1)$

$b_1$ is normally distributed, so $(b_1 - \beta_1)/\sqrt{V(b_1)}$ is a standard normal variable. We don't know $V(b_1)$, so it must be estimated from data; we have already denoted its estimate $\hat{V}(b_1)$. Using this estimate, it can be shown that

$$\frac{b_1 - \beta_1}{\hat{S}(b_1)} \sim t(n-2), \qquad \hat{S}(b_1) = \sqrt{\hat{V}(b_1)}$$
Where Does This Come From?

We need to rely upon the following theorem: for the normal error regression model,

$$\frac{\text{SSE}}{\sigma^2} = \frac{\sum_i (Y_i - \hat{Y}_i)^2}{\sigma^2} \sim \chi^2(n-2)$$

and is independent of $b_0$ and $b_1$.

Intuitively this follows the standard result for a sum of squared standard normal random variables; here the two linear constraints imposed by the regression parameter estimation each reduce the number of degrees of freedom by one.
Another Useful Fact: t Distribution

Let $z$ and $\chi^2(\nu)$ be independent random variables (standard normal and $\chi^2$, respectively). We then define a $t$ random variable as follows:

$$t(\nu) = \frac{z}{\sqrt{\frac{\chi^2(\nu)}{\nu}}}$$

This version of the $t$ distribution has one parameter, the degrees of freedom $\nu$.
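A quick sketch checking this construction numerically, assuming scipy is available for the reference $t(\nu)$ quantiles:

```python
# z / sqrt(chi2(nu)/nu) should match a t(nu) distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
nu = 9
z = rng.standard_normal(100000)
chi2 = rng.chisquare(nu, size=100000)
t_samples = z / np.sqrt(chi2 / nu)

# compare a few sample quantiles with the reference t(nu) quantiles
for q in (0.05, 0.5, 0.95):
    print(np.quantile(t_samples, q), stats.t.ppf(q, df=nu))
```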
Distribution of the Studentized Statistic

To derive the distribution of this statistic, first we do the following rewrite:

$$\frac{b_1 - \beta_1}{\hat{S}(b_1)} = \frac{\frac{b_1 - \beta_1}{S(b_1)}}{\frac{\hat{S}(b_1)}{S(b_1)}}$$

The numerator is a standard normal variable, and

$$\frac{\hat{S}(b_1)}{S(b_1)} = \sqrt{\frac{\hat{V}(b_1)}{V(b_1)}}$$
Studentized Statistic cont.

And note the following:

$$\frac{\hat{V}(b_1)}{V(b_1)} = \frac{\frac{\text{MSE}}{\sum_i (X_i - \bar{X})^2}}{\frac{\sigma^2}{\sum_i (X_i - \bar{X})^2}} = \frac{\text{MSE}}{\sigma^2} = \frac{\text{SSE}}{\sigma^2 (n-2)}$$

where we know (by the given theorem) that the distribution of the last term is a scaled $\chi^2$, independent of $b_1$ and $b_0$:

$$\frac{\text{SSE}}{\sigma^2 (n-2)} \sim \frac{\chi^2(n-2)}{n-2}$$
Studentized Statistic, Final

But by the given definition of the $t$ distribution we have our result:

$$\frac{b_1 - \beta_1}{\hat{S}(b_1)} \sim t(n-2)$$

because, putting everything together, we can see that

$$\frac{b_1 - \beta_1}{\hat{S}(b_1)} \sim \frac{z}{\sqrt{\frac{\chi^2(n-2)}{n-2}}}$$
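As an optional check, a Monte Carlo sketch (my own, under assumed parameter values) of the final result: the studentized slope should match the $t(n-2)$ reference quantiles.

```python
# Repeatedly simulate, studentize b_1, and compare against t(n-2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
X = np.arange(1.0, 12.0)
beta0, beta1, sigma, n = 9.0, 2.0, 2.0, X.size
Sxx = np.sum((X - X.mean())**2)

t_stats = []
for _ in range(20000):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=n)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
    b0 = Y.mean() - b1 * X.mean()
    s2 = np.sum((Y - b0 - b1 * X)**2) / (n - 2)
    t_stats.append((b1 - beta1) / np.sqrt(s2 / Sxx))

for q in (0.05, 0.5, 0.95):
    print(np.quantile(t_stats, q), stats.t.ppf(q, df=n - 2))  # should agree
```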
Confidence Intervals and Hypothesis Tests

Now that we know the sampling distribution of the studentized statistic $(b_1 - \beta_1)/\hat{S}(b_1)$ ($t$ with $n-2$ degrees of freedom), we can construct confidence intervals and hypothesis tests easily.
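For instance, a sketch of the resulting 95% interval and the test of $H_0: \beta_1 = 0$ on simulated data (critical values from scipy; all names and data are illustrative):

```python
# Slope confidence interval and two-sided t-test for H0: beta_1 = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
X = np.arange(1.0, 12.0)
Y = 9.0 + 2.0 * X + rng.normal(0.0, 2.0, size=X.size)
n = X.size

Sxx = np.sum((X - X.mean())**2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()
s2 = np.sum((Y - b0 - b1 * X)**2) / (n - 2)
se_b1 = np.sqrt(s2 / Sxx)                    # Shat(b1)

# 95% confidence interval: b1 +/- t(0.975; n-2) * Shat(b1)
t_crit = stats.t.ppf(0.975, df=n - 2)
print(b1 - t_crit * se_b1, b1 + t_crit * se_b1)

# test H0: beta_1 = 0 against Ha: beta_1 != 0
t_star = (b1 - 0.0) / se_b1
p_value = 2 * stats.t.sf(abs(t_star), df=n - 2)
print(t_star, p_value)
```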