Inference in Normal Regression Model. Dr. Frank Wood

1 Inference in Normal Regression Model Dr. Frank Wood

2 Remember We know that the point estimator of β_1 is b_1 = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)². Last class we derived the sampling distribution of b_1: when σ² is known it is N(β_1, σ²{b_1}) with σ²{b_1} = σ² / Σ(X_i − X̄)². And we suggested that when σ² is unknown, an estimate of σ²{b_1} can be obtained by substituting the MSE for σ²: s²{b_1} = MSE / Σ(X_i − X̄)² = [SSE/(n − 2)] / Σ(X_i − X̄)².
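
A minimal Matlab sketch of these estimators, assuming equal-length column vectors X and Y (the variable names are illustrative, not from the slides):

  % Least squares slope and its estimated standard error
  n    = length(Y);
  Xbar = mean(X);  Ybar = mean(Y);
  Sxx  = sum((X - Xbar).^2);
  b1   = sum((X - Xbar).*(Y - Ybar)) / Sxx;   % point estimator of beta_1
  b0   = Ybar - b1*Xbar;                      % intercept, used later
  SSE  = sum((Y - (b0 + b1*X)).^2);           % residual sum of squares
  SSE  = SSE;                                 % kept for clarity with the slide notation
  MSE  = SSE / (n - 2);                       % estimate of sigma^2
  s2b1 = MSE / Sxx;                           % s^2{b1}
  sb1  = sqrt(s2b1);                          % s{b1}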

3 Sampling Distribution of (b_1 − β_1)/s{b_1} Since b_1 is normally distributed, (b_1 − β_1)/σ{b_1} is a standard normal variable N(0, 1). We don't know σ²{b_1}, so it must be estimated from data; we have already denoted its estimate s²{b_1}. Using this estimate we showed that (b_1 − β_1)/s{b_1} ∼ t(n − 2), where s{b_1} = √(s²{b_1}). It is from this fact that our confidence intervals and tests will derive.

4 Confidence Intervals and Hypothesis Tests Now that we know the sampling distribution of the studentized statistic (b_1 − β_1)/s{b_1} (t with n − 2 degrees of freedom), we can construct confidence intervals and hypothesis tests easily.

5 Confidence Interval for β_1 Since the studentized statistic follows a t distribution we can make the following probability statement: P(t(α/2; n − 2) ≤ (b_1 − β_1)/s{b_1} ≤ t(1 − α/2; n − 2)) = 1 − α. Matlab: tpdf, tcdf, tinv.

6 Remember Density: f(y) = dF(y)/dy. Distribution (CDF): F(y) = P(Y ≤ y) = ∫_{−∞}^{y} f(t) dt. Inverse CDF: F⁻¹(p) = y such that ∫_{−∞}^{y} f(t) dt = p.

7 Book tables and Matlab commands In Appendix B (or elsewhere in other books), a table of percentiles of the t distribution is given. In this table one number appears for each of a number of degrees of freedom ν and a parameter, call it A. Each entry is some value of t(A; ν) where P{t(ν) ≤ t(A; ν)} = A. In words, t(A; ν) is the point on the horizontal axis of the Student-t distribution where a fraction A of the mass under the curve is located to the left. This is precisely the quantity returned by tinv(A, ν) in Matlab. How can this be used to produce a confidence interval?
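
A quick Matlab check of this correspondence (illustrative values; uses the Statistics Toolbox functions named on the slide):

  nu = 23;              % e.g. n - 2 with n = 25
  A  = 0.975;
  tA = tinv(A, nu);     % t(A; nu): point with a fraction A of the mass to its left
  tcdf(tA, nu)          % returns 0.975, i.e. A again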

8 Interval arising from picking α Note that by symmetry t(α/2; n − 2) = −t(1 − α/2; n − 2). Remember P(t(α/2; n − 2) ≤ (b_1 − β_1)/s{b_1} ≤ t(1 − α/2; n − 2)) = 1 − α. Rearranging terms and using this symmetry we have P(b_1 − t(1 − α/2; n − 2) s{b_1} ≤ β_1 ≤ b_1 + t(1 − α/2; n − 2) s{b_1}) = 1 − α. And now we can use a table look-up to produce confidence intervals.

9 Using tables for Computing Intervals The tables in the book (Table B.2 in the appendix) give t(1 − α/2; ν), where P{t(ν) ≤ t(1 − α/2; ν)} = 1 − α/2, i.e. the inverse CDF of the t distribution. This can be arrived at computationally as well. Matlab: tinv(1 − α/2, ν).

10 1 − α confidence limits for β_1 The 1 − α confidence limits for β_1 are b_1 ± t(1 − α/2; n − 2) s{b_1}. Note that this quantity can be used to calculate confidence intervals given n and α. Fixing α can guide the choice of sample size if a confidence interval of a particular width is desired; conversely, a given sample size determines the interval width that can be achieved. Also useful for hypothesis testing.
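
Continuing the sketch above (b1, sb1, and n as computed earlier), the 1 − α limits could be evaluated in Matlab as:

  alpha = 0.05;                              % 95% confidence interval
  tcrit = tinv(1 - alpha/2, n - 2);          % t(1 - alpha/2; n - 2)
  ci_b1 = [b1 - tcrit*sb1, b1 + tcrit*sb1];  % confidence limits for beta_1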

11 Tests Concerning β_1 Example 1: Two-sided test. H_0: β_1 = 0, H_a: β_1 ≠ 0. Test statistic t = (b_1 − 0)/s{b_1}.

12 Tests Concerning β_1 We have an estimate of the sampling distribution of b_1 from the data. If the null hypothesis holds then the b_1 estimate coming from the data should be within the 95% confidence interval of the sampling distribution centered at 0 (in this case). t = (b_1 − 0)/s{b_1}. Variability in b_1 is assumed to arise from sampling noise.

13 Decision rules If |t| ≤ t(1 − α/2; n − 2), conclude H_0. If |t| > t(1 − α/2; n − 2), conclude H_a. The absolute values make the test two-sided.

14 Intuition The p-value is the value of α that moves the green line to the blue line.

15 Calculating the p-value The p-value, or attained significance level, is the smallest level of significance α for which the observed data indicate that the null hypothesis should be rejected. This can be looked up using the CDF of the test statistic. In Matlab the two-sided p-value is 2·(1 − tcdf(|t|, ν)).
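
Putting the test statistic, decision rule, and p-value together in Matlab (a sketch building on the earlier variables):

  tstat    = (b1 - 0) / sb1;                      % test statistic for H0: beta_1 = 0
  tcrit    = tinv(1 - alpha/2, n - 2);
  rejectH0 = abs(tstat) > tcrit;                  % two-sided decision rule
  pval     = 2 * (1 - tcdf(abs(tstat), n - 2));   % two-sided p-value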

16 Inferences Concerning β_0 Largely, inference procedures regarding β_0 can be performed in the same way as those for β_1. Remember the point estimator b_0 for β_0: b_0 = Ȳ − b_1 X̄.

17 Sampling distribution of b_0 The sampling distribution of b_0 refers to the different values of b_0 that would be obtained with repeated sampling when the levels of the predictor variable X are held constant from sample to sample. For the normal regression model the sampling distribution of b_0 is normal.

18 Sampling distribution of b_0 When the error variance is known: E{b_0} = β_0 and σ²{b_0} = σ² (1/n + X̄² / Σ(X_i − X̄)²). When the error variance is unknown: s²{b_0} = MSE (1/n + X̄² / Σ(X_i − X̄)²).
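
In the running Matlab sketch, the corresponding estimate is:

  s2b0 = MSE * (1/n + Xbar^2/Sxx);   % s^2{b0}
  sb0  = sqrt(s2b0);                 % s{b0}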

19 Confidence interval for β_0 The 1 − α confidence limits for β_0 are obtained in the same manner as those for β_1: b_0 ± t(1 − α/2; n − 2) s{b_0}.

20 Considerations on Inferences on β_0 and β_1 Effects of departures from normality: the estimators of β_0 and β_1 have the property of asymptotic normality, i.e. their distributions approach normality as the sample size increases (under general conditions). Spacing of the X levels: the variances of b_0 and b_1 (for a given n and σ²) depend strongly on the spacing of X.

21 Sampling distribution of point estimator of mean response Let X_h be the level of X for which we would like an estimate of the mean response; it needs to be one of the observed X's. The mean response when X = X_h is denoted by E{Y_h}. The point estimator of E{Y_h} is Ŷ_h = b_0 + b_1 X_h. We are interested in the sampling distribution of this quantity.

22 Sampling Distribution of Ŷ_h We have Ŷ_h = b_0 + b_1 X_h. Since this quantity is itself a linear combination of the Y_i's, its sampling distribution is itself normal. The mean of the sampling distribution is E{Ŷ_h} = E{b_0} + E{b_1} X_h = β_0 + β_1 X_h. Biased or unbiased?

23 Sampling Distribution of Ŷ_h To derive the variance of the sampling distribution of the mean response we first show that b_1 and (1/n) Σ Y_i are uncorrelated and, hence, for the normal error regression model, independent. We start with the definitions Ȳ = Σ (1/n) Y_i and b_1 = Σ k_i Y_i, where k_i = (X_i − X̄) / Σ(X_i − X̄)².

24 Sampling Distribution of Ŷ_h We want to show that the mean response and the estimate b_1 are uncorrelated: Cov(Ȳ, b_1) = σ²{Ȳ, b_1} = 0. To do this we need the following result (A.32): σ²{Σ_{i=1}^{n} a_i Y_i, Σ_{i=1}^{n} c_i Y_i} = Σ_{i=1}^{n} a_i c_i σ²{Y_i} when the Y_i are independent.

25 Sampling Distribution of Ŷ_h Using this fact we have σ²{Σ_{i=1}^{n} (1/n) Y_i, Σ_{i=1}^{n} k_i Y_i} = Σ_{i=1}^{n} (1/n) k_i σ²{Y_i} = (σ²/n) Σ_{i=1}^{n} k_i = 0, since Σ k_i = 0. So Ȳ and b_1 are uncorrelated.

26 Sampling Distribution of Ŷ_h This means that we can write down the variance σ²{Ŷ_h} = σ²{Ȳ + b_1 (X_h − X̄)} (an alternative and equivalent form of the regression function). But we know that the mean of Y and b_1 are uncorrelated, so σ²{Ŷ_h} = σ²{Ȳ} + σ²{b_1} (X_h − X̄)².

27 Sampling Distribution of Ŷ_h We know (from last lecture) σ²{b_1} = σ² / Σ(X_i − X̄)² and s²{b_1} = MSE / Σ(X_i − X̄)². And we can find σ²{Ȳ} = (1/n²) Σ σ²{Y_i} = nσ²/n² = σ²/n.

28 Sampling Distribution of Ŷ_h So, plugging in, we get σ²{Ŷ_h} = σ²/n + [σ² / Σ(X_i − X̄)²] (X_h − X̄)², or σ²{Ŷ_h} = σ² (1/n + (X_h − X̄)² / Σ(X_i − X̄)²).

29 Sampling Distribution of Ŷ_h Since we often won't know σ² we can, as usual, plug in S² = SSE/(n − 2), our estimate for it, to get our estimate of this sampling distribution variance: s²{Ŷ_h} = S² (1/n + (X_h − X̄)² / Σ(X_i − X̄)²).
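
In the running Matlab sketch (where MSE plays the role of S²), for an illustrative level Xh chosen from the observed X's:

  Xh    = X(1);                                 % illustrative choice of level
  Yhath = b0 + b1*Xh;                           % point estimate of E{Y_h}
  s2Yh  = MSE * (1/n + (Xh - Xbar)^2 / Sxx);    % s^2{Yhat_h}
  sYh   = sqrt(s2Yh);                           % s{Yhat_h}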

30 No surprise... The studentized point estimator of the mean response follows a t distribution with n − 2 degrees of freedom: (Ŷ_h − E{Y_h}) / s{Ŷ_h} ∼ t(n − 2). This means that we can construct confidence intervals in the same manner as before.

31 Confidence Intervals for E{Y_h} The 1 − α confidence intervals for E{Y_h} are Ŷ_h ± t(1 − α/2; n − 2) s{Ŷ_h}. From this, hypothesis tests can be constructed as usual.
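
Using the quantities from the sketch above:

  tcrit  = tinv(1 - alpha/2, n - 2);
  ci_EYh = [Yhath - tcrit*sYh, Yhath + tcrit*sYh];   % 1 - alpha limits for E{Y_h}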

32 Comments The variance of the estimator Ŷ_h is smallest near the mean of X; designing studies such that the mean of X is near X_h will improve inference precision. When X_h is zero the variance of the estimator Ŷ_h reduces to the variance of the estimator b_0 for β_0.

33 Prediction interval for new input X_h Roughly the same idea as for E{Y_h}, where X_h was a known input point included in the estimation of b_1, b_0, and s². If all regression parameters are known then the 1 − α prediction interval for a new observation Y_h is E{Y_h} ± z(1 − α/2) σ.

34 Prediction interval for new input X_h If the regression parameters are unknown, the 1 − α prediction interval for a new observation Y_h(new) is given by the following theorem: (Y_h(new) − Ŷ_h) / s{pred} ∼ t(n − 2) for the normal error regression model, with s{pred} to be defined shortly. It follows directly that the 1 − α prediction limits for Y_h(new) are Ŷ_h ± t(1 − α/2; n − 2) s{pred}. This is very nearly the same as prediction for a known value of X, but includes a correction for the additional variability arising from the fact that the new input location was not used in the original estimates of b_1, b_0, and s².

35 Prediction interval for new input X_h Because Y_h(new) is independent of Ŷ_h we can directly write σ²{pred} = σ²{Y_h(new) − Ŷ_h} = σ²{Y_h(new)} + σ²{Ŷ_h} = σ² + σ²{Ŷ_h}, where from before we have σ²{Ŷ_h} = σ² (1/n + (X_h − X̄)² / Σ(X_i − X̄)²), so σ²{pred} = σ² (1 + 1/n + (X_h − X̄)² / Σ(X_i − X̄)²). But as before we don't know σ², so we will replace it...

36 Prediction interval for new input X_h The value of s²{pred} is given by s²{pred} = MSE (1 + 1/n + (X_h − X̄)² / Σ(X_i − X̄)²). Note that this quantity is slightly larger than s²{Ŷ_h}. It has two components: the variance of the distribution of Y at X = X_h, namely σ²; and the variance of the sampling distribution of Ŷ_h, namely s²{Ŷ_h}.
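
A sketch of the prediction-interval computation in the same Matlab setting, for an assumed new input level Xhnew (an illustrative value, not from the slides):

  Xhnew   = 1.5;                                             % hypothetical new input level
  Yhatnew = b0 + b1*Xhnew;                                   % predicted response
  s2pred  = MSE * (1 + 1/n + (Xhnew - Xbar)^2 / Sxx);        % s^2{pred}
  spred   = sqrt(s2pred);
  pi_Yh   = [Yhatnew - tcrit*spred, Yhatnew + tcrit*spred];  % 1 - alpha prediction limits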

37 Summary After this lecture you should be able to confidently do estimation, prediction, and hypothesis testing about the slope, intercept, and predicted values at any input point, old or new, in the normal error linear regression setting.
