Chapter 2 Inferences in Regression and Correlation Analysis


Chapter 2: Inferences in Regression and Correlation Analysis
許湘伶 (hsuhl, NUK)
Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li)

Topics covered in this chapter:
- inferences concerning the regression parameters β0 and β1: interval estimation and tests
- interval estimation of E{Y}, the mean of the probability distribution of Y for given X
- prediction intervals for a new observation Y
- confidence bands for the regression line
- the analysis of variance approach to regression analysis
- the general linear test approach
- a descriptive measure of association: the correlation coefficient

Normal error regression model
Assume that the normal error regression model is applicable:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
- β0 and β1 are parameters
- the Xi are known constants
- the εi ~ N(0, σ²) are independent
- hence Yi and Yj, i ≠ j, are uncorrelated

Inferences Concerning β1
Inferences about the slope β1 of the regression line.
Illustration: a market research analyst studying the relation between sales (Y) and advertising expenditures (X) may wish to obtain an interval estimate of β1, since it provides information on how many additional sales dollars are generated, on average, by an additional dollar of advertising expenditure, and to test H0: β1 = 0 vs. Ha: β1 ≠ 0.

Inferences Concerning β1 (cont.)
Figure 1: Regression model when β1 = 0.
β1 = 0 means there is no linear association between Y and X:
- the regression line is horizontal
- the means of Y are E{Y} = β0
- the probability distributions of Y are identical at all levels of X

Sampling Distribution of b1
Point estimator:
$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
The sampling distribution of b1 refers to the different values of b1 that would be obtained with repeated sampling when the levels of the predictor variable X are held constant from sample to sample.
Simulation model: $Y_i = 5 + 2X_i + \varepsilon_i$, with εi i.i.d. N(0, 1) and Xi ∈ {−5, −4.9, ..., 4.9, 5}; see the sketch below.
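A minimal R sketch of this repeated-sampling idea (the object names are illustrative, not from the text): hold the X levels fixed, draw many samples from the stated model, and compare the empirical mean and variance of b1 with the theory that follows.

set.seed(1)                                # any seed; for reproducibility
X <- seq(-5, 5, by = 0.1)                  # fixed X levels from the slide
nsim <- 10000
b1.sim <- replicate(nsim, {
  Y <- 5 + 2 * X + rnorm(length(X))        # Y = 5 + 2X + N(0,1) errors
  sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
})
mean(b1.sim)                               # close to beta1 = 2
c(var(b1.sim), 1 / sum((X - mean(X))^2))   # close to sigma^2 / Sxx with sigma^2 = 1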

Sampling Distribution of b1: point estimator b1 (cont.)
The sampling distribution of b1 (derived using the properties of the ki below). For the normal error regression model:
- Mean: E{b1} = β1
- Variance: $\sigma^2\{b_1\} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$
b1 is a linear combination of the Yi:
$b_1 = \sum k_i Y_i, \qquad k_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}$
where each ki is a function of the Xi.

Sampling Distribution of b1: point estimator b1 (cont.)
Properties of the ki (checked numerically below):
$\sum k_i = 0, \qquad \sum k_i X_i = 1, \qquad \sum k_i^2 = \frac{1}{\sum (X_i - \bar{X})^2}$
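These identities are easy to verify numerically; a short sketch (the X values are arbitrary illustrative numbers):

X <- c(20, 30, 50, 70, 90)                   # any set of X levels
k <- (X - mean(X)) / sum((X - mean(X))^2)
sum(k)                                       # = 0
sum(k * X)                                   # = 1
c(sum(k^2), 1 / sum((X - mean(X))^2))        # equal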

Sampling Distribution of b1: normality
The Yi are independently, normally distributed, and a linear combination of independent normal random variables is normally distributed; hence b1 is normally distributed.

Sampling Distribution of b1: normality (cont.)
Properties of the normal sampling distribution:
- Mean: E{b1} = β1
- Variance: $\sigma^2\{b_1\} = \sigma^2 \cdot \frac{1}{\sum (X_i - \bar{X})^2}$

Sampling Distribution of b1 (cont.)
The unbiased estimator of σ²:
$MSE = \frac{SSE}{n-2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n-2}$
Estimated variance of b1:
$s^2\{b_1\} = \frac{MSE}{\sum (X_i - \bar{X})^2}$
The point estimator s²{b1} is an unbiased estimator of σ²{b1}; the point estimator of σ{b1} is s{b1}.

b1 is the minimum-variance unbiased linear estimator
Theorem 1. The estimator b1 has minimum variance among all unbiased linear estimators of the form
$\hat{\beta}_1 = \sum c_i Y_i$
where the ci are arbitrary constants. Unbiasedness, E{β̂1} = β1, requires Σci = 0 and Σci Xi = 1, and $\sigma^2\{\hat{\beta}_1\} = \sigma^2 \sum c_i^2$.
Write ci = ki + di, with ki = (Xi − X̄)/Σ(Xi − X̄)² and the di arbitrary constants. Then Σki di = 0, so
$\sigma^2\{\hat{\beta}_1\} = \sigma^2 \sum k_i^2 + \sigma^2 \sum d_i^2 = \sigma^2\{b_1\} + \sigma^2 \sum d_i^2,$
which is at a minimum when Σdi² = 0, i.e., all di = 0 and ci = ki.

Sampling distribution of (b1 − β1)/s{b1}
Theorem 2.
$\frac{b_1 - \beta_1}{s\{b_1\}} \sim t(n-2)$
Compare: if the observations Yi come from the same normal population, (Ȳ − µ)/s{Ȳ} ~ t(n−1). Here two parameters, β0 and β1, are estimated, so two degrees of freedom are lost.

Sampling distribution of (b1 − β1)/s{b1} (cont.)
The proof that (b1 − β1)/s{b1} ~ t(n−2) rests on:
Theorem 3. For regression model (2.1),
$SSE/\sigma^2 \sim \chi^2(n-2),$
and SSE is independent of b0 and b1.
Remark: this is more easily derived using the matrix form. Key: b = (X'X)⁻¹X'Y and the residuals e = (I − H)Y are built from orthogonal linear forms, since
$[(X'X)^{-1}X'] \, (\sigma^2 I) \, [(I - H)/\sigma^2] = 0.$

Sampling distribution of (b1 − β1)/s{b1} (cont.)
Theorem 4 (t distribution). Let Z and χ²(ν) be independent random variables (standard normal and chi-square). Then
$t(\nu) = \frac{Z}{\sqrt{\chi^2(\nu)/\nu}}$
is a t random variable with ν degrees of freedom.

Sampling distribution of (b1 − β1)/s{b1} (cont.)
1. (b1 − β1)/σ{b1} ~ Z, a standard normal variable.
2. $\frac{s^2\{b_1\}}{\sigma^2\{b_1\}} \sim \frac{\chi^2(n-2)}{n-2}$
3. $\frac{b_1 - \beta_1}{s\{b_1\}} = \frac{b_1 - \beta_1}{\sigma\{b_1\}} \div \frac{s\{b_1\}}{\sigma\{b_1\}}$
4. Z is a function of b1, and b1 is independent of SSE/σ² ~ χ²(n−2); hence Z and the χ² variable are independent, so
$\frac{b_1 - \beta_1}{s\{b_1\}} \sim \frac{Z}{\sqrt{\chi^2(n-2)/(n-2)}} \sim t(n-2)$

Confidence interval for β1
Since (b1 − β1)/s{b1} ~ t(n−2),
$P\{t(\alpha/2; n-2) \le (b_1 - \beta_1)/s\{b_1\} \le t(1-\alpha/2; n-2)\} = 1 - \alpha$
$P\{b_1 - t(1-\alpha/2; n-2)\, s\{b_1\} \le \beta_1 \le b_1 + t(1-\alpha/2; n-2)\, s\{b_1\}\} = 1 - \alpha$
where t(α/2; n−2) is the 100(α/2) percentile of the t distribution with n−2 d.f. By symmetry, t(α/2; n−2) = −t(1−α/2; n−2).

Confidence interval for β1 (cont.)
The 1 − α confidence limits for β1 are:
$b_1 \pm t(1-\alpha/2; n-2)\, s\{b_1\}$

Ex: Confidence interval for β1
The Toluca Company: an estimate of β1 with a 95 percent confidence coefficient.
#### Example p46
toluca<-read.table("toluca.txt",header=T)
attach(toluca)
## method 1
n<-length(Size)
alpha<-0.05
tdf<-qt(1-alpha/2,n-2)
Sxx<-sum((Size-mean(Size))^2)
b1<-sum((Size-mean(Size))*(Hrs-mean(Hrs)))/Sxx
b0<-mean(Hrs)-b1*mean(Size)
c(b0,b1)
SSE<-sum((Hrs-(b0+b1*Size))^2)
MSE<-SSE/(n-2)
sb1<-sqrt(MSE/Sxx)  #s{b1}
conf<-c(b1-tdf*sb1,b1+tdf*sb1)
conf
## method 2
fitreg<-lm(Hrs~Size)
summary(fitreg)
confint(fitreg)

Ex: Confidence interval for β1 (cont.)
$s^2\{b_1\} = \frac{MSE}{\sum (X_i - \bar{X})^2} = \frac{2{,}384}{19{,}800} = 0.1204, \qquad s\{b_1\} = 0.3470$
t(0.975; 23) = 2.069. The 95 percent confidence interval:
$2.85 \le \beta_1 \le 4.29$


Tests concerning β1
Since (b1 − β1)/s{b1} ~ t(n−2):
Two-Sided Test. H0: β1 = 0 vs. Ha: β1 ≠ 0.
Test statistic: $t^* = \frac{b_1}{s\{b_1\}}$
Decision rule (at level of significance α):
- If |t*| ≤ t(1−α/2; n−2), conclude H0.
- If |t*| > t(1−α/2; n−2), conclude Ha.

Tests concerning β1 (cont.)
α = 0.05; n = 25; t(0.975; 23) = 2.069; b1 = 3.5702; s{b1} = 0.3470.
$t^* = \frac{3.5702}{0.3470} = 10.29 > 2.069 \;\Rightarrow\; \text{conclude } H_a: \beta_1 \ne 0$
p-value: $2P\{t(23) > 10.29\} < 0.0001$


Tests concerning β1: One-Sided Test
H0: β1 ≤ 0 vs. Ha: β1 > 0.
Test statistic: $t^* = \frac{b_1}{s\{b_1\}}$
Decision rule (at level of significance α):
- If t* ≤ t(1−α; n−2), conclude H0.
- If t* > t(1−α; n−2), conclude Ha.

Tests concerning β1: Two-Sided Test of β1 = β10
H0: β1 = β10 vs. Ha: β1 ≠ β10.
Test statistic: $t^* = \frac{b_1 - \beta_{10}}{s\{b_1\}}$
Decision rule (at level of significance α):
- If |t*| ≤ t(1−α/2; n−2), conclude H0.
- If |t*| > t(1−α/2; n−2), conclude Ha.
A short R sketch of this general test follows.
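summary() and confint() in R cover only β10 = 0; for a general β10 the statistic can be formed directly. A sketch, reusing b1, sb1, and n from the Toluca code above (the value of β10 is illustrative):

b10 <- 3                                  # hypothesized slope (illustrative value)
tstar <- (b1 - b10)/sb1
crit <- qt(1 - 0.05/2, n - 2)
abs(tstar) > crit                         # TRUE -> conclude Ha
2 * pt(-abs(tstar), n - 2)                # two-sided p-value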

Inferences concerning β0: sampling distribution of b0
Relevant when the scope of the model includes X = 0. The point estimator:
$b_0 = \bar{Y} - b_1 \bar{X}$
Theorem 5. For regression model (2.1), the sampling distribution of b0 is normal, with
$E\{b_0\} = \beta_0, \qquad \sigma^2\{b_0\} = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right]$
(using Cov(Ȳ, b1) = 0).

Sampling distribution of b0 (cont.)
An estimator of σ²{b0}:
$s^2\{b_0\} = MSE \left[ \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right]$
and s{b0} estimates σ{b0}.
Theorem 6.
$\frac{b_0 - \beta_0}{s\{b_0\}} \sim t(n-2)$

Confidence interval for β0
The 1 − α confidence limits for β0 are:
$b_0 \pm t(1-\alpha/2; n-2)\, s\{b_0\}$
## method 1
sb0<-sqrt(MSE*(1/n+mean(Size)^2/sum((Size-mean(Size))^2)))  #s{b0}
b0<-mean(Hrs)-b1*mean(Size)
tdf0<-qt(1-0.1/2,n-2)
conf0<-c(b0-tdf0*sb0,b0+tdf0*sb0)
conf0
## method 2
fitreg<-lm(Hrs~Size)
confint(fitreg,level = 0.9)

Example: C.I. for β0
Toluca Company. Range of X: [20, 120]. t(0.95; 23) = 1.714.
$s^2\{b_0\} = MSE\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum(X_i-\bar{X})^2}\right] = 2{,}384\left[\frac{1}{25} + \frac{70^2}{19{,}800}\right] = 685.34, \qquad s\{b_0\} = 26.18$
The 90% C.I. for β0: $17.5 \le \beta_0 \le 107.2$
Remark: this C.I. does not necessarily provide meaningful information, since the scope of the model does not include X = 0.

Example: C.I. for β0 (cont.)
It does not necessarily provide information about the setup cost, since we are not certain whether a linear regression model is appropriate when the scope of the model is extended to X = 0.

Example: C.I. for β0 (cont.)
R codes:
> toluca<-read.table("toluca.txt",header=T)
> fitreg<-lm(Hrs~Size,data=toluca)
> summary(fitreg)

Call:
lm(formula = Hrs ~ Size, data = toluca)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  62.366     26.177    2.382   0.0259 *
Size          3.5702     0.3470  10.290 4.45e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 48.82 on 23 degrees of freedom
Multiple R-squared: 0.8215, Adjusted R-squared: 0.8138
F-statistic: 105.9 on 1 and 23 DF, p-value: 4.449e-10

> confint(fitreg,level = 0.95)
                2.5 %   97.5 %
(Intercept)   8.21    116.52
Size          2.85      4.29

Example: C.I. for β0 (cont.)
R codes:
> ## Plot regression line
> plot(toluca$Size,toluca$Hrs,pch=16,xlab="Size",ylab="Hours")
> abline(fitreg,col="red")

Some considerations on making inferences concerning β0 and β1: departures from normality
If the probability distributions of Y are not exactly normal but do not depart seriously from normality, the sampling distributions of b0 and b1 will be approximately normal, and the use of the t distribution will provide approximately the specified confidence coefficient or level of significance.

Departures from normality (cont.)
Even if the distributions of Y are far from normal, the estimators b0 and b1 generally have the property of asymptotic normality: their distributions approach normality under very general conditions as the sample size increases.

Power of tests
H0: β1 = β10 vs. Ha: β1 ≠ β10, with test statistic t* = (b1 − β10)/s{b1}.
The power of this test at level α is the probability that the decision rule leads to conclusion Ha when Ha holds:
$\text{Power} = P\{|t^*| > t(1-\alpha/2; n-2) \mid \delta\}$
where δ is the noncentrality measure, i.e., a measure of how far the true value of β1 is from β10:
$\delta = \frac{|\beta_1 - \beta_{10}|}{\sigma\{b_1\}}$
(tabulated in Appendix Table B.5; an R sketch follows).
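In R the power can be computed from the noncentral t distribution rather than Table B.5; a sketch under the assumption that σ{b1} is known (here the Toluca s{b1} = 0.3470 stands in for it, and β10 = 3 is an illustrative value):

alpha <- 0.05; n <- 25
delta <- abs(3.5702 - 3) / 0.3470            # noncentrality measure delta
crit <- qt(1 - alpha/2, n - 2)
1 - pt(crit, n - 2, ncp = delta) + pt(-crit, n - 2, ncp = delta)   # power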

Common objective
To estimate the mean of one or more probability distributions of Y.
Illustration: a study of the relation between level of piecework pay (X) and worker productivity (Y). The mean productivity at high and medium levels of piecework pay may be of particular interest for analyzing the benefits obtained from an increase in the pay.

Notation
Xh: the level of X for which we wish to estimate the mean response; it may be a value that occurred in the sample, or another value of the predictor variable within the scope of the model.
E{Yh}: the mean response when X = Xh.
Ŷh: the point estimator of E{Yh}:
$\hat{Y}_h = b_0 + b_1 X_h$
What is the sampling distribution of Ŷh?

Sampling distribution of Ŷh
For normal error regression model (2.1), the sampling distribution of Ŷh is normal, with
$E\{\hat{Y}_h\} = E\{Y_h\}, \qquad \sigma^2\{\hat{Y}_h\} = \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$
Ŷh is a linear combination of the observations Yi, and is an unbiased estimator of E{Yh}.

Sampling distribution of Ŷh (cont.)
Figure 2: Effect on Ŷh of variation in b1 from sample to sample, in two samples with the same means Ȳ and X̄.

Properties of Ŷh
- When Xh = 0, Var(Ŷh) = Var(b0) and s²{Ŷh} = s²{b0}.
- b1 and Ȳ are uncorrelated: σ{Ȳ, b1} = 0.

Properties of Ŷh (cont.)
When MSE is substituted for σ², the estimated variance of Ŷh is
$s^2\{\hat{Y}_h\} = MSE \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$
The estimated standard deviation of Ŷh is s{Ŷh}.

Interval Estimation of E{Yh}
Theorem 7.
$\frac{\hat{Y}_h - E\{Y_h\}}{s\{\hat{Y}_h\}} \sim t(n-2)$
The 1 − α confidence limits for E{Yh} are:
$\hat{Y}_h \pm t(1-\alpha/2; n-2)\, s\{\hat{Y}_h\}$

Example 1
Toluca Company example: find a 90 percent confidence interval for E{Yh} when the lot size is Xh = 65 units.
$\hat{Y}_h = 62.37 + 3.5702(65) = 294.4$
$s^2\{\hat{Y}_h\} = 2{,}384\left[\frac{1}{25} + \frac{(65-70)^2}{19{,}800}\right] = 98.37, \qquad s\{\hat{Y}_h\} = 9.918$
t(0.95; 23) = 1.714, so $277.4 \le E\{Y_h\} \le 311.4$
Conclusion: the mean number of work hours required when lots of 65 units are produced is somewhere between 277.4 and 311.4 hours; the estimate of the mean number of work hours is moderately precise.


Code for Example 1
#### Example p54
toluca<-read.table("toluca.txt",header=T)
attach(toluca)
## method 1
n<-length(Size)
alpha<-0.1
tdf90<-qt(1-alpha/2,n-2)
Sxx<-sum((Size-mean(Size))^2)
b1<-sum((Size-mean(Size))*(Hrs-mean(Hrs)))/Sxx
b0<-mean(Hrs)-b1*mean(Size)
c(b0,b1)
SSE<-sum((Hrs-(b0+b1*Size))^2)
MSE<-SSE/(n-2)
sb1<-sqrt(MSE/Sxx)  #s{b1}
Xh<-65
hYh<-b0+b1*Xh
sYh<-sqrt(MSE*(1/n+(Xh-mean(Size))^2/Sxx))

Code for Example 1 (cont.)
EYhCI<-c(hYh-tdf90*sYh,hYh+tdf90*sYh)
## method 2
fitreg<-lm(Hrs~Size,data=toluca)
summary(fitreg)
predXCI<-predict(fitreg,data.frame(Size = c(Xh)),
  interval = "confidence", se.fit = F, level = 0.9)
plot(Size, Hrs)
abline(fitreg,col="blue")
# now the confidence interval for X_h = specific level
points(Xh, predXCI[, "fit"],col="red",pch=15)
points(Xh, predXCI[, "lwr"], lty = "dotted",col="red")
points(Xh, predXCI[, "upr"], lty = "dotted",col="red")

Example 2
Toluca Company example: estimate E{Yh} for lots with Xh = 100 units with a 90 percent confidence interval.
$\hat{Y}_h = 62.37 + 3.5702(100) = 419.4$
$s^2\{\hat{Y}_h\} = 2{,}384\left[\frac{1}{25} + \frac{(100-70)^2}{19{,}800}\right] = 203.72, \qquad s\{\hat{Y}_h\} = 14.27$
t(0.95; 23) = 1.714, so $394.9 \le E\{Y_h\} \le 443.9$
Conclusion: this confidence interval is somewhat wider than that of Example 1, since the Xh level here is substantially farther from the mean X̄ = 70 than Xh = 65 was.


Remarks on the sampling distribution of Ŷh
- The variance of Ŷh is smallest when Xh = X̄. In an experiment to estimate the mean response at a particular level Xh of the predictor variable, the precision of the estimate will be greatest if the observations on X are spaced so that X̄ = Xh.
- The usual relationship between confidence intervals and tests applies in inferences concerning the mean response: the two-sided confidence limits can be utilized for two-sided tests concerning the mean response at Xh. Alternatively, a regular decision rule can be set up.
- The confidence limits for a mean response E{Yh} are not sensitive to moderate departures from the assumption that the error terms are normally distributed.

Prediction of New Observation
- Toluca Company: the next lot to be produced consists of 100 units; the company wishes to predict the number of work hours for this particular lot.
- A company has estimated the regression relation between company sales and number of persons 16 or more years old from data for the past 10 years; it wishes to predict next year's company sales.

Prediction of New Observation: college admissions example
The relevant parameters of the regression model are known:
β0 = 0.10, β1 = 0.95, so E{Y} = 0.10 + 0.95X, with σ = 0.12.
For an applicant whose high school GPA is Xh = 3.5:
$E\{Y_h\} = 0.10 + 0.95(3.5) = 3.425$
Prediction limits E{Yh} ± 3σ: 3.425 ± 3(0.12), i.e.
$3.065 \le Y_{h(new)} \le 3.785$

Prediction of New Observation (cont.)
The basic idea of a prediction interval is to choose a range in the distribution of Y wherein most of the observations will fall, and then to declare that the next observation will fall in this range.
When the regression parameters of normal error regression model (2.1) are known, the 1 − α prediction limits for Yh(new) are:
$E\{Y_h\} \pm z(1-\alpha/2)\, \sigma$
(a one-line R check appears below).
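Since the parameters in the GPA illustration are treated as known, the limits involve only normal quantiles; a one-line R check (3.425 and 0.12 come from the example above):

EYh <- 0.10 + 0.95 * 3.5                       # = 3.425
EYh + c(-1, 1) * 3 * 0.12                      # the slide's E{Yh} +/- 3*sigma limits
EYh + c(-1, 1) * qnorm(1 - 0.05/2) * 0.12      # z-based 1 - alpha limits, alpha = 0.05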

Prediction Interval for Yh(new) when Parameters Unknown
Figure 2.5: Prediction of Yh(new) when parameters unknown.

Prediction Interval for Yh(new) when Parameters Unknown (cont.)
Since we cannot be certain of the location of the distribution of Y, prediction limits for Yh(new) must take account of two elements (Figure 2.5):
- variation in the possible location of the distribution of Y
- variation within the probability distribution of Y
Prediction limits for a new observation Yh(new) at a given Xh are obtained from:
Theorem 8. For normal error regression model (2.1),
$\frac{Y_{h(new)} - \hat{Y}_h}{s\{pred\}} \sim t(n-2)$

Prediction Interval for Yh(new) when Parameters Unknown (cont.)
The 1 − α prediction limits for Yh(new):
$\hat{Y}_h \pm t(1-\alpha/2; n-2)\, s\{pred\}$
The difference Yh(new) − Ŷh may be viewed as the prediction error, with Ŷh serving as the best point estimate of the value of the new observation Yh(new). The variance of the prediction error is
$\sigma^2\{pred\} = \sigma^2\{Y_{h(new)} - \hat{Y}_h\} = \sigma^2 + \sigma^2\{\hat{Y}_h\}$

Prediction Interval for Yh(new) when Parameters Unknown (cont.)
σ²{pred} has two components:
- the variance of the distribution of Y at X = Xh: σ²
- the variance of the sampling distribution of Ŷh: σ²{Ŷh}
An unbiased estimator of σ²{pred}:
$s^2\{pred\} = MSE + s^2\{\hat{Y}_h\} = MSE\left[1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$

Example
Toluca Company: Xh = 100; 90 percent prediction interval. t(0.95; 23) = 1.714, Ŷh = 419.4, s²{Ŷh} = 203.72, MSE = 2,384.
$s^2\{pred\} = 2{,}384 + 203.72 = 2{,}587.72, \qquad s\{pred\} = 50.87$
The 90 percent prediction interval for Yh(new):
$332.2 \le Y_{h(new)} \le 506.6$

## Example (p59)
fitreg<-lm(Hrs~Size,data=toluca)
Xh<-100
fitnew<-predict.lm(fitreg,data.frame(Size = c(Xh)), se.fit = T, level = 0.9)
s2pred<-fitnew$se.fit^2+fitnew$residual.scale^2
c(s2pred,sqrt(s2pred))

Remarks
This prediction interval is rather wide and may not be useful for planning worker requirements for the next lot. The interval can still be useful for control purposes: if the actual work hours fall outside the prediction limits, this is an alert that a change in the production process may have occurred.

Comparing Yh(new) and E{Yh}
Toluca Company: the prediction interval for Yh(new) is wider than the C.I. for E{Yh}. In predicting the work hours required for a new lot, we encounter the variability in Ŷh from sample to sample as well as the lot-to-lot variation within the probability distribution of Y.
By (2.38a), the prediction interval is wider the further Xh is from X̄.

Prediction of Mean of m New Observations for Given Xh
To predict the mean of m new observations on Y for a given Xh:
Ȳh(new): the mean of the m new observations to be predicted; the new Y observations are assumed independent.
The appropriate 1 − α prediction limits are:
$\hat{Y}_h \pm t(1-\alpha/2; n-2)\, s\{predmean\}$
$s^2\{predmean\} = \frac{MSE}{m} + s^2\{\hat{Y}_h\} = MSE\left[\frac{1}{m} + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$

Prediction of Mean of m New Observations for Given Xh (cont.)
Two components of s²{predmean}:
- the variance of the mean of m observations from the probability distribution of Y at X = Xh
- the variance of the sampling distribution of Ŷh

Prediction of Mean of m New Observations for Given Xh (cont.)
Example. Toluca Company: Xh = 100; 90 percent prediction interval for the mean number of work hours Ȳh(new) in three new production runs.
Previous results: Ŷh = 419.4; s²{Ŷh} = 203.72; MSE = 2,384; t(0.95; 23) = 1.714.
$s^2\{predmean\} = \frac{2{,}384}{3} + 203.72 = 998.4, \qquad s\{predmean\} = 31.60$

Prediction of Mean of m New Observations for Given Xh (cont.)
The prediction interval for Ȳh(new):
$365.2 \le \bar{Y}_{h(new)} \le 473.6$
The total number of work hours for the three runs:
$1{,}095.6 \le \text{Total work hours} \le 1{,}420.8$
(A direct R computation is sketched below.)
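There is no built-in predict() option for the mean of m new observations, so the interval is assembled by hand; a sketch reusing fitnew from the predict.lm() call above:

m <- 3
s2predmean <- fitnew$residual.scale^2/m + fitnew$se.fit^2   # MSE/m + s^2{Yhat_h}
spredmean <- sqrt(s2predmean)                               # about 31.6
fitnew$fit + c(-1, 1) * qt(0.95, 23) * spredmean            # 90% limits for the mean of 3 runs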

Geometric interpretation
- Y is a vector in R^n; the columns of X define a vector space range(X).
- Xβ is an arbitrary vector in range(X); the fitted vector Xb is the orthogonal projection of Y onto range(X):
$X'(Y - Xb) = 0$
Figure 4: Geometric interpretation of OLS. A numeric check of this orthogonality condition is sketched below.
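A numeric check of the orthogonality condition for the Toluca fit (a sketch; fitreg is the lm() fit from earlier, and model.matrix() returns the X matrix):

Xmat <- model.matrix(fitreg)      # columns: intercept, Size
e <- residuals(fitreg)            # e = Y - Xb
t(Xmat) %*% e                     # X'(Y - Xb): numerically zero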


Analysis of Variance Approach to Regression Analysis: partition of total sum of squares
The analysis of variance approach is based on the partitioning of sums of squares and degrees of freedom associated with Y. The variation in the Yi is measured in terms of their deviations around their mean:
$Y_i - \bar{Y}$


Partition of Total Sum of Squares (cont.)
Total variation (SSTO, total sum of squares):
$SSTO = \sum (Y_i - \bar{Y})^2$
If all Yi are the same, SSTO = 0; the greater the variation among the Yi, the larger is SSTO.
Error sum of squares (SSE):
$SSE = \sum (Y_i - \hat{Y}_i)^2$
If all Yi fall on the fitted regression line, SSE = 0; the greater the variation of the Yi around the fitted regression line, the larger is SSE.

Partition of Total Sum of Squares (cont.)
Regression sum of squares (SSR):
$SSR = \sum (\hat{Y}_i - \bar{Y})^2$
If the regression line is horizontal, SSR = 0; otherwise SSR > 0. SSR is a measure associated with the regression line: the larger SSR is in relation to SSTO, the greater is the effect of the regression relation in accounting for the total variation in the Yi observations.

Formal Development of Partitioning
The total deviation decomposes into two components:
$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$
i.e., total deviation = deviation of the fitted regression value around the mean + deviation around the fitted regression line:
- the deviation of the fitted value Ŷi around the mean Ȳ
- the deviation of the observation Yi around the fitted regression line

Formal Development of Partitioning (cont.)
$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$
i.e., SSTO = SSR + SSE, since the cross-product term $2\sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i) = 0$. Also
$SSR = b_1^2 \sum (X_i - \bar{X})^2$
Degrees of freedom (df):
SS   | df  | explanation
SSTO | n−1 | one constraint: Σ(Yi − Ȳ) = 0
SSR  | 1   |
SSE  | n−2 | β0, β1 estimated in Ŷi
(A numeric check of the partition is sketched below.)
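The partition can be confirmed numerically for the Toluca fit; a sketch reusing fitreg, b1, and the attached Size and Hrs from the earlier code:

Yhat <- fitted(fitreg)
SSTO <- sum((Hrs - mean(Hrs))^2)
SSR <- sum((Yhat - mean(Hrs))^2)
SSE <- sum((Hrs - Yhat)^2)
c(SSTO, SSR + SSE)                            # equal
c(SSR, b1^2 * sum((Size - mean(Size))^2))     # equal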

Mean Squares
A mean square (MS) is a sum of squares divided by its associated df:
SS   | df  | MS
SSR  | 1   | MSR = SSR/1 (regression mean square)
SSE  | n−2 | MSE = SSE/(n−2) (error mean square)
Mean squares are not additive:
$\frac{SSTO}{n-1} \ne MSR + MSE$

ANOVA table
Table 1: ANOVA table for simple linear regression.
Source of Variation | SS                  | df  | MS              | E{MS}
Regression          | SSR = Σ(Ŷi − Ȳ)²    | 1   | MSR = SSR/1     | σ² + β1² Σ(Xi − X̄)²
Error               | SSE = Σ(Yi − Ŷi)²   | n−2 | MSE = SSE/(n−2) | σ²
Total               | SSTO = Σ(Yi − Ȳ)²   | n−1 |                 |

ANOVA table (cont.)
Modified ANOVA table:
$SSTO = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - n\bar{Y}^2$
SSTOU, the total uncorrected sum of squares: $SSTOU = \sum Y_i^2$
Correction-for-the-mean sum of squares: $SS(\text{correction for mean}) = n\bar{Y}^2$

ANOVA table (cont.)
Table 2: Modified ANOVA table for simple linear regression.
Source of Variation | SS                                | df  | MS
Regression          | SSR = Σ(Ŷi − Ȳ)²                  | 1   | MSR = SSR/1
Error               | SSE = Σ(Yi − Ŷi)²                 | n−2 | MSE = SSE/(n−2)
Total               | SSTO = Σ(Yi − Ȳ)²                 | n−1 |
Correction for mean | SS(correction for mean) = nȲ²     | 1   |
Total, uncorrected  | SSTOU = ΣYi²                      | n   |

Expected Mean Squares
$E\{MSE\} = \sigma^2, \qquad E\{MSR\} = \sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$
MSE is an unbiased estimator of σ². Implications:
- the mean of the sampling distribution of MSE is σ²
- the mean of the sampling distribution of MSR is also σ² when β1 = 0
- when β1 ≠ 0, E{MSR} > E{MSE} = σ², since β1² Σ(Xi − X̄)² > 0

F test of β1 = 0 vs. β1 ≠ 0
The analysis of variance approach provides a battery of highly useful tests for regression models. For the simple linear regression case, the ANOVA provides a test of
H0: β1 = 0 vs. Ha: β1 ≠ 0
Test statistic:
$F^* = \frac{MSR}{MSE}$
Large values of F* support Ha; values of F* near 1 support H0.

F test of β1 = 0 vs. β1 ≠ 0 (cont.)
Cochran's theorem. If all n observations Yi come from the same normal distribution with mean µ and variance σ², and SSTO is decomposed into k sums of squares SSr, each with degrees of freedom dfr, then the SSr/σ² are independent χ² variables with dfr degrees of freedom, provided
$\sum_{r=1}^{k} df_r = n - 1$

F test of β1 = 0 vs. β1 ≠ 0 (cont.)
Property. If β1 = 0, so that all Yi have the same mean µ = β0 and the same variance σ², then SSE/σ² and SSR/σ² are independent χ² variables. When H0 holds:
$F^* = \frac{SSR/\sigma^2}{1} \div \frac{SSE/\sigma^2}{n-2} = \frac{MSR}{MSE} \sim \frac{\chi^2(1)/1}{\chi^2(n-2)/(n-2)} \sim F(1, n-2)$
When Ha holds, F* follows the noncentral F distribution.

Construction of Decision Rule
Under H0, F* ~ F(1, n−2). The decision rule at Type I error rate α:
- If F* ≤ F(1−α; 1, n−2), conclude H0.
- If F* > F(1−α; 1, n−2), conclude Ha.
where F(1−α; 1, n−2) is the 100(1−α) percentile of the appropriate F distribution.

Example: Toluca Company
Earlier test on β1 (the t test, p. 46). Two-sided test of H0: β1 = β10:
- If |t*| = |b1 − β10|/s{b1} ≤ t(1−α/2; n−2), conclude H0.
- If |t*| > t(1−α/2; n−2), conclude Ha.

Example: using the F test
H0: β1 = β10 = 0 vs. Ha: β1 ≠ 0.
α = 0.05; n = 25; F(0.95; 1, 23) = 4.28. Decision rule: if F* ≤ 4.28, conclude H0.
$F^* = \frac{MSR}{MSE} = \frac{252{,}378}{2{,}384} = 105.9$
What is the conclusion? Since F* = 105.9 > 4.28, conclude Ha: β1 ≠ 0.

Equivalence of F test and t test
For a given α level, the F test of H0: β1 = 0 vs. Ha: β1 ≠ 0 is equivalent algebraically to the two-tailed t test:
$F^* = \frac{b_1^2}{s^2\{b_1\}} = \left(\frac{b_1}{s\{b_1\}}\right)^2 = (t^*)^2$
and $[t(1-\alpha/2; n-2)]^2 = F(1-\alpha; 1, n-2)$.
The t test is two-tailed; the F test is one-tailed. (This is verified numerically below.)
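The equivalence is easy to confirm from the R fit; a sketch (the row and column names follow the summary() and anova() output for the Toluca model):

tstar <- summary(fitreg)$coefficients["Size", "t value"]
Fstar <- anova(fitreg)["Size", "F value"]
c(tstar^2, Fstar)                 # identical, about 105.9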

General Linear Test Approach
Steps:
1. Fit the full model and obtain SSE(F).
2. Fit the reduced model under H0 and obtain SSE(R).
3. Use the test statistic and decision rule.

1. Full Model
The full or unrestricted model:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$SSE(F) = \sum [Y_i - (b_0 + b_1 X_i)]^2 = \sum (Y_i - \hat{Y}_i)^2 = SSE$

2. Reduced Model
Hypotheses: H0: β1 = 0 vs. Ha: β1 ≠ 0.
The reduced or restricted model when H0 holds:
$Y_i = \beta_0 + \varepsilon_i$
$SSE(R) = \sum (Y_i - b_0)^2 = \sum (Y_i - \bar{Y})^2 = SSTO$
(here b0 = Ȳ, the least squares estimator under the reduced model).

3. Test Statistic
$SSE(F) \le SSE(R)$
The more parameters in the model, the better one can fit the data and the smaller are the deviations around the fitted regression function.

3. Test Statistic (cont.)
When SSE(F) is not much less than SSE(R), using the full model does not account for much more of the variability of the Yi than does the reduced model: the variation of the observations around the fitted regression function for the full model is almost as great as the variation around the fitted regression function for the reduced model. This suggests that the reduced model is adequate, i.e., that H0 holds.

3. Test Statistic (cont.)
A small difference SSE(R) − SSE(F) suggests that H0 holds; a large difference suggests that Ha holds. The test statistic is a function of SSE(R) − SSE(F):
$F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}$
which follows the F distribution when H0 holds. Decision rule:
- If F* ≤ F(1−α; dfR − dfF, dfF), conclude H0.
- If F* > F(1−α; dfR − dfF, dfF), conclude Ha.

3. Test Statistic (cont.)
For testing whether or not β1 = 0:
SSE(R) = SSTO, SSE(F) = SSE, dfR = n−1, dfF = n−2, so
$F^* = \frac{SSTO - SSE}{(n-1) - (n-2)} \div \frac{SSE}{n-2} = \frac{MSR}{MSE}$
(an R sketch with anova() follows).
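In R the general linear test is carried out by fitting both models and comparing them with anova(); a sketch for the β1 = 0 test on the Toluca data:

full <- lm(Hrs ~ Size, data = toluca)     # full model
reduced <- lm(Hrs ~ 1, data = toluca)     # reduced model under H0
anova(reduced, full)                      # reports F* = MSR/MSE and its p-value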

Descriptive Measures of Linear Association between X and Y: motivation
The usefulness of estimates or predictions depends upon the width of the interval and upon the user's needs for precision, which vary from one application to another. No single descriptive measure of the degree of linear association can capture the essential information as to whether a given regression relation is useful in any particular application.

Coefficient of Determination
SSTO measures the variation in the observations Yi, i.e., the uncertainty in predicting Y when X is not considered. SSE measures the variation in the Yi when a regression model utilizing the predictor variable X is employed. A natural measure of the effect of X in reducing the variation in Y, i.e., in reducing the uncertainty in predicting Y, is to express the reduction in variation (SSTO − SSE = SSR) as a proportion of the total variation:
$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$
Interpretation: R² is the proportionate reduction of total variation associated with the use of the predictor variable X.

Coefficient of Determination (cont.)
Montgomery (Introduction to Linear Regression Analysis): R² is called the proportion of variation explained by the regressor x.
Wikipedia: R² is a statistic that will give some information about the goodness of fit of a model; in regression, the R² coefficient of determination is a statistical measure of how well the regression line approximates the real data points.

Coefficient of Determination (cont.)
R² is called the coefficient of determination. Since 0 ≤ SSE ≤ SSTO,
$0 \le R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO} \le 1$
- SSE = 0 ⟹ R² = 1 (all Yi = Ŷi)
- SSR = 0 ⟹ R² = 0 (all Ŷi = Ȳ, i.e., b1 = 0: no linear relation between X and Y)

Coefficient of Determination (cont.)
R² = 0: there is no linear association between X and Y in the sample data, and the predictor variable X is of no help in reducing the variation in the Yi with linear regression.
R² close to 1: the linear relation between X and Y is strong; the closer R² is to 1, the greater is said to be the degree of linear association between X and Y.

Coefficient of Determination (cont.)
The Toluca Company:
fitreg<-lm(Hrs~Size,data=toluca)
anova(fitreg)
summary(fitreg)

Coefficient of Determination (cont.)
The variation in work hours is reduced by 82.2% when lot size is considered.


Coefficient of Determination (cont.)
R² is widely used for describing the usefulness of a regression model, but it invites serious misunderstandings:
- A high R² indicates that useful predictions can be made. (Not necessarily correct; e.g., the wide prediction interval at Xh = 100.)
- A high R² indicates that the estimated regression line is a good fit. (Not necessarily correct; the relation may be curvilinear.)
- An R² near 0 indicates that X and Y are not related. (Not necessarily correct; the relation may be curvilinear.)


Coefficient of Correlation
A measure of linear association between Y and X when both Y and X are random is the coefficient of correlation:
$r = \pm\sqrt{R^2}$
A plus or minus sign is attached to this measure according to whether the slope of the fitted regression line is positive or negative; −1 ≤ r ≤ 1.
The more widely the Xi are spaced, the higher R² will tend to be, since
$R^2 = \frac{SSR}{SSTO} = \frac{b_1^2 \sum (X_i - \bar{X})^2}{\sum (Y_i - \bar{Y})^2}$
(a numeric check follows).
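For the Toluca data the two measures can be checked against each other; a sketch reusing the attached data and fitreg:

r <- cor(Size, Hrs)                        # sign matches that of b1
c(r^2, summary(fitreg)$r.squared)          # equal, about 0.822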

Coefficient of Correlation (cont.)
SSR is the explained variation in Y and SSE is the unexplained variation, so R² is interpreted as the proportion of the total variation in Y which has been explained by X.
This is easily misunderstood: Y does not necessarily depend on X in a causal or explanatory sense.


More information

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 PDF file location: http://www.murraylax.org/rtutorials/regression_anovatable.pdf

More information

Remedial Measures, Brown-Forsythe test, F test

Remedial Measures, Brown-Forsythe test, F test Remedial Measures, Brown-Forsythe test, F test Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 7, Slide 1 Remedial Measures How do we know that the regression function

More information

STAT 215 Confidence and Prediction Intervals in Regression

STAT 215 Confidence and Prediction Intervals in Regression STAT 215 Confidence and Prediction Intervals in Regression Colin Reimer Dawson Oberlin College 24 October 2016 Outline Regression Slope Inference Partitioning Variability Prediction Intervals Reminder:

More information

CHAPTER EIGHT Linear Regression

CHAPTER EIGHT Linear Regression 7 CHAPTER EIGHT Linear Regression 8. Scatter Diagram Example 8. A chemical engineer is investigating the effect of process operating temperature ( x ) on product yield ( y ). The study results in the following

More information

Chapter 4. Regression Models. Learning Objectives

Chapter 4. Regression Models. Learning Objectives Chapter 4 Regression Models To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Learning Objectives After completing

More information

Density Temp vs Ratio. temp

Density Temp vs Ratio. temp Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,

More information

Simple Linear Regression Analysis

Simple Linear Regression Analysis LINEAR REGRESSION ANALYSIS MODULE II Lecture - 6 Simple Linear Regression Analysis Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Prediction of values of study

More information

ECO220Y Simple Regression: Testing the Slope

ECO220Y Simple Regression: Testing the Slope ECO220Y Simple Regression: Testing the Slope Readings: Chapter 18 (Sections 18.3-18.5) Winter 2012 Lecture 19 (Winter 2012) Simple Regression Lecture 19 1 / 32 Simple Regression Model y i = β 0 + β 1 x

More information

Simple Linear Regression

Simple Linear Regression 9-1 l Chapter 9 l Simple Linear Regression 9.1 Simple Linear Regression 9.2 Scatter Diagram 9.3 Graphical Method for Determining Regression 9.4 Least Square Method 9.5 Correlation Coefficient and Coefficient

More information

STA 4210 Practise set 2a

STA 4210 Practise set 2a STA 410 Practise set a For all significance tests, use = 0.05 significance level. S.1. A multiple linear regression model is fit, relating household weekly food expenditures (Y, in $100s) to weekly income

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

Chapter 6 Multiple Regression

Chapter 6 Multiple Regression STAT 525 FALL 2018 Chapter 6 Multiple Regression Professor Min Zhang The Data and Model Still have single response variable Y Now have multiple explanatory variables Examples: Blood Pressure vs Age, Weight,

More information

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression. 10/3/011 Functional Connectivity Correlation and Regression Variance VAR = Standard deviation Standard deviation SD = Unbiased SD = 1 10/3/011 Standard error Confidence interval SE = CI = = t value for

More information

Chapter 4 Experiments with Blocking Factors

Chapter 4 Experiments with Blocking Factors Chapter 4 Experiments with Blocking Factors 許湘伶 Design and Analysis of Experiments (Douglas C. Montgomery) hsuhl (NUK) DAE Chap. 4 1 / 54 The Randomized Complete Block Design (RCBD; 隨機化完全集區設計 ) 1 Variability

More information

Chapter 4: Regression Models

Chapter 4: Regression Models Sales volume of company 1 Textbook: pp. 129-164 Chapter 4: Regression Models Money spent on advertising 2 Learning Objectives After completing this chapter, students will be able to: Identify variables,

More information

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1 Lecture Simple Linear Regression STAT 51 Spring 011 Background Reading KNNL: Chapter 1-1 Topic Overview This topic we will cover: Regression Terminology Simple Linear Regression with a single predictor

More information

Simple Linear Regression. (Chs 12.1, 12.2, 12.4, 12.5)

Simple Linear Regression. (Chs 12.1, 12.2, 12.4, 12.5) 10 Simple Linear Regression (Chs 12.1, 12.2, 12.4, 12.5) Simple Linear Regression Rating 20 40 60 80 0 5 10 15 Sugar 2 Simple Linear Regression Rating 20 40 60 80 0 5 10 15 Sugar 3 Simple Linear Regression

More information

Homework 2: Simple Linear Regression

Homework 2: Simple Linear Regression STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA

More information

The simple linear regression model discussed in Chapter 13 was written as

The simple linear regression model discussed in Chapter 13 was written as 1519T_c14 03/27/2006 07:28 AM Page 614 Chapter Jose Luis Pelaez Inc/Blend Images/Getty Images, Inc./Getty Images, Inc. 14 Multiple Regression 14.1 Multiple Regression Analysis 14.2 Assumptions of the Multiple

More information

TMA4255 Applied Statistics V2016 (5)

TMA4255 Applied Statistics V2016 (5) TMA4255 Applied Statistics V2016 (5) Part 2: Regression Simple linear regression [11.1-11.4] Sum of squares [11.5] Anna Marie Holand To be lectured: January 26, 2016 wiki.math.ntnu.no/tma4255/2016v/start

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow)

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow) STAT40 Midterm Exam University of Illinois Urbana-Champaign October 19 (Friday), 018 3:00 4:15p SOLUTIONS (Yellow) Question 1 (15 points) (10 points) 3 (50 points) extra ( points) Total (77 points) Points

More information

Linear regression. We have that the estimated mean in linear regression is. ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. The standard error of ˆµ Y X=x is.

Linear regression. We have that the estimated mean in linear regression is. ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. The standard error of ˆµ Y X=x is. Linear regression We have that the estimated mean in linear regression is The standard error of ˆµ Y X=x is where x = 1 n s.e.(ˆµ Y X=x ) = σ ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. 1 n + (x x)2 i (x i x) 2 i x i. The

More information

Multivariate Regression (Chapter 10)

Multivariate Regression (Chapter 10) Multivariate Regression (Chapter 10) This week we ll cover multivariate regression and maybe a bit of canonical correlation. Today we ll mostly review univariate multivariate regression. With multivariate

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information