Statistics 203 Introduction to Regression and Analysis of Variance Assignment #1 Solutions January 20, 2005



Q. 1) (MP 2.7)

(a) Let $x$ denote the hydrocarbon percentage, and let $y$ denote the oxygen purity. The fitted simple linear regression model is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, with $\hat{\beta}_0$ and $\hat{\beta}_1$ read from the Estimate column of the summary below.

> # MP 2.7, Oxygen
> oxygen.table <- read.table(" stats203/data/oxygen.table", header=TRUE, sep=",")
> attach(oxygen.table)
> purity.lm <- lm(purity ~ hydro)
> summary(purity.lm)

Call:
lm(formula = purity ~ hydro)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                              e-13 ***
hydro                                         **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 18 degrees of freedom
Multiple R-Squared: ,  Adjusted R-squared:
F-statistic:  on 1 and 18 DF,  p-value:
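As a cross-check on what lm() reports, the slope and intercept of a simple linear regression can be computed directly from $S_{xy}/S_{xx}$. The following is only a sketch on simulated data: the vectors x.sim and y.sim and the values used to generate them are made up for illustration and are not the oxygen purity data.

```r
# Hypothetical data: a simulated stand-in, NOT the oxygen purity data.
set.seed(203)
x.sim <- runif(20, 0.8, 1.6)
y.sim <- 75 + 12 * x.sim + rnorm(20, sd = 1)

# Closed-form least squares estimates for simple linear regression
Sxx <- sum((x.sim - mean(x.sim))^2)
Sxy <- sum((x.sim - mean(x.sim)) * (y.sim - mean(y.sim)))
b1 <- Sxy / Sxx
b0 <- mean(y.sim) - b1 * mean(x.sim)

# Compare with the coefficients returned by lm()
fit.sim <- lm(y.sim ~ x.sim)
print(c(b0 = b0, b1 = b1))
print(coef(fit.sim))
```

The two printed vectors should agree up to floating-point error.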

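> # Use filled-in circles in the plot by typing pch=21
> plot(hydro, purity, pch=21, bg="blue", main="Purity vs Hydrocarbon Percentage")
> abline(purity.lm$coef, lwd=2)

Figure 1: Plot of the purity versus hydrocarbon percentage, with the least squares line superimposed.

Figure 1 suggests a positive relationship between oxygen purity and the hydrocarbon percentage.

(b) Consider $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$. The test statistic $t_{\hat{\beta}_1}$ is the t value for hydro in the summary of part (a), on $20 - 2 = 18$ d.f., corresponding to a p-value between 0.001 and 0.01 (the ** flag on the hydro row). We therefore reject $H_0$ in favor of $H_1$, and conclude that the true slope $\beta_1$ is not zero.

(c) From summary(purity.lm) in part (a) above, $R^2$ is the value reported as Multiple R-Squared.

(d) A 95% confidence interval for $\beta_1$ in this SLR model is given in R by:

> confint(purity.lm, level=.95)
            2.5 % 97.5 %
(Intercept)
hydro

Alternatively, recall that a $100(1-\alpha)\%$ CI for $\beta_1$ is:

$$\hat{\beta}_1 \pm SE_{\hat{\beta}_1} \, t_{1-\alpha/2,\, n-2}.$$

$SE_{\hat{\beta}_1}$ is the Std. Error entry for hydro in the summary of part (a). We can now compute the interval in R by typing:

> t.quantiles <- qt(c(.025,.975), 18)
> est <- coef(summary(purity.lm))["hydro", ]
> est["Estimate"] + est["Std. Error"]*t.quantiles

Here, t.quantiles are the .025 and .975 quantiles of the $t_{18}$ distribution.

(e) A 95% confidence interval for $E(Y \mid X = 1.0)$ is given by (87.51, 91.82). This is computed in R as follows.

> predict(purity.lm, newdata=list(hydro = 1.0), interval="confidence", level=.95)
     fit lwr upr
[1,]

Q. 2) (MP 2.19)

(a) As usual, let $SSE = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2$. Then:

$$\frac{\partial SSE}{\partial \beta_1} = 2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)(-x_i).$$

Setting $\left.\frac{\partial SSE}{\partial \beta_1}\right|_{\hat{\beta}_1} = 0$ and dividing the above by $-2$, we obtain:

$$\begin{aligned}
0 &= \sum_{i=1}^{n} (x_i y_i - \beta_0 x_i - \hat{\beta}_1 x_i^2) \\
\hat{\beta}_1 \sum_{i=1}^{n} x_i^2 &= \sum_{i=1}^{n} x_i y_i - \beta_0 \sum_{i=1}^{n} x_i
\end{aligned}$$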

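So the least squares estimate $\hat{\beta}_1$ is given by:

$$\hat{\beta}_1 = \frac{\sum x_i y_i - \beta_0 \sum x_i}{\sum x_i^2} = \frac{\sum (y_i - \beta_0) x_i}{\sum x_i^2}.$$

(b) We derive $\mathrm{Var}(\hat{\beta}_1)$ as follows:

$$\begin{aligned}
\mathrm{Var}(\hat{\beta}_1) &= \mathrm{Var}\left( \frac{\sum x_i y_i - \beta_0 \sum x_i}{\sum x_i^2} \right) \\
&= \left( \sum x_i^2 \right)^{-2} \mathrm{Var}\left( \sum x_i y_i - \beta_0 \sum x_i \right) \\
&= \left( \sum x_i^2 \right)^{-2} \mathrm{Var}\left( \sum x_i y_i \right) && \text{since } \beta_0 \sum x_i \text{ is constant} \\
&= \left( \sum x_i^2 \right)^{-2} \sum x_i^2 \, \mathrm{Var}(y_i) && \text{since } \{y_i\} \text{ are independent} \\
&= \left( \sum x_i^2 \right)^{-2} \sum x_i^2 \, \sigma^2 \\
&= \sigma^2 \Big/ \sum x_i^2 .
\end{aligned}$$

(c) We summarize our results in the following table.

$$\begin{array}{llll}
\text{Model} & \mathrm{Var}(\hat{\beta}_1) & SSE & SE(\hat{\beta}_1) = \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_1)} \\
\hline
\beta_0 \text{ unknown} & \dfrac{\sigma^2}{\sum (x_i - \bar{x})^2} & \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 & \sqrt{\dfrac{SSE/(n-2)}{\sum (x_i - \bar{x})^2}} \\
\beta_0 \text{ known} & \dfrac{\sigma^2}{\sum x_i^2} & \sum (y_i - \beta_0 - \hat{\beta}_1 x_i)^2 & \sqrt{\dfrac{SSE/(n-1)}{\sum x_i^2}}
\end{array}$$

First, notice that $\mathrm{Var}_{\beta_0 \text{ known}}(\hat{\beta}_1) \le \mathrm{Var}_{\beta_0 \text{ unknown}}(\hat{\beta}_1)$. (Equality holds if and only if $\bar{x} = 0$.) To see this, observe:

$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2 \le \sum x_i^2
\quad \Longrightarrow \quad
\frac{\sigma^2}{\sum (x_i - \bar{x})^2} \ge \frac{\sigma^2}{\sum x_i^2}.$$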
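The known-intercept estimator and the variance comparison can be checked numerically. This is a minimal sketch on simulated data: the "known" intercept beta0, the slope 0.5, and the vectors x.sim and y.sim are all hypothetical. It compares the closed form from part (a) with a regression through the origin on $y - \beta_0$, and prints the two variance factors $1/\sum x_i^2$ and $1/\sum (x_i - \bar{x})^2$.

```r
# Hypothetical data and a made-up "known" intercept (assumptions for illustration).
set.seed(1)
beta0 <- 2
x.sim <- runif(15, 1, 5)
y.sim <- beta0 + 0.5 * x.sim + rnorm(15, sd = 0.3)

# Closed-form estimate from part (a): sum((y - beta0) * x) / sum(x^2)
b1.known <- sum((y.sim - beta0) * x.sim) / sum(x.sim^2)

# Same estimate from lm(): regression through the origin on y - beta0
fit.known <- lm(I(y.sim - beta0) ~ 0 + x.sim)

# Variance factors multiplying sigma^2 in the two models
var.factor.known   <- 1 / sum(x.sim^2)                    # beta0 known
var.factor.unknown <- 1 / sum((x.sim - mean(x.sim))^2)    # beta0 unknown

print(c(closed.form = b1.known, lm = unname(coef(fit.known))))
print(c(known = var.factor.known, unknown = var.factor.unknown))
```

Because $\bar{x} \neq 0$ for these simulated values, the "known" factor should come out strictly smaller.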

Hence, confidence intervals for $\beta_1$ will be narrower when $\beta_0$ is known, regardless of sample size (unless $\bar{x} = 0$). Furthermore, in deriving a confidence interval for $\beta_1$ when $\beta_0$ is known, it is not hard to show that:

$$T = \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} \sim t_{n-1}.$$

The estimator $\hat{\beta}_1$ is a linear combination of the $\{y_i\}$, and hence is normally distributed. $\hat{\beta}_1$ is also unbiased:

$$\begin{aligned}
E(\hat{\beta}_1) &= E\left( \frac{\sum x_i y_i - \beta_0 \sum x_i}{\sum x_i^2} \right) \\
&= \left( \sum x_i^2 \right)^{-1} \left( \sum E(x_i y_i) - \beta_0 \sum x_i \right) \\
&= \left( \sum x_i^2 \right)^{-1} \left( \sum E[x_i (\beta_0 + \beta_1 x_i + \varepsilon_i)] - \beta_0 \sum x_i \right) \\
&= \left( \sum x_i^2 \right)^{-1} \left( \beta_0 \sum x_i + \beta_1 \sum x_i^2 - \beta_0 \sum x_i \right) \\
&= \beta_1 .
\end{aligned}$$

Again, the vector of residuals $e$ is independent of $\hat{\beta}_1$, and so $\widehat{\mathrm{Var}}(\hat{\beta}_1)$ is independent of $\hat{\beta}_1$. Hence,

$$T = \underbrace{\frac{\hat{\beta}_1 - \beta_1}{\sqrt{\sigma^2 / \sum x_i^2}}}_{N(0,1)} \Bigg/ \underbrace{\sqrt{\frac{\sum (y_i - \beta_0 - \hat{\beta}_1 x_i)^2 / \sigma^2}{n-1}}}_{\sqrt{\chi^2_{n-1}/(n-1)}} = \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} \sim t_{n-1}.$$

Therefore, a $100(1-\alpha)\%$ confidence interval for $\beta_1$ has the form:

$$\hat{\beta}_1 \pm t_{1-\alpha/2,\, \nu} \, SE(\hat{\beta}_1),$$

where the degrees of freedom $\nu = n - 1$ when $\beta_0$ is known (one fewer parameter to estimate) and $\nu = n - 2$ when $\beta_0$ is unknown. This difference of 1 degree of freedom results in a slightly narrower CI for $\beta_1$, but for relatively large samples, this difference is almost negligible. The major difference is attributed to the variance of the estimator $\hat{\beta}_1$.

Q. 3) (MP 3.10)

(a) The following R commands allow us to compute and plot the residuals and standardized residuals.

> softdrink.table <- read.table(" data/softdrink.table", header=TRUE, sep=" ")
> attach(softdrink.table)
> # Compute residuals
> softdrink.lm <- lm(y ~ x1 + x2)
> softdrink.resid <- softdrink.lm$residuals
> # Compute standardized residuals
> softdrink.st.resid <- rstandard(softdrink.lm)
> # Combine residuals & standardized residuals using cbind
> print(cbind(softdrink.resid, softdrink.st.resid))
   softdrink.resid softdrink.st.resid
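To make explicit what rstandard() computes, the sketch below rebuilds the standardized residuals by hand as $e_i / (s\sqrt{1 - h_{ii}})$, where $h_{ii}$ is the $i$th diagonal entry of the hat matrix and $s$ is the residual standard error. It uses R's built-in mtcars data set as a stand-in, since the soft drink data are not reproduced here; the model mpg ~ wt + hp is chosen only for illustration.

```r
# Stand-in data: the built-in mtcars data set, NOT the soft drink data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

e <- residuals(fit)          # raw residuals
h <- hatvalues(fit)          # leverages h_ii (diagonal of the hat matrix)
s <- summary(fit)$sigma      # residual standard error

# Standardized residual: e_i / (s * sqrt(1 - h_ii))
r.manual <- e / (s * sqrt(1 - h))

# Side-by-side comparison with rstandard()
print(head(cbind(r.manual, rstandard = rstandard(fit))))
```

The two columns should match, which shows that the standardization accounts for each case's leverage rather than dividing every residual by the same constant.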

> # Plot the residuals & standardized residuals in one window
> par(mfrow = c(1,2))
> plot(softdrink.lm$residuals, pch=23, bg="blue", cex=2, lwd=2, main="Residuals")
> plot(rstandard(softdrink.lm), pch=23, bg="red", cex=2, lwd=2, main="Standardized Residuals")

Figure 2: Plots of the residuals (blue) and standardized residuals (red) for the soft drink data.

(b) From Table 4.2 of Montgomery and Peck, we notice that the $x_1$ and $x_2$ values for Observation 9 are much higher than what appears to be typical, suggesting that Observation 9 is an unusual observation. We can use the plot command in R to obtain the residuals versus fits and a Cook's distance plot. Both plots suggest that case number 9 is an outlying observation. Refer to Figures 3 and 4.

> plot(softdrink.lm)

Figure 3: Plot of the residuals versus fits, suggesting that case number 9 is an outlying observation.

Figure 4: The Cook's distance plot suggests that Observation 9 is an outlier.
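The Cook's distance values behind a plot like Figure 4 can also be inspected numerically. The sketch below again uses the built-in mtcars data as a stand-in (not the soft drink data) and checks cooks.distance() against the identity $D_i = r_i^2 h_{ii} / (p(1 - h_{ii}))$, where $r_i$ is the standardized residual, $h_{ii}$ the leverage, and $p$ the number of estimated coefficients.

```r
# Stand-in data: the built-in mtcars data set, NOT the soft drink data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

d <- cooks.distance(fit)     # Cook's distance for each case
r <- rstandard(fit)          # standardized residuals
h <- hatvalues(fit)          # leverages
p <- length(coef(fit))       # number of estimated coefficients

# Cook's distance written in terms of standardized residuals and leverages
d.manual <- r^2 * h / (p * (1 - h))

# The case with the largest Cook's distance, and the three largest values
print(names(which.max(d)))
print(round(sort(d, decreasing = TRUE)[1:3], 3))
```

In current versions of R, calling plot() on an lm object does not draw the Cook's distance panel by default; plot(fit, which = 4) requests it explicitly.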

Q. 4) (MP 4.24)

Method 1: First note that:

- For a constant matrix $A$ and a random vector $Z$, we have $\mathrm{Var}(AZ) = A \, \mathrm{Var}(Z) \, A^t$.
- The hat matrix $H$ is only a function of $X$, which we treat as fixed. Hence, $H$ is a constant matrix.
- Under the multiple regression model, we assume $\mathrm{Var}(Y) = \sigma^2 I$ is a constant matrix.
- For a symmetric matrix $U$, that is, $U = U^t$, we have $U^{-1} = (U^{-1})^t$. To see this, observe:

$$U^{-1} U = I \quad \Longrightarrow \quad U^t (U^{-1})^t = I.$$

Since $U = U^t$, this gives $U (U^{-1})^t = I$, so we must have $U^{-1} = (U^{-1})^t$.

With $\hat{Y} = HY$, we therefore have:

$$\begin{aligned}
\mathrm{Var}(\hat{Y}) &= \mathrm{Var}(HY) \\
&= H \, \mathrm{Var}(Y) \, H^t \\
&= [X(X^tX)^{-1}X^t](\sigma^2 I)[X(X^tX)^{-1}X^t]^t \\
&= \sigma^2 [X(X^tX)^{-1}X^t][(X^t)^t ((X^tX)^{-1})^t X^t] \\
&= \sigma^2 X(X^tX)^{-1}(X^tX)((X^tX)^{-1})^t X^t \\
&= \sigma^2 X I ((X^tX)^{-1})^t X^t \\
&= \sigma^2 X(X^tX)^{-1}X^t \\
&= \sigma^2 H
\end{aligned}$$

The fact that $((X^tX)^{-1})^t = (X^tX)^{-1}$ follows, since $X^tX$ is a symmetric matrix.

Method 2: Alternatively, notice that $H = H^t$ (provide a proof) and use the result of the next problem (that $H = H^2$) to see:

$$\mathrm{Var}(\hat{Y}) = \mathrm{Var}(HY) = H \, \mathrm{Var}(Y) \, H^t = \sigma^2 HH^t = \sigma^2 HH = \sigma^2 H.$$
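The identity $\mathrm{Var}(\hat{Y}) = \sigma^2 H$ can be sanity-checked numerically. This is a sketch under stated assumptions: the design matrix X (an intercept column plus two simulated predictors) and the value of sigma2 are made up for illustration.

```r
# Hypothetical design matrix and error variance (assumptions for illustration).
set.seed(42)
X <- cbind(1, rnorm(8), rnorm(8))        # intercept plus two made-up predictors
sigma2 <- 2.5

H <- X %*% solve(t(X) %*% X) %*% t(X)    # hat matrix
V <- H %*% (sigma2 * diag(8)) %*% t(H)   # Var(HY) = H Var(Y) H^t

print(all.equal(H, t(H)))                # checks that H is symmetric
print(all.equal(V, sigma2 * H))          # checks Var(Y-hat) = sigma^2 H
```

all.equal() compares up to a small numerical tolerance, which is the appropriate notion of equality for matrices built from a computed inverse.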

Q. 5) (MP 4.25)

$$\begin{aligned}
H^2 &= [X(X^tX)^{-1}X^t][X(X^tX)^{-1}X^t] \\
&= X(X^tX)^{-1}(X^tX)(X^tX)^{-1}X^t \\
&= X I (X^tX)^{-1}X^t \\
&= H.
\end{aligned}$$

$$(I - H)^2 = I^2 - 2IH + H^2 = I - 2H + H = I - H.$$
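Both idempotency results are easy to confirm on a small example. The sketch below assumes a made-up design matrix X with ten rows; nothing in it comes from the textbook data.

```r
# Hypothetical design matrix (assumption for illustration).
set.seed(7)
X <- cbind(1, rnorm(10), rnorm(10))

H <- X %*% solve(crossprod(X)) %*% t(X)  # crossprod(X) computes t(X) %*% X
M <- diag(10) - H                        # I - H

print(all.equal(H %*% H, H))             # checks H^2 = H
print(all.equal(M %*% M, M))             # checks (I - H)^2 = I - H
```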
