STOR 664 Homework 2 Solution

Part A

Exercise (Faraway book) Ch2 Ex1

> data(teengamb)
> attach(teengamb)
> tgl <- lm(gamble ~ sex + status + income + verbal)
> summary(tgl)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.55565   17.19680   1.312   0.1968
sex        -22.11833    8.21111  -2.694   0.0101 *
status       0.05223    0.28111   0.186   0.8535
income       4.96198    1.02539   4.839 1.79e-05 ***
verbal      -2.95949    2.17215  -1.362   0.1803
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.69 on 42 degrees of freedom
Multiple R-squared: 0.5267, Adjusted R-squared: 0.4816
F-statistic: 11.69 on 4 and 42 DF, p-value: 1.815e-06

(a) The percentage of variation in the response explained by the predictors is given by the multiple R-squared, which is 52.67%.

(b) The case with the largest residual is identified by
> which.max(tgl$residuals)

(c) The mean of the residuals is almost 0 (8.6e-17) and the median is -1.451.
> mean(tgl$residuals)
> median(tgl$residuals)

(d) The correlation of the residuals with the fitted values is 2.586e-17, i.e., essentially 0.
> cor(tgl$residuals, tgl$fitted.values)

(e) The correlation of the residuals with income is 5.027e-17, again essentially 0.
> cor(tgl$residuals, income)

(f) Based on the summary, the fitted model can be written explicitly as

gamble = 22.55565 - 22.11833 sex + 0.05223 status + 4.96198 income - 2.95949 verbal.

If all the predictors except sex are held constant, the difference in predicted expenditure on gambling between a male (sex=0) and a female (sex=1) equals the regression coefficient of sex, i.e., -22.11833. Therefore, whenever sex changes from male (sex=0) to female (sex=1), the predicted value of gamble decreases by 22.11833. In other words, according to the fitted model, a female spends $22.12 less on gambling than a comparable male (i.e., one with the other predictors held at the same values).
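The sign convention in (f) can be checked with a few lines of Python; the covariate values below (status, income, verbal) are hypothetical, and the coefficients are the ones fitted above.

```python
# Coefficients from the fitted teengamb regression above
coef = {"intercept": 22.55565, "sex": -22.11833, "status": 0.05223,
        "income": 4.96198, "verbal": -2.95949}

def predict_gamble(sex, status, income, verbal):
    """Predicted gambling expenditure under the fitted linear model."""
    return (coef["intercept"] + coef["sex"] * sex + coef["status"] * status
            + coef["income"] * income + coef["verbal"] * verbal)

# Hypothetical covariate values; only sex differs between the two profiles
male   = predict_gamble(sex=0, status=50, income=5, verbal=7)
female = predict_gamble(sex=1, status=50, income=5, verbal=7)
print(round(male - female, 5))  # 22.11833: the male-female gap is -1 * (sex coefficient)
```

Since the model is linear and sex enters only through its own term, the gap is the same whatever values the other predictors are held at.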
Part B

Ch3 Ex2

The model is Y = Xβ + ε, where

Y = (y_1, y_2, ..., y_n)',  X = [x_11 x_12; x_21 x_22; ... ; x_n1 x_n2],  β = (β_1, β_2)',  ε = (ε_1, ε_2, ..., ε_n)'.

Direct calculation gives

(X'X)^{-1} = (1/A) [  Σ x_i2²      -Σ x_i1 x_i2
                     -Σ x_i1 x_i2   Σ x_i1²     ],   X'Y = ( Σ x_i1 y_i , Σ x_i2 y_i )',

where A = (Σ x_i1²)(Σ x_i2²) - (Σ x_i1 x_i2)². The least squares estimate of β is given by the normal equations:

β̂ = (X'X)^{-1} X'Y = (1/A) (  Σ x_i2² Σ x_i1 y_i - Σ x_i1 x_i2 Σ x_i2 y_i ,
                              -Σ x_i1 x_i2 Σ x_i1 y_i + Σ x_i1² Σ x_i2 y_i )'.

Clearly the estimates are unbiased, and Cov(β̂) = σ² (X'X)^{-1}:

Var(β̂_1) = σ² Σ x_i2² / A,   Var(β̂_2) = σ² Σ x_i1² / A,   Cov(β̂_1, β̂_2) = -σ² Σ x_i1 x_i2 / A,

where A = (Σ x_i1²)(Σ x_i2²) - (Σ x_i1 x_i2)² as above.

Ch3 Ex4

(a) Write the considered model as Y = Xβ + ε. The statement that θ̂ = c'β̂ is the BLUE for θ = c'β is equivalent to saying that for any linear unbiased estimator θ~ = b'Y, Var(θ̂) ≤ Var(θ~). Now, unbiasedness of θ~ gives E[b'Y] = b'Xβ = c'β for all β, i.e., X'b = c, and

Var(c'β̂) = σ² c'(X'X)^{-1} c = σ² c'(X'X)^{-1} X'X (X'X)^{-1} c,   Var(b'Y) = σ² b'b.

Thus the BLUE property is also equivalent to

c'(X'X)^{-1} X'X (X'X)^{-1} c ≤ b'b   for all b such that X'b = c,

i.e., X(X'X)^{-1} c = argmin_{b : X'b = c} b'b. This proves the claim.

(b) Let P = {Xβ ∈ R^n : β ∈ R^p} be as in the book. Define two more sets: P_c = {b ∈ R^n : X'b = c} and P_0 = {b ∈ R^n : X'b = 0}. First, note that for any Xβ ∈ P and any b ∈ P_0 the inner product is 0 (i.e., (Xβ)'b = β'X'b = 0), so P_0 ⊥ P. The problem in (a) is equivalent to finding a vector a ∈ P_c such that a'a is minimized. With some geometric understanding, it is enough to find a vector a that is orthogonal to b - a for all b ∈ P_c. Since b - a ∈ P_0 for every b ∈ P_c, it suffices to have a orthogonal to P_0, i.e., a ∈ P. So a ∈ P ∩ P_c:

a = Xβ~ for some β~ ∈ R^p, and a ∈ P_c  ⟹  c = X'a = X'Xβ~  ⟹  β~ = (X'X)^{-1} c  ⟹  a = Xβ~ = X(X'X)^{-1} c ∈ P_c.

(Optional) Alternatively, the result follows easily from the fact that I - H = I - X(X'X)^{-1}X' is symmetric and idempotent, hence nonnegative definite, so that Var(θ~) - Var(θ̂) ≥ 0. Or use Cov(θ̂, θ~ - θ̂) = 0, so that Var(θ~) ≥ Var(θ̂).

Ch3 Ex6

(a) Following scheme (ii), each of the 6 pairs of the 4 items is weighed twice, so set

X = [1 1 0 0
     1 1 0 0
     1 0 1 0
     1 0 1 0
     1 0 0 1
     1 0 0 1
     0 1 1 0
     0 1 1 0
     0 1 0 1
     0 1 0 1
     0 0 1 1
     0 0 1 1],   β = (β_1, ..., β_4)',   ε = (ε_1, ..., ε_12)',   Y = (y_1, ..., y_12)'.
Since X'X = 4 I_4 + 2 J_4 (with J_4 the 4×4 matrix of ones), we have (X'X)^{-1} = (1/24)(6 I_4 - J_4), and hence

β̂ = (X'X)^{-1} X'Y = (1/24) [  5 -1 -1 -1
                               -1  5 -1 -1
                               -1 -1  5 -1
                               -1 -1 -1  5 ] X'Y.

(b) Write z_i for the weighings of scheme (i), while y_i is for scheme (ii). Suppose sd(z_i) = σ; then sd(y_i) = 2σ. Then

Var(β̂_(i)) = (1/3) σ²   and   Var(β̂_(ii)) = (5/24)(2σ)² = (5/6) σ².

Scheme (i) is superior in this case.

(c) Scheme (i): weigh each item 10 times. With 1_10 a column of 10 ones,

Y = (y_1, ..., y_60)',   X = I_6 ⊗ 1_10,   β = (β_1, ..., β_6)',
X'X = (I_6 ⊗ 1_10')(I_6 ⊗ 1_10) = 10 I_6,
β̂ = (X'X)^{-1} X'Y = (1/10) I_6 X'Y = (1/10) ( Σ_{i=1}^{10} y_i, ..., Σ_{i=51}^{60} y_i )',   and   Var(β̂_1) = (1/10) σ².

Scheme (ii): weigh each of the 15 pairs 4 times. In block form (1_4 a column of 4 ones),

X = [1_4 1_4  0   0   0   0
     1_4  0  1_4  0   0   0
     1_4  0   0  1_4  0   0
      ...
      0   0   0   0  1_4 1_4],
X'X = 16 I_6 + 4 J_6,   (X'X)^{-1} = (1/16) I_6 - (4 / (16(16 + 4·6))) J_6,
β̂ = (X'X)^{-1} X'Y,   Var(β̂_1) = (9/160) σ².

Scheme (iii): weigh each of the 20 triples 3 times. In block form (1_3 a column of 3 ones),

X = [1_3 1_3 1_3  0   0   0
     1_3 1_3  0  1_3  0   0
     1_3 1_3  0   0  1_3  0
     1_3 1_3  0   0   0  1_3
      ...
      0   0   0  1_3 1_3 1_3],
X'X = 18 I_6 + 12 J_6,   (X'X)^{-1} = (1/18) I_6 - (12 / (18(18 + 12·6))) J_6,
β̂ = (X'X)^{-1} X'Y,   Var(β̂_1) = (13/270) σ².

Since Var_(iii) < Var_(ii) < Var_(i), scheme (iii) is the best.

Ch3 Ex18

In general, the question of which functional form best describes a relationship should be answered carefully, and there is no single correct (or wrong) answer. In this problem, since the strength of the yarn would grow multiplicatively with fiber length, tensile strength of the fibers, and fiber fineness, a multiplicative form X_1 = a X_2^b X_3^c X_4^d is natural. Taking logarithms reduces this to a linear model, which we fit by least squares:

log x_i1 = β_1 + β_2 log x_i2 + β_3 log x_i3 + β_4 log x_i4 + ε_i.

Since β̂_3 is not significant (0.09126 (0.39551)), after omitting log x_3 the fitted result is:

lm(formula = lx1 ~ lx2 + lx4)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.4247     0.7779   3.117  0.00628 **
lx2           0.8355     0.1531   5.458 4.25e-05 ***
lx4          -0.4040     0.1098  -3.681  0.00185 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.06741 on 17 degrees of freedom
Multiple R-Squared: 0.7175, Adjusted R-squared: 0.6842
F-statistic: 21.58 on 2 and 17 DF, p-value: 2.159e-05

We have s² = 0.004544. For (a)-(c), with confidence level α = 0.05, number of points K = 4, p = 3 parameters and n = 20 observations, the confidence intervals are of the form [ŷ ± t se(ŷ)], where t = t_{1-α/2, n-3} for (a), t = t_{1-α/(2K), n-3} for (b, Bonferroni), and t = sqrt(3 F_{1-α, 3, n-3}) for (c, Scheffé).

num    ŷ         CI for each             SCI (Bonferroni)        SCI (Scheffé)
1    89.46327   (86.00056,  93.06539)   (84.90900,  94.26181)   (84.42698,  94.79998)
2    94.41991   (90.61474,  98.38488)   (89.41654,  99.70325)   (88.88764, 100.29651)
3    97.08884   (93.73159, 100.56634)   (92.67026, 101.71811)   (92.20111, 102.23568)
4    83.25013   (78.22336,  88.59993)   (76.66263,  90.40369)   (75.97709,  91.21939)

If you want to fit the model without the logarithm transformation, the following would be the answer:

lm(formula = fiber$x1 ~ X2 + X4)
Residuals:
    Min      1Q  Median      3Q     Max
 -9.843  -3.690  -1.934   6.608  10.862

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  95.4000     1.4333  66.557  < 2e-16 ***
X2            1.0706     0.1839   5.823 2.04e-05 ***
X4           -1.0157     0.2692  -3.773  0.00152 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.41 on 17 degrees of freedom
Multiple R-Squared: 0.7453, Adjusted R-squared: 0.7154
F-statistic: 24.88 on 2 and 17 DF, p-value: 8.931e-06

and the confidence intervals are:

num    ŷ         CI for each             SCI (Bonferroni)        SCI (Scheffé)
1    89.39622   (85.62869,  93.16375)   (84.40953,  94.38291)   (83.86616,  94.92627)
2    94.74931   (90.85545,  98.64318)   (89.59541,  99.90322)   (89.03382, 100.46481)
3    97.79652   (94.52995, 101.06310)   (93.47290, 102.12014)   (93.00179, 102.59126)
4    83.76871   (78.13549,  89.40194)   (76.31260,  91.28200)   (75.50016,  92.03727)

(d) Each Bonferroni interval is narrower than the corresponding Scheffé interval. Thus the Bonferroni method is preferred.

Ch3 Ex19

(a)
Call: lm(formula = protein ~ L1 + L2 + L3 + L4 + L5 + L6)
Residuals:
      Min        1Q    Median        3Q       Max
-0.397979 -0.126604 -0.008200  0.077451  0.387052
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 230.742300  98.990220   2.331  0.03232 *
L1            0.028100   0.082118   0.342  0.73618
L2            0.001667   0.087162   0.019  0.98497
L3            0.234909   0.077400   3.035  0.00748 **
L4           -0.240445   0.063218  -3.803  0.00142 **
L5            0.011839   0.006126   1.932  0.07014
L6           -0.035584   0.045530  -0.782  0.44522
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2203 on 17 degrees of freedom
Multiple R-Squared: 0.9821, Adjusted R-squared: 0.9758
F-statistic: 155.9 on 6 and 17 DF, p-value: 6.654e-14

The residual sum of squares is 17 s² = 0.8250455.

(b) (L1, L2, L5, L6) have regression coefficients smaller in magnitude than twice their respective standard errors. Thus we shall keep L3 and L4 only:

lm(formula = protein ~ L3 + L4)
Residuals:
     Min       1Q   Median       3Q      Max
-0.52215 -0.09417  0.02566  0.13763  0.41405

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 31.174314   1.308664   23.82  < 2e-16 ***
L3           0.283405   0.009640   29.40  < 2e-16 ***
L4          -0.217101   0.009568  -22.69 2.97e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2477 on 21 degrees of freedom
Multiple R-Squared: 0.9721, Adjusted R-squared: 0.9695
F-statistic: 366.2 on 2 and 21 DF, p-value: < 2.2e-16

The residual sum of squares is 21 s² = 1.288461.

(c) Test H_0: model in (b) vs H_1: model in (a).

Model 1: protein ~ L3 + L4
Model 2: protein ~ L1 + L2 + L3 + L4 + L5 + L6
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1     21 1.28884
2     17 0.82534  4   0.46350 2.3868 0.0918

The p-value is 0.0918, not quite small enough to reject H_0, so the model in (b) is adequate.

(d) With confidence level α = 0.10, number of points K = 6, p = 3 and n = 24, the prediction intervals are of the form [ŷ ± t se_pred(ŷ)], where t = t_{1-α/2, n-3} for (i), t = t_{1-α/(2K), n-3} for (ii, Bonferroni), and t = sqrt(6 F_{1-α, 6, n-3}) for (iii, Scheffé).
num    ŷ          PI for each              SPI (Bonferroni)         SPI (Scheffé)
1     9.801225   ( 9.365855, 10.236595)   ( 9.143050, 10.459400)   ( 8.807162, 10.795288)
2     8.358796   ( 7.909020,  8.808573)   ( 7.678842,  9.038750)   ( 7.331840,  9.385753)
3     9.444302   ( 9.007920,  9.880684)   ( 8.784598, 10.104010)   ( 8.447929, 10.440675)
4    11.831218   (11.341547, 12.320888)   (11.090953, 12.571480)   (10.713172, 12.949263)
5    10.025695   ( 9.589644, 10.461745)   ( 9.366491, 10.684900)   ( 9.030079, 11.021311)
6    10.986116   (10.520573, 11.451660)   (10.282327, 11.689910)   ( 9.923160, 12.049072)

Each Bonferroni interval is narrower than the corresponding Scheffé interval, so the Bonferroni method is preferable.

Part C

Ch3 Ex7

(a) The given level-α test in the ANOVA model rejects H_0 if F ≥ F_{I-1, n-I, 1-α}. The power function is given by

β(δ) = P_δ(test rejects H_0) = P(F' ≥ F_{I-1, n-I, 1-α}),   where F' ~ noncentral F_{I-1, n-I, δ}.

To get δ, use the substitution rule:

δ² σ² = Σ n_i (ȳ_i - ȳ)² |_{ȳ_i = θ_i, ȳ = θ̄} = Σ n_i (θ_i - θ̄)².

Now by the following claim, conclude that the power is maximized when δ is the greatest possible, i.e., when Σ n_i (θ_i - θ̄)² is maximized subject to Σ n_i = n.

Claim: Let F_δ ~ F_{m, n, δ} for some m, n. Then P(F_{δ_1} ≥ a) ≥ P(F_{δ_2} ≥ a) whenever δ_1 ≥ δ_2 ≥ 0. (Heuristic argument: the mass of F_{m, n, δ} tends to move to the right as δ increases.)

Proof: Let Z ~ N(0, 1), X ~ χ²_{m-1}, and Y ~ χ²_n be independent of each other, with densities f_Z, f_X and f_Y respectively. Provided P((Z + δ)² ≥ a) decreases as δ decreases,

P(F_{δ_1} ≥ a) = P( [((Z + δ_1)² + X)/m] / [Y/n] ≥ a ) = P((Z + δ_1)² + X ≥ a' Y)
             = ∫_0^∞ ∫_0^∞ P((Z + δ_1)² ≥ a' y - x) f_X(x) f_Y(y) dx dy
             ≥ ∫_0^∞ ∫_0^∞ P((Z + δ_2)² ≥ a' y - x) f_X(x) f_Y(y) dx dy = P(F_{δ_2} ≥ a),

where a' = a m / n. Thus it is left to prove that P((Z + δ)² ≥ a) decreases as δ (≥ 0) decreases. Equivalently, P((Z + δ)² ≤ a) increases as δ decreases, since {(Z + δ)² ≤ a} = {-√a - δ ≤ Z ≤ √a - δ} tends to cover more of the neighborhood of 0 (where most of the N(0, 1) mass lies) as δ decreases. One can prove this argument more rigorously.

(b) When I = 2 and n_i = n a_i with a_1 + a_2 = 1, write

f(a_1) = Σ a_i (θ_i - θ̄)² = a_1 (θ_1 - (a_1 θ_1 + (1 - a_1) θ_2))² + (1 - a_1)(θ_2 - (a_1 θ_1 + (1 - a_1) θ_2))² = a_1 (1 - a_1)(θ_1 - θ_2)².

The fact that f(a_1) is concave with its peak at a_1 = 1/2 proves the claim.

(c) Write v = θ_3 - θ_2 = θ_2 - θ_1. Let B_j = Σ a_i (θ_i - θ̄)², where j indicates
the allocation scheme, (i) or (ii). Since δ = sqrt(n B_j)/σ, the scheme with the larger B_j gives the more powerful test. Calculation gives B_(i) = (2/3) v² and B_(ii) = v². Thus scheme (ii) gives more power, and if we had 3n/2 instead of n for the total number of samples in scheme (i), the two powers would be approximately the same (note that the second degrees of freedom of the two F distributions would then differ due to the different sample sizes).
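The comparison of the two allocations in (c) can be verified with exact rational arithmetic. The allocations used below, (1/3, 1/3, 1/3) for scheme (i) and (1/2, 0, 1/2) for scheme (ii), are the ones consistent with the values B_(i) = 2v²/3 and B_(ii) = v² computed above; a small Python sketch:

```python
from fractions import Fraction as F

def B(alloc, thetas):
    """B = sum_i a_i * (theta_i - theta_bar)^2, theta_bar the allocation-weighted mean."""
    tbar = sum(a * t for a, t in zip(alloc, thetas))
    return sum(a * (t - tbar) ** 2 for a, t in zip(alloc, thetas))

thetas = [F(0), F(1), F(2)]  # equally spaced group means with spacing v = 1

B_i  = B([F(1, 3)] * 3, thetas)               # scheme (i): equal allocation
B_ii = B([F(1, 2), F(0), F(1, 2)], thetas)    # scheme (ii): extremes only

print(B_i, B_ii)                # 2/3 1
print(F(3, 2) * B_i == B_ii)    # True: 3n/2 samples under (i) match the noncentrality of (ii)
```

Because the noncentrality scales as n·B_j, multiplying scheme (i)'s sample size by 3/2 exactly equalizes the noncentralities, though not the F denominators' degrees of freedom.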
Appendix: Sample code for Ex18 and Ex19

#### 3.18
fiber <- read.table("d:/2006-fall/stat664/hw2/fiber2.dat")
names(fiber) <- c("no","x1","x2","x3","x4")
lfiber <- log(fiber)
names(lfiber) <- c("no","lx1","lx2","lx3","lx4")
X2 <- fiber$x2 - mean(fiber$x2)
X3 <- fiber$x3 - mean(fiber$x3)
X4 <- fiber$x4 - mean(fiber$x4)
rm <- lm(fiber$x1 ~ X2+X4)
logrm <- lm(lx1 ~ lx2+lx4, data=lfiber)
summary(rm)
summary(logrm)

#### confidence intervals for mean x1
xc <- t(matrix(ncol=4, nrow=3,
        c(75 - mean(fiber$x2), 70 - mean(fiber$x3), 45 - mean(fiber$x4),
          80 - mean(fiber$x2), 70 - mean(fiber$x3), 45 - mean(fiber$x4),
          80 - mean(fiber$x2), 75 - mean(fiber$x3), 42 - mean(fiber$x4),
          65 - mean(fiber$x2), 80 - mean(fiber$x3), 40 - mean(fiber$x4))))
xc <- data.frame(xc[,c(1,3)])
names(xc) <- c("X2","X4")   ## must match the predictor names used in rm
a <- 0.05   ## confidence level 1-a
K <- 4      ## number of points
p <- predict(rm, xc, se.fit=TRUE)   ## predicted means and their standard errors
PI_each <- cbind(p$fit - qt(1-a/2, rm$df.residual)*p$se.fit,
                 p$fit + qt(1-a/2, rm$df.residual)*p$se.fit)
PI_Bonf <- cbind(p$fit - qt(1-a/(2*K), rm$df.residual)*p$se.fit,
                 p$fit + qt(1-a/(2*K), rm$df.residual)*p$se.fit)
PI_Sche <- cbind(p$fit - sqrt(3*qf(1-a, 3, rm$df.residual))*p$se.fit,
                 p$fit + sqrt(3*qf(1-a, 3, rm$df.residual))*p$se.fit)

#### confidence intervals for log x1
xc <- t(matrix(ncol=4, nrow=3,
        c(75, 70, 45,
          80, 70, 45,
          80, 75, 42,
          65, 80, 40)))
logxc <- log(xc)
xc <- data.frame(logxc[,c(1,3)])
names(xc) <- c("lx2","lx4")   ## must match the predictor names used in logrm
a <- 0.05   ## confidence level 1-a
K <- 4      ## number of points
p <- predict(logrm, xc, se.fit=TRUE)   ## predicted means and their standard errors
PI_each <- cbind(p$fit - qt(1-a/2, logrm$df.residual)*p$se.fit,
                 p$fit + qt(1-a/2, logrm$df.residual)*p$se.fit)
PI_Bonf <- cbind(p$fit - qt(1-a/(2*K), logrm$df.residual)*p$se.fit,
                 p$fit + qt(1-a/(2*K), logrm$df.residual)*p$se.fit)
PI_Sche <- cbind(p$fit - sqrt(3*qf(1-a, 3, logrm$df.residual))*p$se.fit,
                 p$fit + sqrt(3*qf(1-a, 3, logrm$df.residual))*p$se.fit)
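The appendix fits lm(lx1 ~ lx2+lx4) because taking logs linearizes the multiplicative model X1 = a X2^b X3^c X4^d. A minimal pure-Python illustration of that device, using a synthetic one-predictor data set (the values of a, b and x2 below are made up, and the data are noise-free, so the exponent is recovered exactly):

```python
import math

# Synthetic multiplicative data: x1 = a * x2^b exactly (no noise)
a, b = 3.0, 0.8355               # 0.8355 echoes the lx2 estimate in the solution
x2 = [40, 55, 60, 75, 90]
x1 = [a * x ** b for x in x2]

# On the log scale the model is linear: log x1 = log a + b * log x2,
# so simple least squares recovers the exponent b as the slope.
lx1 = [math.log(y) for y in x1]
lx2 = [math.log(x) for x in x2]
mx, my = sum(lx2) / len(lx2), sum(lx1) / len(lx1)
slope = (sum((x - mx) * (y - my) for x, y in zip(lx2, lx1))
         / sum((x - mx) ** 2 for x in lx2))
intercept = my - slope * mx

print(round(slope, 4), round(math.exp(intercept), 4))  # recovers b and a: 0.8355 3.0
```

With real data the log-scale residuals carry the noise, which is exactly why the multiplicative error model is fitted on the log scale rather than by nonlinear least squares.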
#### 3.19
gw <- read.table("d:/2007-fall/664 - solution/hw2/protein.dat")
names(gw) <- c("no","protein","L1","L2","L3","L4","L5","L6")
attach(gw)
# par(mfrow=c(2,3))

#### (a) Fit the linear model with all covariates
fm <- lm(protein ~ L1+L2+L3+L4+L5+L6)
summary(fm)

#### (b)
## (L1, L2, L5, L6) have regression coefficients smaller in magnitude
## than twice their respective standard errors
rm <- lm(protein ~ L3+L4)
summary(rm)

#### (c)
anova(rm, fm)
## H0: rm
## H1: fm
## p-value is 0.0918; not quite small enough to reject H0 (rm),
## so the model in (b) (rm) is good

#### prediction intervals for rm in (b)
a <- 0.10   ## confidence level 1-a
K <- 6      ## number of points
pt <- read.table("d:/2007-fall/664 - solution/hw2/reflect.dat")
pt_rm <- pt[,4:5]
names(pt_rm) <- c("L3","L4")

#### (d)-i
p <- predict(rm, data.frame(pt_rm), se.fit=TRUE)   ## prediction
sep <- p$residual.scale * sqrt(1 + (p$se.fit/p$residual.scale)^2)   ## se for PI
p$fit
PI_each <- cbind(p$fit - qt(1-a/2, rm$df.residual)*sep,
                 p$fit + qt(1-a/2, rm$df.residual)*sep)
PI_Bonf <- cbind(p$fit - qt(1-a/(2*K), rm$df.residual)*sep,
                 p$fit + qt(1-a/(2*K), rm$df.residual)*sep)
PI_Sche <- cbind(p$fit - sqrt(6*qf(1-a, 6, rm$df.residual))*sep,
                 p$fit + sqrt(6*qf(1-a, 6, rm$df.residual))*sep)
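In the Ex19 code, the prediction standard error is formed as residual scale times sqrt(1 + (se.fit/residual scale)^2); algebraically this is just sqrt(s² + se.fit²), i.e., the standard error of the fitted mean inflated by the residual variance of a new observation. A quick Python check of that identity, with hypothetical numbers:

```python
import math

def se_pred(s, se_fit):
    """Prediction standard error in the same form as the R code:
    s * sqrt(1 + (se_fit/s)^2), equal to sqrt(s^2 + se_fit^2)."""
    return s * math.sqrt(1.0 + (se_fit / s) ** 2)

s, se_fit = 0.2477, 0.08   # hypothetical residual scale and se of the fitted mean
assert math.isclose(se_pred(s, se_fit), math.sqrt(s ** 2 + se_fit ** 2))
assert se_pred(s, 0.0) == s   # with a perfectly known mean, only residual noise remains
print(round(se_pred(s, se_fit), 6))
```

This is why the Ex19 intervals are prediction intervals: even at a perfectly estimated mean they cannot be narrower than ±t·s.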