STOR 664 Homework 2 Solution

Part A

Exercise (Faraway book) Ch2 Ex1

> data(teengamb)
> attach(teengamb)
> tgl <- lm(gamble ~ sex + status + income + verbal)
> summary(tgl)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  22.55565   17.19680   1.312   0.1968
sex         -22.11833    8.21111  -2.694   0.0101 *
status        0.05223    0.28111   0.186   0.8535
income        4.96198    1.02539   4.839 1.79e-05 ***
verbal       -2.95949    2.17215  -1.362   0.1803
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.69 on 42 degrees of freedom
Multiple R-squared: 0.5267, Adjusted R-squared: 0.4816
F-statistic: 11.69 on 4 and 42 DF, p-value: 1.815e-06

(a) The percentage of variation in the response explained by the regression is given by the Multiple R-squared, which is 52.67%.

(b) The 24th case has the largest residual:
> which.max(tgl$residuals)

(c) The mean of the residuals is essentially 0 (zero up to floating-point error, as it must be when an intercept is included) and the median is -1.451.
> mean(tgl$residuals)
> median(tgl$residuals)

(d) The correlation of the residuals with the fitted values is zero up to rounding error; least squares residuals are exactly orthogonal to the fitted values.
> cor(tgl$residuals, tgl$fitted.values)

(e) The correlation of the residuals with income is likewise zero up to rounding error, since income is one of the regressors.
> cor(tgl$residuals, income)

(f) Based on the summary, the fitted model can be written explicitly as

gamble = 22.55565 - 22.11833 sex + 0.05223 status + 4.96198 income - 2.95949 verbal.

If all the predictors except sex are held constant, the difference in predicted expenditure on gambling between male (sex=0) and female (sex=1) equals the regression coefficient of sex, i.e., -22.11833. Therefore whenever sex changes from male (sex=0) to female (sex=1), the predicted value of gamble decreases by 22.12. In other words, according to the current regression model, a female spends $22.12 less than a comparable (i.e., other predictors being held constant) male on gambling.
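The quantities in (a)-(f) can be reproduced end to end; the following is a minimal R sketch, assuming the faraway package (which ships the teengamb data) is installed:

> library(faraway)                        ## assumed available; provides teengamb
> data(teengamb)
> tgl <- lm(gamble ~ sex + status + income + verbal, data = teengamb)
> summary(tgl)$r.squared                  ## (a) proportion of variation explained
> which.max(residuals(tgl))               ## (b) index of the largest residual
> c(mean(residuals(tgl)), median(residuals(tgl)))   ## (c)
> cor(residuals(tgl), fitted(tgl))        ## (d) numerically zero
> cor(residuals(tgl), teengamb$income)    ## (e) numerically zero
> coef(tgl)["sex"]                        ## (f) female-minus-male difference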

Part B

Ch3 Ex2
The model is $Y = X\beta + \epsilon$, where

$$Y = (y_1, y_2, \dots, y_n)', \qquad X = \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ \vdots & \vdots \\ x_{n1} & x_{n2} \end{pmatrix}, \qquad \beta = (\beta_1, \beta_2)', \qquad \epsilon = (\epsilon_1, \epsilon_2, \dots, \epsilon_n)'.$$

Direct calculation gives

$$(X'X)^{-1} = \frac{1}{A} \begin{pmatrix} \sum x_{i2}^2 & -\sum x_{i1}x_{i2} \\ -\sum x_{i1}x_{i2} & \sum x_{i1}^2 \end{pmatrix}, \qquad X'Y = \begin{pmatrix} \sum x_{i1}y_i \\ \sum x_{i2}y_i \end{pmatrix},$$

where $A = \sum x_{i1}^2 \sum x_{i2}^2 - (\sum x_{i1}x_{i2})^2$. The least squares estimate of $\beta$ is given by the normal equations:

$$\hat\beta = (X'X)^{-1}X'Y = \frac{1}{A} \begin{pmatrix} \sum x_{i2}^2 \sum x_{i1}y_i - \sum x_{i1}x_{i2} \sum x_{i2}y_i \\ -\sum x_{i1}x_{i2} \sum x_{i1}y_i + \sum x_{i1}^2 \sum x_{i2}y_i \end{pmatrix}.$$

Clearly the estimates are unbiased, and $\operatorname{Cov}(\hat\beta) = \sigma^2 (X'X)^{-1}$:

$$\operatorname{Var}(\hat\beta_1) = \sigma^2 \sum x_{i2}^2 / A, \qquad \operatorname{Var}(\hat\beta_2) = \sigma^2 \sum x_{i1}^2 / A, \qquad \operatorname{Cov}(\hat\beta_1, \hat\beta_2) = -\sigma^2 \sum x_{i1}x_{i2} / A.$$
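The closed-form expressions in Ex2 are easy to get wrong by a sign; as a quick numerical sanity check (not part of the required solution; the simulated data below are arbitrary), they can be compared against lm:

set.seed(1)                              ## arbitrary simulated data for the check
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1.5*x1 - 0.7*x2 + rnorm(n)
A <- sum(x1^2)*sum(x2^2) - sum(x1*x2)^2
b1 <- ( sum(x2^2)*sum(x1*y) - sum(x1*x2)*sum(x2*y)) / A
b2 <- (-sum(x1*x2)*sum(x1*y) + sum(x1^2)*sum(x2*y)) / A
c(b1, b2)                                ## closed-form least squares estimates
coef(lm(y ~ x1 + x2 - 1))                ## no-intercept fit; agrees with (b1, b2)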

Ch3 Ex4
(a) Write the model under consideration as $Y = X\beta + \epsilon$. The statement that $\hat\theta = c'\hat\beta$ is the BLUE for $\theta = c'\beta$ is equivalent to saying that for any linear unbiased estimator $\tilde\theta = b'Y$, $\operatorname{Var}(\hat\theta) \le \operatorname{Var}(\tilde\theta)$. Now, unbiasedness of $\tilde\theta$ gives $E\,b'Y = b'X\beta = c'\beta$ for all $\beta$, i.e. $X'b = c$, and

$$\operatorname{Var}(c'\hat\beta) = \sigma^2 c'(X'X)^{-1}c = \sigma^2 c'(X'X)^{-1}X'X(X'X)^{-1}c, \qquad \operatorname{Var}(b'Y) = \sigma^2 b'b.$$

Thus the BLUE property is equivalent to

$$c'(X'X)^{-1}X'X(X'X)^{-1}c \le b'b \quad \text{for all } b \text{ such that } X'b = c,$$

i.e. to $X(X'X)^{-1}c = \operatorname{argmin}_b\, b'b$ subject to $X'b = c$. This proves the claim.

(b) Let $P = \{X\beta \in \mathbb{R}^n : \beta \in \mathbb{R}^p\}$ be as in the book. Define two further sets, $P_c = \{b \in \mathbb{R}^n : X'b = c\}$ and $P_0 = \{b \in \mathbb{R}^n : X'b = 0\}$. First, note that for any $X\beta \in P$ and any $b \in P_0$ the inner product is 0 (i.e. $(X\beta)'b = \beta'X'b = 0$), so $P_0 \perp P$. The problem in (a) is equivalent to finding a vector $a \in P_c$ such that $a'a$ is minimized. With some geometric understanding, it is enough to find a vector $a$ that is orthogonal to $b - a$ for all $b \in P_c$; since $b - a \in P_0$ whenever $a, b \in P_c$, and $P_0 \perp P$, it suffices that $a$ also lies in $P$, i.e. $a \in P \cap P_c$. Then

$$a = X\tilde\beta \text{ for some } \tilde\beta \in \mathbb{R}^p, \quad a \in P_c \Rightarrow X'a = X'X\tilde\beta = c \Rightarrow \tilde\beta = (X'X)^{-1}c \Rightarrow a = X\tilde\beta = X(X'X)^{-1}c \in P_c.$$

(Optional) Alternatively, the result is easily proved from the fact that $I - H = I - X(X'X)^{-1}X'$ is symmetric and idempotent, hence nonnegative definite, so that $\operatorname{Var}(\tilde\theta) - \operatorname{Var}(\hat\theta) \ge 0$. Or use $\operatorname{Cov}(\hat\theta, \tilde\theta - \hat\theta) = 0$, so that $\operatorname{Var}(\tilde\theta) \ge \operatorname{Var}(\hat\theta)$.

Ch3 Ex6
(a) Following scheme (ii), set

$$Y = X\beta + \epsilon, \qquad \beta = (\beta_1, \dots, \beta_4)', \qquad \epsilon = (\epsilon_1, \dots, \epsilon_{12})', \qquad Y = (y_1, \dots, y_{12})',$$

where $X$ is the $12 \times 4$ design matrix in which each row has 1's in the positions of the two items weighed together (each of the six pairs is weighed twice). Then $X'X = 4I_4 + 2J_4$, and using the identity $(aI_p + bJ_p)^{-1} = a^{-1}I_p - \frac{b}{a(a+pb)}J_p$,

$$\hat\beta = (X'X)^{-1}X'Y = \frac{1}{24}\begin{pmatrix} 5 & -1 & -1 & -1 \\ -1 & 5 & -1 & -1 \\ -1 & -1 & 5 & -1 \\ -1 & -1 & -1 & 5 \end{pmatrix} X'Y.$$

(b) Write $z_i$ for the weighings of scheme (i), while $y_i$ is for scheme (ii). Suppose $\operatorname{sd}(z_i) = \sigma$; then $\operatorname{sd}(y_i) = 2\sigma$. Then

$$\operatorname{Var}(\hat\beta_1^{(i)}) = \tfrac{1}{3}\sigma^2 \qquad \text{and} \qquad \operatorname{Var}(\hat\beta_1^{(ii)}) = \tfrac{5}{24}(2\sigma)^2 = \tfrac{5}{6}\sigma^2.$$

Scheme (i) is superior in this case.

(c) There are now six items and 60 weighings, each weighing having standard deviation $\sigma$.

Scheme (i): weigh each item 10 times. With $Y = (y_1, \dots, y_{60})'$, $X = I_6 \otimes 1_{10}$ and $\beta = (\beta_1, \dots, \beta_6)'$,

$$X'X = (I_6 \otimes 1_{10})'(I_6 \otimes 1_{10}) = 10 I_6, \qquad \hat\beta = (X'X)^{-1}X'Y = \tfrac{1}{10}\Big(\textstyle\sum_{i=1}^{10} y_i, \dots, \sum_{i=51}^{60} y_i\Big)',$$

and $\operatorname{Var}(\hat\beta_1) = \tfrac{1}{10}\sigma^2$.

Scheme (ii): weigh each pair 4 times ($\binom{6}{2} = 15$ pairs, so 60 weighings; each row of $X$ has two 1's). Then

$$X'X = 16I_6 + 4J_6, \qquad (X'X)^{-1} = \tfrac{1}{16}I_6 - \tfrac{4}{16(16 + 4 \cdot 6)}J_6 = \tfrac{1}{16}I_6 - \tfrac{1}{160}J_6,$$

so $\hat\beta = (X'X)^{-1}X'Y$ and $\operatorname{Var}(\hat\beta_1) = \big(\tfrac{1}{16} - \tfrac{1}{160}\big)\sigma^2 = \tfrac{9}{160}\sigma^2$.

Scheme (iii): weigh each triple 3 times ($\binom{6}{3} = 20$ triples, so 60 weighings; each row of $X$ has three 1's). Then

$$X'X = 18I_6 + 12J_6, \qquad (X'X)^{-1} = \tfrac{1}{18}I_6 - \tfrac{12}{18(18 + 12 \cdot 6)}J_6 = \tfrac{1}{18}I_6 - \tfrac{1}{135}J_6,$$

so $\hat\beta = (X'X)^{-1}X'Y$ and $\operatorname{Var}(\hat\beta_1) = \big(\tfrac{1}{18} - \tfrac{1}{135}\big)\sigma^2 = \tfrac{13}{270}\sigma^2$.

Since $\operatorname{Var}^{(iii)}(\hat\beta_1) < \operatorname{Var}^{(ii)}(\hat\beta_1) < \operatorname{Var}^{(i)}(\hat\beta_1)$, scheme (iii) is the best.
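The three variances in (c) can also be verified by building each design matrix explicitly; the following is a small sketch (the helper design() is ad hoc, written only for this check):

design <- function(size, reps, items = 6) {
  sets <- combn(items, size)                       ## all item subsets of the given size
  X <- t(apply(sets, 2, function(s) replace(numeric(items), s, 1)))
  X[rep(seq_len(nrow(X)), each = reps), ]          ## repeat each weighing 'reps' times
}
for (X in list(design(1, 10), design(2, 4), design(3, 3)))
  print(solve(crossprod(X))[1, 1])                 ## 1/10, 9/160, 13/270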

Ch3 Ex18
In general, the question of what the best functional form of the relationship is should be answered carefully, and there is no single correct (or wrong) answer. In this problem, since the strength of yarn would grow in proportion to fiber length, tensile strength of the fibers, and fiber fineness, a multiplicative relationship is natural: $X_1 = a X_2^b X_3^c X_4^d$. This can be fitted by ordinary least squares after a log transformation; the model we regress is

$$\log x_{i1} = \beta_1 + \beta_2 \log x_{i2} + \beta_3 \log x_{i3} + \beta_4 \log x_{i4} + \epsilon_i.$$

Since $\hat\beta_3$ is not significant (0.09126, with standard error 0.39551), after omitting $\log x_3$ the fitted result is:

lm(formula = lx1 ~ lx2 + lx4)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                        **
lx2                                        ...e-05 ***
lx4                                                **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 17 degrees of freedom
Multiple R-squared: 0.7175, Adjusted R-squared: 0.6843
F-statistic: 21.58 on 2 and 17 DF, p-value: 2.159e-05

Write $s^2$ for the residual variance estimate of this fit. For (a)-(c), with confidence level $\alpha = 0.05$, number of points $K = 4$, $p = 3$ and $n = 20$, the confidence intervals are of the form $[\hat y \pm t \cdot \operatorname{se}(\hat y)]$, where $t = t_{1-\alpha/2,\, n-3}$ for (a), $t = t_{1-\alpha/(2K),\, n-3}$ for (b, Bonferroni), and $t = \sqrt{3 F_{1-\alpha;\, 3,\, n-3}}$ for (c, Scheffé).

num  ŷ  CI for each  SCI (Bonferroni)  SCI (Scheffé)

If you prefer to fit the model without the logarithm transformation, the following is the answer.

lm(formula = fiber$X1 ~ X2 + X4)
Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               < 2e-16 ***
X2                                        ...e-05 ***
X4                                                **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.41 on 17 degrees of freedom
Multiple R-squared: 0.7453, Adjusted R-squared: 0.7153
F-statistic: 24.88 on 2 and 17 DF, p-value: 8.931e-06

and the confidence intervals are given by:

num  ŷ  CI for each  SCI (Bonferroni)  SCI (Scheffé)

(d) Each Bonferroni interval is narrower than the corresponding Scheffé interval; thus the Bonferroni method is preferred here.
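The comparison in (d) comes down to the three critical multipliers, which can be computed directly (a small check, independent of the data; here α = 0.05, K = 4 and 17 residual degrees of freedom):

a <- 0.05; K <- 4; df <- 17
qt(1 - a/2, df)                ## per-interval t, approx. 2.11
qt(1 - a/(2*K), df)            ## Bonferroni t, approx. 2.79
sqrt(3 * qf(1 - a, 3, df))     ## Scheffe multiplier (p = 3), approx. 3.10

Since the Bonferroni multiplier is the smaller of the two simultaneous multipliers here, its intervals are narrower, matching the conclusion in (d).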

Ch3 Ex19
(a)
Call: lm(formula = protein ~ L1 + L2 + L3 + L4 + L5 + L6)
Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                       *
L1
L2
L3                                                **
L4                                                **
L5
L6
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 17 degrees of freedom
Multiple R-squared: 0.9821, Adjusted R-squared: 0.9758
F-statistic: 155.9 on 6 and 17 DF, p-value: 6.654e-14

The residual sum of squares is $17 s^2$.

(b) The regression coefficients of (L1, L2, L5, L6) are smaller in magnitude than twice their respective standard errors, so we keep L3 and L4 only.

lm(formula = protein ~ L3 + L4)
Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               < 2e-16 ***
L3                                        < 2e-16 ***
L4                                        ...e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.77 on 21 degrees of freedom
Multiple R-squared: 0.9721, Adjusted R-squared: 0.9694
F-statistic: 366.2 on 2 and 21 DF, p-value: < 2.2e-16

The residual sum of squares is $21 s^2$.

(c) Set $H_0$: model in (b) vs $H_1$: model in (a).

Model 1: protein ~ L3 + L4
Model 2: protein ~ L1 + L2 + L3 + L4 + L5 + L6
  Res.Df   RSS   Df   Sum of Sq   F   Pr(>F)

The p-value is 0.0918, not quite small enough to reject $H_0$; the model in (b) is adequate.
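The anova(rm, fm) comparison reported above can also be computed by hand from the two residual sums of squares; a short sketch of the underlying partial F statistic:

## Partial F test by hand; equivalent to anova(rm, fm)
partialF <- function(rm, fm) {
  rss0 <- deviance(rm); df0 <- df.residual(rm)   ## reduced model
  rss1 <- deviance(fm); df1 <- df.residual(fm)   ## full model
  F <- ((rss0 - rss1) / (df0 - df1)) / (rss1 / df1)
  c(F = F, p = pf(F, df0 - df1, df1, lower.tail = FALSE))
}
## partialF(rm, fm)   ## p = 0.0918, matching the ANOVA table above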

(d) With confidence level $\alpha = 0.10$, number of points $K = 6$, $p = 3$ and $n = 24$, the prediction intervals are of the form $[\hat y \pm t \cdot \operatorname{se}_{\mathrm{pred}}(\hat y)]$, where $\operatorname{se}_{\mathrm{pred}}(\hat y)^2 = s^2 + \operatorname{se}(\hat y)^2$, and $t = t_{1-\alpha/2,\, n-3}$ for (i), $t = t_{1-\alpha/(2K),\, n-3}$ for (ii, Bonferroni), and $t = \sqrt{6 F_{1-\alpha;\, 6,\, n-3}}$ for (iii, Scheffé).

num  ŷ  PI for each  SPI (Bonferroni)  SPI (Scheffé)

Bonferroni is again preferable (narrower intervals).

Part C

Ch3 Ex7
(a) The given level-$\alpha$ test in the ANOVA model is to reject $H_0$ if $F \ge F_{I-1,\, n-I,\, 1-\alpha}$. The power function is given by

$$\beta(\delta) = P_\delta(\text{test rejects } H_0) = P(F' \ge F_{I-1,\, n-I,\, 1-\alpha}), \qquad F' \sim \text{noncentral } F_{I-1,\, n-I,\, \delta}.$$

To get $\delta$, use the substitution rule:

$$\delta^2 \sigma^2 = \sum_i n_i (\bar y_i - \bar y)^2 \Big|_{\bar y_i = \theta_i,\ \bar y = \bar\theta} = \sum_i n_i (\theta_i - \bar\theta)^2.$$

By the following claim, conclude that the power is maximized when $\delta$ is the greatest possible, i.e. when $\sum n_i (\theta_i - \bar\theta)^2$ subject to $\sum n_i = n$ is maximized.

Claim: Let $F_\delta \sim F_{m,n,\delta}$ for some $m, n$. Then $P(F_{\delta_1} \ge a) \ge P(F_{\delta_2} \ge a)$ whenever $\delta_1 \ge \delta_2 \ge 0$. (Heuristic argument: observe that the mass of $F_{m,n,\delta}$ tends to move to the right as $\delta$ increases.)

Proof: Let $Z \sim N(0,1)$, $X \sim \chi^2_{m-1}$, $Y \sim \chi^2_n$ be mutually independent, and write their densities as $f_Z$, $f_X$ and $f_Y$ respectively. Since $F_\delta$ is distributed as $\big(((Z+\delta)^2 + X)/m\big) \big/ (Y/n)$, if $P((Z+\delta)^2 \ge a)$ decreases as $\delta$ decreases, then

$$P(F_{\delta_1} \ge a) = P\big((Z+\delta_1)^2 + X \ge a' Y\big) = \iint P\big((Z+\delta_1)^2 \ge a' y - x\big) f_X(x) f_Y(y)\, dx\, dy$$
$$\ge \iint P\big((Z+\delta_2)^2 \ge a' y - x\big) f_X(x) f_Y(y)\, dx\, dy = P(F_{\delta_2} \ge a),$$

where $a' = a m / n$. Thus it is left to prove that $P((Z+\delta)^2 \ge a)$ decreases as $\delta$ decreases. Equivalently, $P((Z+\delta)^2 \le a)$ increases as $\delta$ decreases, since the event $\{(Z+\delta)^2 \le a\} = \{-\sqrt a - \delta \le Z \le \sqrt a - \delta\}$ tends to cover more of the neighborhood of 0 (where most of the $N(0,1)$ mass lies) as $\delta$ decreases. One can prove this argument more rigorously.

(b) When $I = 2$ and $n_i = n a_i$ with $a_1 + a_2 = 1$, write

$$f(a_1) = \sum a_i (\theta_i - \bar\theta)^2 = a_1\big(\theta_1 - (a_1\theta_1 + (1-a_1)\theta_2)\big)^2 + (1-a_1)\big(\theta_2 - (a_1\theta_1 + (1-a_1)\theta_2)\big)^2 = a_1(1-a_1)(\theta_1 - \theta_2)^2,$$

so that $\sum n_i(\theta_i - \bar\theta)^2 = n f(a_1)$. The fact that $f(a_1)$ is concave with its peak at $a_1 = 1/2$ proves the claim.

(c) Write $v = \theta_3 - \theta_2 = \theta_2 - \theta_1$. Let $B_j = \sum a_i (\theta_i - \bar\theta)^2$, where $j$ indicates allocation scheme (i) or (ii). Since $\delta^2 = n B_j / \sigma^2$, the scheme with the larger $B_j$ gives the more powerful test. Calculation gives $B_1 = \tfrac{2}{3} v^2$ and $B_2 = v^2$. Thus scheme (ii) gives more power, and if we had $\tfrac{3}{2} n$ instead of $n$ for the total number of samples, the power would be approximately the same (only approximately, because the second degrees of freedom of the $F$ distributions would then differ due to the different sample sizes).
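The power calculation in (a) translates directly into R via the noncentral F distribution; the following is a small sketch (the function name and the numerical inputs are illustrative, not from the problem):

anova_power <- function(theta, n_i, sigma, alpha = 0.05) {
  I <- length(theta); n <- sum(n_i)
  ncp <- sum(n_i * (theta - weighted.mean(theta, n_i))^2) / sigma^2   ## delta^2
  crit <- qf(1 - alpha, I - 1, n - I)
  pf(crit, I - 1, n - I, ncp = ncp, lower.tail = FALSE)               ## power
}
## Ex7(c) flavor: equally spaced means; weighting the extremes raises delta^2
anova_power(c(0, 1, 2), c(20, 20, 20), sigma = 2)   ## equal allocation, B = (2/3) v^2
anova_power(c(0, 1, 2), c(27, 6, 27), sigma = 2)    ## more weight at the extremes: higher power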

Appendix: Sample code for Ex18 and Ex19

#### 3.18
fiber <- read.table("d:/2006-fall/stat664/hw2/fiber2.dat")
names(fiber) <- c("no","X1","X2","X3","X4")
lfiber <- log(fiber)
names(lfiber) <- c("no","lx1","lx2","lx3","lx4")
## center the untransformed predictors
X2 <- fiber$X2 - mean(fiber$X2)
X3 <- fiber$X3 - mean(fiber$X3)
X4 <- fiber$X4 - mean(fiber$X4)
rm <- lm(fiber$X1 ~ X2 + X4)                 ## untransformed model
logrm <- lm(lx1 ~ lx2 + lx4, data = lfiber)  ## log-log model
summary(rm)
summary(logrm)

#### confidence intervals for mean X1
xc <- t(matrix(ncol = 4, nrow = 3, c(
  75 - mean(fiber$X2), 70 - mean(fiber$X3), 45 - mean(fiber$X4),
  80 - mean(fiber$X2), 70 - mean(fiber$X3), 45 - mean(fiber$X4),
  80 - mean(fiber$X2), 75 - mean(fiber$X3), 42 - mean(fiber$X4),
  65 - mean(fiber$X2), 80 - mean(fiber$X3), 40 - mean(fiber$X4))))
xc <- data.frame(xc[, c(1, 3)])   ## keep the centered X2 and X4 columns
names(xc) <- c("X2", "X4")
a <- 0.05   ## 1-a confidence intervals
K <- 4      ## number of points
p <- predict(rm, xc, se.fit = TRUE)
CI_each <- cbind(p$fit - qt(1 - a/2, rm$df.residual) * p$se.fit,
                 p$fit + qt(1 - a/2, rm$df.residual) * p$se.fit)
CI_Bonf <- cbind(p$fit - qt(1 - a/(2*K), rm$df.residual) * p$se.fit,
                 p$fit + qt(1 - a/(2*K), rm$df.residual) * p$se.fit)
CI_Sche <- cbind(p$fit - sqrt(3 * qf(1 - a, 3, rm$df.residual)) * p$se.fit,
                 p$fit + sqrt(3 * qf(1 - a, 3, rm$df.residual)) * p$se.fit)

#### confidence intervals for mean log X1
xc <- t(matrix(ncol = 4, nrow = 3, c(75, 70, 45,
                                     80, 70, 45,
                                     80, 75, 42,
                                     65, 80, 40)))
logxc <- log(xc)
xc <- data.frame(logxc[, c(1, 3)])
names(xc) <- c("lx2", "lx4")   ## must match the variable names used in logrm
a <- 0.05
K <- 4
p <- predict(logrm, xc, se.fit = TRUE)
CI_each <- cbind(p$fit - qt(1 - a/2, logrm$df.residual) * p$se.fit,
                 p$fit + qt(1 - a/2, logrm$df.residual) * p$se.fit)
CI_Bonf <- cbind(p$fit - qt(1 - a/(2*K), logrm$df.residual) * p$se.fit,
                 p$fit + qt(1 - a/(2*K), logrm$df.residual) * p$se.fit)
CI_Sche <- cbind(p$fit - sqrt(3 * qf(1 - a, 3, logrm$df.residual)) * p$se.fit,
                 p$fit + sqrt(3 * qf(1 - a, 3, logrm$df.residual)) * p$se.fit)

#### 3.19
gw <- read.table("d:/2007-fall/664 - solution/hw2/protein.dat")
names(gw) <- c("no","protein","L1","L2","L3","L4","L5","L6")
attach(gw)
#par(mfrow=c(2,3))

#### (a) fit the linear model with all covariates
fm <- lm(protein ~ L1 + L2 + L3 + L4 + L5 + L6)
summary(fm)

#### (b)
## (L1, L2, L5, L6) have regression coefficients smaller in magnitude
## than twice their standard errors, respectively
rm <- lm(protein ~ L3 + L4)
summary(rm)

#### (c)
anova(rm, fm)   ## H0: rm, H1: fm
## p-value is 0.0918; not quite small enough to reject H0 (rm),
## so the model in (b) (rm) is good

#### prediction intervals for rm in (b)
a <- 0.10   ## 1-a prediction intervals
K <- 6      ## number of points
pt <- read.table("d:/2007-fall/664 - solution/hw2/reflect.dat")
pt_rm <- pt[, 4:5]
names(pt_rm) <- c("L3", "L4")

#### (d)-i
p <- predict(rm, data.frame(pt_rm), se.fit = TRUE)
sep <- p$residual.scale * sqrt(1 + (p$se.fit / p$residual.scale)^2)  ## se for prediction
p$fit
PI_each <- cbind(p$fit - qt(1 - a/2, rm$df.residual) * sep,
                 p$fit + qt(1 - a/2, rm$df.residual) * sep)
PI_Bonf <- cbind(p$fit - qt(1 - a/(2*K), rm$df.residual) * sep,
                 p$fit + qt(1 - a/(2*K), rm$df.residual) * sep)
PI_Sche <- cbind(p$fit - sqrt(6 * qf(1 - a, 6, rm$df.residual)) * sep,
                 p$fit + sqrt(6 * qf(1 - a, 6, rm$df.residual)) * sep)
