NANYANG TECHNOLOGICAL UNIVERSITY
SEMESTER I EXAMINATION 2014-2015
MTH35/MH3510 Regression Analysis
December 2014    TIME ALLOWED: 2 HOURS

INSTRUCTIONS TO CANDIDATES
1. This examination paper contains FOUR (4) questions and comprises SIX (6) printed pages, including two appendices (pages 5 and 6).
2. Answer all questions. The marks for each question are indicated at the beginning of each question.
3. Answer each question beginning on a FRESH page of the answer book.
4. This IS NOT an OPEN BOOK exam.
5. Candidates may use calculators. However, they should write down systematically the steps in the workings.
QUESTION 1. (32 marks)
Suppose one wishes to study the effectiveness of a mathematical clinic center in improving the undergraduate experience. To this end, the center keeps a complete log of all users and the duration of each visit. The researchers select 225 students registered in mathematical courses, check them against the center log to see the hours spent (if any), and ask them to rate the overall value of their learning experience on a scale from 0 to 100. The 225 students on average spent 2.03 hours at the center with a standard deviation of 4.56 hours. On average they gave a rating of 53.43 with a standard deviation of 9.38. The correlation coefficient between hours spent and rating is 0.41.
(i) Find a regression model for the variables of interest and estimate the parameters.
(ii) Is the regression coefficient significant at the 5% significance level ($t^{0.025}_{221} = 1.96$, $t^{0.025}_{120} = 1.98$, $t^{0.01}_{1} = 63.6$)?
(iii) Construct an ANOVA table to test the significance of the relationship between the response variable and the predictor variable at the 5% level.
(iv) Estimate the increase in the expected rating when student A spends 2 more hours at the center than student B. Find its 95% confidence interval ($t^{0.025}_{221} = 1.96$, $t^{0.025}_{120} = 1.98$, $t^{0.01}_{1} = 63.6$).

Solution
(i) From the data we see that
$$\bar x = 2.03, \quad S_{xx} = \sum (x_i - \bar x)^2 = 4.56^2 \times 225 = 4678.56,$$
$$\bar y = 53.43, \quad S_{yy} = \sum (y_i - \bar y)^2 = 9.38^2 \times 225 = 19796.49.$$
Note that
$$r_{xy} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, \quad r_{xy} = 0.41,$$
which implies that
$$S_{xy} = r_{xy} \sqrt{S_{xx} S_{yy}} = 0.41 \sqrt{4678.56 \times 19796.49} = 3945.791.$$
It follows that
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{3945.791}{4678.56} = 0.8434$$
and that
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 53.43 - 0.8434 \times 2.03 = 51.7179.$$
The fitted model is $\hat y = 51.7179 + 0.8434x$.

(ii) Note that
$$SSR = \hat\beta_1^2 S_{xx} = 0.8434^2 \times 4678.56 = 3327.97.$$
This ensures that
$$SSE = S_{yy} - SSR = 19796.49 - 3327.97 = 16468.52$$
and that
$$s^2 = SSE/(n-2) = 16468.52/223 = 73.84987.$$
Hence
$$t = \frac{\hat\beta_1}{\sqrt{s^2/S_{xx}}} = \frac{0.8434}{\sqrt{73.84987/4678.56}} = \frac{0.8434}{0.1256373} = 6.712975 > t^{0.025}_{221} = 1.96.$$
We then reject the null hypothesis $H_0: \beta_1 = 0$, so the regression coefficient is significant.

(iii) The ANOVA table is as follows.

    Source       df    SS                 MS          F                                p-value
    Regression     1   SSR = 3327.97      3327.97     3327.97/73.84987 = 45.06399
    Residual     223   SSE = 16468.52     73.84987
    Total        224   S_yy = 19796.49

We see that
$$F = \frac{SSR}{s^2} = \frac{3327.97}{73.84987} = 45.06399 > (t^{0.025}_{221})^2 = 3.8416,$$
so we again reject the null hypothesis and the regression coefficient is significant.

(iv) Note that $x_1 - x_2 = 2$ and the estimator of the increase is
$$Ey_1 - Ey_2 = 0.8434 \times 2 = 1.6868.$$
The prediction interval is
$$\hat y_0 \pm s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}}\; t^{0.025}_{221} = 1.6868 \pm 8.593595 \sqrt{\frac{1}{225} + \frac{(2 - 2.03)^2}{4678.56}} \times 1.96 = (0.563879,\ 2.809721).$$
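The arithmetic in parts (i) to (iii) of Question 1 can be re-derived from the summary statistics alone. The sketch below is only a numerical check, not part of the paper. It assumes n = 225 and a mean of 2.03 hours (the values consistent with S_xx = 4678.56 and the intercept 51.7179), and, like the solution, takes S_xx and S_yy as n times the squared standard deviation. All variable names are illustrative. Because the slope is not rounded to 0.8434 before it is squared, SSR, t and F differ from the printed figures in the later decimals.

```python
import math

# Summary statistics from Question 1 (n and x_bar are assumed readings, see above).
n = 225
x_bar, s_x = 2.03, 4.56     # hours at the center: mean, standard deviation
y_bar, s_y = 53.43, 9.38    # rating: mean, standard deviation
r_xy = 0.41                 # correlation between hours and rating

# As in the worked solution, S_xx and S_yy are n times the squared std deviation.
S_xx = n * s_x ** 2
S_yy = n * s_y ** 2
S_xy = r_xy * math.sqrt(S_xx * S_yy)

# Least-squares estimates.
b1 = S_xy / S_xx
b0 = y_bar - b1 * x_bar

# Sums of squares, residual variance, and the two test statistics.
SSR = b1 ** 2 * S_xx
SSE = S_yy - SSR
s2 = SSE / (n - 2)
t_stat = b1 / math.sqrt(s2 / S_xx)
F_stat = SSR / s2           # equals t_stat squared in simple linear regression

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"SSR = {SSR:.2f}, SSE = {SSE:.2f}, s^2 = {s2:.4f}")
print(f"t = {t_stat:.4f}, F = {F_stat:.4f}")
```

Both statistics lie far above the critical values quoted in the question, which agrees with the conclusion that the slope is significant.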
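QUESTION 2. (18 marks)
Consider the general linear regression situation with a $\beta_0$ in the model:
(i) Verify that the correlation between the vectors $e$ and $Y$ is $(1 - R^2)^{1/2}$, where $e = (e_1, \dots, e_p)$ with $e_i = y_i - \hat y_i$ and $Y = (y_1, \dots, y_p)$.
(ii) Can we find defective regressions by a plot of residuals $e_i$ against observations $y_i$? Justify your answer.
(iii) Find the correlation between $e$ and $\hat Y$.

Solution
(i) Observe that
$$\sum (e_i - \bar e)(y_i - \bar y) = \sum e_i (y_i - \bar y) = \sum e_i y_i = e'Y = e'e,$$
because $\bar e = 0$ if a $\beta_0$ term is in the model and
$$e'e = Y'(I - H)'(I - H)Y = Y'(I - H)Y = Y'e.$$
Moreover we have
$$\sum (e_i - \bar e)^2 = \sum e_i^2 = e'e.$$
Since $R^2 = 1 - e'e / \sum (y_i - \bar y)^2$, it follows that
$$r_{eY} = \frac{e'e}{(e'e)^{1/2} \left\{ \sum (y_i - \bar y)^2 \right\}^{1/2}} = \left( \frac{e'e}{\sum (y_i - \bar y)^2} \right)^{1/2} = (1 - R^2)^{1/2}.$$

(ii) No. We cannot find defective regressions by a plot of residuals $e_i$ against observations $y_i$, because by (i) such a plot always shows a slope, even when the model is adequate.

(iii) Write
$$\sum (e_i - \bar e)(\hat y_i - \bar{\hat y}) = \sum e_i \hat y_i = e'\hat Y = Y'(I - H)'HY = Y'(H - H^2)Y = 0,$$
where the last step uses the fact that $H$ is symmetric and idempotent, so that the correlation is zero.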
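The two identities in Question 2 are easy to confirm numerically. The sketch below does so for the simplest special case, a straight-line fit with an intercept, on simulated data. The data-generating values (intercept 3, slope 0.5, noise standard deviation 2, seed 0) and the helper `corr` are arbitrary choices made for illustration and do not come from the paper.

```python
import math
import random


def corr(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    s_uv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    s_uu = sum((a - mu) ** 2 for a in u)
    s_vv = sum((b - mv) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


# Simulated data (illustrative values only).
random.seed(0)
n = 40
x = [random.uniform(0, 10) for _ in range(n)]
y = [3.0 + 0.5 * xi + random.gauss(0, 2) for xi in x]

# Least-squares fit of y on x with an intercept.
x_bar, y_bar = sum(x) / n, sum(y) / n
S_xx = sum((xi - x_bar) ** 2 for xi in x)
S_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
S_yy = sum((yi - y_bar) ** 2 for yi in y)
b1 = S_xy / S_xx
b0 = y_bar - b1 * x_bar

# Fitted values, residuals, and R^2.
y_hat = [b0 + b1 * xi for xi in x]
e = [yi - fi for yi, fi in zip(y, y_hat)]
R2 = 1.0 - sum(ei ** 2 for ei in e) / S_yy

print("corr(e, Y)     =", corr(e, y))
print("sqrt(1 - R^2)  =", math.sqrt(1.0 - R2))
print("corr(e, Y_hat) =", corr(e, y_hat))
```

The first two printed numbers agree up to floating-point error and the third is numerically zero, which is why residuals are plotted against fitted values rather than against the observations.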
QUESTION 3. (30 marks)
The dataset in Appendix II contains the price per capita of pork annually from 1936-1952 together with other variables relevant to the price of the pork. A multiple linear regression model $\Omega$ is proposed to describe the relationship between the response variable PP (price of pork) and the other 5 predictor variables ($x_1, x_2, x_3, x_4, x_5$). However, a farmer believes that the variation in PP can be adequately explained by the variable $x_4$ alone and therefore proposes a simple linear regression model $\omega$ for the data. Fitting $\Omega$ and $\omega$ separately to the dataset yields the following table.

           Model Ω       Model ω
    SSE    133.17269     ?
    SSR    632.48848     457.92568

(i) Write down the full model $\Omega$ and the reduced model $\omega$.
(ii) Calculate the SSE, denoted by the question mark in the above table, for fitting $\omega$.
(iii) Fitting $\Omega$ produces the following estimates of the regression coefficients:

    Intercept    x_1      x_2       x_3      x_4       x_5
    -3704.869    2.147    -0.8266   2.962    -3.961    -1.859

Predict the value of PP in the year 2006 based on the model $\Omega$ when $x_2 = 98$, $x_3 = 100$, $x_4 = 100$, $x_5 = 110$.
(iv) Is the farmer's belief correct at the 5% significance level? Justify your answer ($F^{0.95}_{4,11} = 3.36$, $F^{0.95}_{5,11} = 3.20$, $F^{0.95}_{4,12} = 3.26$, $F^{0.95}_{5,12} = 3.11$).

Assume that the variable $x_1$ (the variable of year) is categorized into 3 levels: (1) 1936-1940; (2) 1941-1950; (3) 1951-1952. Suppose that $\Omega$ is an adequate model after categorization of $x_1$.
(v) Define dummy variables to represent the categorized $x_1$ variable.
(vi) Propose a test statistic to examine whether PP changes significantly with time under $\Omega$. State clearly the distribution and the parameters of the test statistic (such as degrees of freedom).

Solution
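(i) The full model $\Omega$ is
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon$$
and the reduced model $\omega$ is
$$y = \beta_0 + \beta_4 x_4 + \varepsilon.$$

(ii) Both models share the same total sum of squares, so
$$SSE_\omega = 133.17269 + 632.48848 - 457.92568 = 307.7355.$$

(iii) The point predictor is
$$\hat y_{2006} = -3704.869 + 2.147 \times 2006 - 0.8266 \times 98 + 2.962 \times 100 - 3.961 \times 100 - 1.859 \times 110 = 216.616.$$

(iv) The extra SSE is
$$SSEEXT = 307.7355 - 133.17269 = 174.5628.$$
The statistic is
$$F = \frac{174.5628/4}{133.17269/(17 - 5 - 1)} = 3.6047 > F^{0.95}_{4,11} = 3.36,$$
which implies that we reject the null hypothesis $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_5 = 0$ at the 5% level. The farmer's belief is not supported by the current data.

(v) The two dummy variables are as follows.
$$I_1 = \begin{cases} 1 & \text{level 1} \\ 0 & \text{level 2} \\ 0 & \text{level 3} \end{cases} \qquad I_2 = \begin{cases} 0 & \text{level 1} \\ 1 & \text{level 2} \\ 0 & \text{level 3} \end{cases}$$

(vi) In this case the full model is
$$y = \beta_0 + \alpha_1 I_1 + \alpha_2 I_2 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon$$
and the reduced model is
$$y = \beta_0 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon.$$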
The proposed F statistic is
$$F = \frac{MSEXT}{MSE} \sim F_{k-1,\, n-p-k},$$
where $MSEXT = (SSE_\omega - SSE_\Omega)/(k - 1)$ and $MSE = SSE_\Omega/(n - p - k)$ with $k = 3$ and $p = 4$.

QUESTION 4. (20 marks)
A study was conducted to compare the effectiveness of two different medications for treating individuals with high blood pressure. To control for unknown sources of variation, ten patients were assigned at random to each of the two medications. The response is a coded measure of the decrease in the diastolic blood pressure measurement after a specified period. In this study, treatment 1 is a standard medication and the other is a new experimental medication. The data, $y_{ir}$, $i = 1, 2$ and $r = 1, 2, \dots, 10$, are shown in the table below.

    Observation   Trt. 1   Trt. 2      Observation   Trt. 1   Trt. 2
         1          1.5      3.5            6          0.6      3.2
         2          0.2      2.7            7         -0.5      4.3
         3         -0.2      2.2            8          1.1      1.3
         4          2.1      1.6            9         -1.2      1.5
         5         -1.2      1.7           10          1.2      2.5

Does the new medication have some improvement over the standard treatment?
(i) Propose two statistical approaches for the above question that can be tested by the data above.
(ii) Answer the above question and justify it (at the 5% level).
(iii) Find the 95% confidence interval to estimate the difference between the two medications.
($F^{0.95}_{1,18} = 4.41$, $F^{0.95}_{1,20} = 4.35$, $F^{0.95}_{15,2} = 19.43$.)

Solution
(i) A one-way ANOVA model and a t-test.
(ii) Below we use the one-way ANOVA model. From the data we see that
$$\bar y_{..} = \frac{28.1}{20} = 1.405, \quad \text{c.f.} = n \bar y_{..}^2 = 20 \times (1.405)^2$$
and
$$S_{yy} = \sum_{i,j} y_{ij}^2 - \text{c.f.} = 12.88 + 68.75 - 20 \times (1.405)^2 = 42.1495.$$
Also
$$SST = \sum_i y_{i.}^2 / n_i - \text{c.f.} = (3.6^2 + 24.5^2)/10 - 20 \times (1.405)^2 = 21.8405.$$
It follows that
$$SSE = S_{yy} - SST = 42.1495 - 21.8405 = 20.309$$
and the F statistic is
$$F = \frac{21.8405/1}{20.309/18} = 19.35737 > F^{0.95}_{1,18} = 4.41.$$
This implies that the new medication has some improvement over the standard treatment.

(iii) Note that
$$\bar y_{1.} = 3.6/10 = 0.36, \quad \bar y_{2.} = 24.5/10 = 2.45.$$
The point estimator of the difference between the two medications is $\bar y_{1.} - \bar y_{2.} = -2.09$. It follows that the confidence interval is
$$-2.09 \pm \sqrt{4.41}\, \sqrt{\frac{20.309}{18}}\, \sqrt{\frac{1}{10} + \frac{1}{10}} = (-3.087568,\ -1.092432).$$

END OF PAPER
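The Question 4 calculations can be reproduced directly from the twenty observations. The sketch below is a numerical check rather than part of the paper. It assumes the data values are read so that the treatment totals are 3.6 and 24.5, as used in the solution, and it takes the critical value as the square root of the tabulated 4.41. Variable names are illustrative.

```python
import math

# Coded decrease in diastolic blood pressure, observations 1 to 10 (assumed reading).
trt1 = [1.5, 0.2, -0.2, 2.1, -1.2, 0.6, -0.5, 1.1, -1.2, 1.2]   # standard medication
trt2 = [3.5, 2.7, 2.2, 1.6, 1.7, 3.2, 4.3, 1.3, 1.5, 2.5]       # new medication

n1, n2 = len(trt1), len(trt2)
N = n1 + n2

# Correction factor and sums of squares for the one-way ANOVA.
grand_mean = (sum(trt1) + sum(trt2)) / N
cf = N * grand_mean ** 2
S_yy = sum(v * v for v in trt1 + trt2) - cf
SST = sum(trt1) ** 2 / n1 + sum(trt2) ** 2 / n2 - cf   # between-treatment SS
SSE = S_yy - SST                                       # within-treatment SS
MSE = SSE / (N - 2)
F_stat = (SST / 1) / MSE

# Equivalent pooled two-sample t statistic and the 95% confidence interval.
diff = sum(trt1) / n1 - sum(trt2) / n2
se = math.sqrt(MSE * (1 / n1 + 1 / n2))
t_stat = diff / se                # its square equals F_stat
t_crit = math.sqrt(4.41)          # from the tabulated F(0.95; 1, 18) = 4.41
ci = (diff - t_crit * se, diff + t_crit * se)

print(f"S_yy = {S_yy:.4f}, SST = {SST:.4f}, SSE = {SSE:.4f}")
print(f"F = {F_stat:.5f}, t = {t_stat:.5f}")
print(f"95% CI for the difference: ({ci[0]:.6f}, {ci[1]:.6f})")
```

With two groups the ANOVA F statistic is the square of the pooled t statistic, so the two approaches named in part (i) lead to the same decision.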
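Appendix
Formulae for the final examination

Simple linear regression
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x, \quad \hat\beta_1 = \frac{S_{xy}}{S_{xx}}, \quad \mathrm{var}(\hat\beta_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar x^2}{S_{xx}} \right), \quad \mathrm{var}(\hat\beta_1) = \frac{\sigma^2}{S_{xx}},$$
$$\mathrm{s.e.}(\hat\beta_0 + x_0 \hat\beta_1) = s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}}, \quad SSR = \hat\beta_1^2 S_{xx}, \quad r_{xy} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}.$$
Prediction interval for $y_0$ at $x = x_0$ is
$$\hat y_0 \pm s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}}\; t_{n-2}.$$
Prediction interval for $Ey_0$ at $x = x_0$ is
$$\hat y_0 \pm s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}}\; t_{n-2}.$$

Multiple linear regression
$$\hat\beta = (X'X)^{-1} X'Y, \quad \mathrm{var}(\hat\beta) = \sigma^2 (X'X)^{-1},$$
$$SSE = Y'[I - X(X'X)^{-1}X']Y, \quad S_{yy} = Y'Y - n \bar y^2, \quad R^2 = \frac{SSR}{S_{yy}}.$$

One-way ANOVA
$$\mathrm{var}(\bar y_{i.}) = \frac{\sigma^2}{n_i}, \quad SSE = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar Y_{i.})^2,$$
$$SST = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (\bar Y_{i.} - \bar Y_{..})^2 = \sum_{i=1}^{r} n_i (\bar Y_{i.} - \bar Y_{..})^2, \quad S_{yy} = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar Y_{..})^2.$$

Two-way ANOVA (equal sample sizes)
$$\mathrm{var}(\bar y_{i..}) = \frac{\sigma^2}{bn}, \quad \mathrm{var}(\bar y_{.j.}) = \frac{\sigma^2}{an},$$
$$SS_A = \sum_i \sum_j \sum_k (\bar Y_{i..} - \bar Y_{...})^2 = nb \sum_i (\bar Y_{i..} - \bar Y_{...})^2,$$
$$SS_B = \sum_i \sum_j \sum_k (\bar Y_{.j.} - \bar Y_{...})^2 = na \sum_j (\bar Y_{.j.} - \bar Y_{...})^2,$$
$$SSE = \sum_i \sum_j \sum_k (Y_{ijk} - \bar Y_{ij.})^2,$$
$$SS_{AB} = n \sum_{i} \sum_{j} (\bar Y_{ij.} - \bar Y_{i..} - \bar Y_{.j.} + \bar Y_{...})^2.$$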
Appendix II

    x_1     x_2     x_3     x_4     x_5     PP
    1936    65.8    51.4    90.9    68.5    59.7
    1937    68.0    52.6    92.1    69.6    59.7
    1938    65.5    52.1    90.9    70.2    63.0
    1939    64.8    52.7    90.9    71.9    71.0
    1940    65.6    55.1    91.1    75.2    71.0
    1941    62.4    48.8    90.7    68.3    74.2
    1942    51.4    41.5    90.0    64.0    72.1
    1943    42.8    31.4    87.8    53.9    79.0
    1944    41.6    29.4    88.0    53.2    73.1
    1945    46.4    33.2    89.1    58.0    70.2
    1946    49.7    37.0    87.3    63.2    82.2
    1947    50.1    41.8    90.5    70.5    68.4
    1948    52.1    44.5    90.4    72.5    73.0
    1949    48.4    40.8    90.6    67.8    70.2
    1950    47.1    43.5    93.8    73.2    67.8
    1951    47.8    46.5    95.5    77.6    63.4
    1952    52.2    56.3    97.5    89.5    56.0