STAT 3340 Assignment 1 solutions

(out of 125 points)

(10) 1. Find the equation of the line which passes through the points (1,1) and (4,5).

The slope is $\beta_1 = (5-1)/(4-1) = 4/3$. The equation for the line is $y - y_0 = \beta_1 (x - x_0)$, where $(x_0, y_0)$ is a point on the line. Using the point (1,1), the equation is $y - 1 = \frac{4}{3}(x - 1)$, or $y = \frac{4}{3}x - \frac{1}{3}$. Using the point (4,5) instead gives the same equation.

(10) 2. Suppose you are given three data points (1,4), (2,6) and (3,7) and the line $y = 1 + 4x$. Give the three residuals and their sum of squares. (5 points for residuals, 5 points for residual sum of squares)

> x=c(1,2,3)
> y=c(4,6,7)
> yhat=1+4*x
> yhat # predicted values
[1]  5  9 13
> resids=y-yhat # residuals
> resids
[1] -1 -3 -6
> sum(resids^2) # residual sum of squares
[1] 46

3. Some data gives the summaries: $n = 10$, $\sum x_i y_i = 100$, $\sum x_i = 20$ and $\sum y_i = 10$. Suppose that the response $y$ is temperature in degrees Celsius.

(3) (a) What is $S_{xy}$?

$S_{xy} = \sum x_i y_i - n\bar{x}\bar{y} = 100 - 10(20/10)(10/10) = 80$.

(3) (b) If the response was converted to temperature in degrees Fahrenheit $y'$, so that $y' = 32 + 1.8y$, what is $\sum y'_i$?

$\sum_{i=1}^{10} y'_i = \sum_{i=1}^{10} (32 + 1.8 y_i) = 32(10) + 1.8 \sum_{i=1}^{10} y_i = 320 + 1.8(10) = 338$.

(4) (c) If the response was converted to temperature in degrees Fahrenheit $y'$, so that $y' = 32 + 1.8y$, what is $S_{xy'}$?

$\sum x_i y'_i = \sum x_i (32 + 1.8 y_i) = 32 \sum x_i + 1.8 \sum x_i y_i = 32(20) + 1.8(100) = 820$.

Then $S_{xy'} = \sum x_i y'_i - \left(\sum_{i=1}^{10} x_i\right)\left(\sum_{i=1}^{10} y'_i\right)/10 = 820 - (20)(338)/10 = 144$.
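The answer to 3(c) can be cross-checked from the summaries alone. The sketch below is only an illustration (the variable names are ours, not part of the assignment); it recomputes $S_{xy}$ on both temperature scales and compares the Fahrenheit value with 1.8 times the Celsius value, since adding the constant 32 should not affect a centred sum of products.

```r
# Summaries given in question 3
n      <- 10
sum_xy <- 100
sum_x  <- 20
sum_y  <- 10

# S_xy on the Celsius scale
Sxy_c <- sum_xy - sum_x * sum_y / n

# Sums after converting the response to Fahrenheit, y' = 32 + 1.8 y
sum_yF  <- 32 * n + 1.8 * sum_y
sum_xyF <- 32 * sum_x + 1.8 * sum_xy

# S_xy' on the Fahrenheit scale, and 1.8 * S_xy for comparison
Sxy_f <- sum_xyF - sum_x * sum_yF / n
c(Sxy_c, Sxy_f, 1.8 * Sxy_c)
```

The last two entries agree, which matches the hand calculation: only the scale factor 1.8 carries through to $S_{xy'}$.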

4. In a simple linear regression, the sum of squares function is

$S(\beta_0, \beta_1) = 1000 - 100\beta_0 - 700\beta_1 + 100\beta_0\beta_1 + 50\beta_0^2 + 70\beta_1^2$.

Find the least squares values for $\beta_0$ and $\beta_1$.

First differentiate with respect to $\beta_0$ and $\beta_1$. (5 points for derivatives)

$\partial S/\partial\beta_0 = -100 + 100\beta_1 + 2(50)\beta_0$

$\partial S/\partial\beta_1 = -700 + 100\beta_0 + 2(70)\beta_1$

Setting the partial derivatives each equal to 0 gives the following two equations, after a bit of simplifying:

$\beta_1 + \beta_0 = 1 \implies \beta_0 = 1 - \beta_1$

$1.4\beta_1 + \beta_0 = 7$

Substituting the first equation into the second, $1.4\beta_1 + (1 - \beta_1) = 7 \implies 0.4\beta_1 = 6 \implies \beta_1 = 15$. Substituting this back into the first equation gives $\beta_0 = 1 - \beta_1 = -14$. The least squares solution is $(-14, 15)$. (5 points for solution)
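As a numerical check on question 4, the two equations obtained by setting the partial derivatives to zero can be written as a 2 x 2 linear system and solved directly. This is a sketch rather than part of the marked solution; the names `A`, `rhs` and `beta` are ours.

```r
# Setting the partial derivatives to zero gives
#   100*beta0 + 100*beta1 = 100
#   100*beta0 + 140*beta1 = 700
A   <- matrix(c(100, 100,
                100, 140), nrow = 2, byrow = TRUE)
rhs <- c(100, 700)

beta <- solve(A, rhs)   # (beta0, beta1)
beta

# A is also the matrix of second derivatives of S, so positive
# eigenvalues confirm that the stationary point is a minimum.
eigen(A)$values
```

`solve(A, rhs)` returns the same pair as the substitution argument, and both eigenvalues are positive.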

5. A random sample of 11 elementary school students is selected, and each student is measured on a creativity score (x) using a well-defined testing instrument and on a task score (y) using a new instrument. The task score is the mean time taken to perform several hand-eye coordination tasks. The data are:

STUDENT  CREATIVITY(X)  TASKS(Y)
FR       35             3.9
HT       37             3.9
IO       50             6.1
DP       69             4.3
YR       84             8.8
QD       40             2.1
DF       29             5.7
ER       42             3.0
RR       51             7.1
TG       45             7.3
EF       31             3.3

Use R to do the following questions. Show your commands. Make sure your output is integrated into your responses. (Cut and paste as necessary.)

(a) Plot Tasks versus Creativity and comment on the form and strength of the association. Be sure to label the axes.

To make the plot, use:

x=c(35,37,50,69,84,40,29,42,51,45,31)
y=c(3.9,3.9,6.1,4.3,8.8,2.1,5.7,3.0,7.1,7.3,3.3)
plot(x,y,xlab="Creativity",ylab="Tasks")

Plot is below, together with added least squares line.

(b) Calculate the summaries $S_{xx}$, $S_{xy}$, $S_{yy}$ and $\bar{X}$ and $\bar{Y}$. (1 point for each of the 5 summary statistics)

> x=c(35,37,50,69,84,40,29,42,51,45,31)
> y=c(3.9,3.9,6.1,4.3,8.8,2.1,5.7,3.0,7.1,7.3,3.3)
> xbar=mean(x)
> xbar
[1] 46.63636
> ybar=mean(y)
> ybar
[1] 5.045455
> Sxx=sum((x-mean(x))^2)
> Sxx
[1] 2778.545
> Syy=sum((y-mean(y))^2)
> Syy
[1] 44.02727

> Sxy=sum((x-mean(x))*(y-mean(y)))
> Sxy
[1] 201.5818

(2) (c) Use these data summaries to calculate the correlation coefficient. Does the value agree with your visual assessment in (a)? (2 points for the correlation coefficient)

> corrxy=Sxy/sqrt(Sxx*Syy)
> corrxy
[1] 0.5763439

From the plot following, it looks like there is a moderately strong increasing relationship between x and y, and this is borne out by the moderate size of r.

(4) (d) Use these summaries to calculate the least squares values for the intercept and slope. (2 points each for $\hat\beta_1$ and $\hat\beta_0$)

> b1=Sxy/Sxx # estimated slope
> b1
[1] 0.0725494
> b0=ybar-b1*xbar # estimated intercept
> b0
[1] 1.662014

(e) Add the least squares line to the plot in (a). (A convenient way to add the line is the command abline(intercept,slope).) (5 points for the plot with added line)

> x=c(35,37,50,69,84,40,29,42,51,45,31)
> y=c(3.9,3.9,6.1,4.3,8.8,2.1,5.7,3.0,7.1,7.3,3.3)
> plot(x,y,xlab="Creativity",ylab="Tasks")
> abline(b0,b1)

[Figure: scatterplot of Tasks versus Creativity with the least squares line added.]
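The hand-computed slope, intercept and correlation in question 5 can be compared with R's built-in fitting function. This is a sketch for checking only; the assignment asks for the summary-based calculation, and the object name `fit` is ours.

```r
# Data from question 5
x <- c(35, 37, 50, 69, 84, 40, 29, 42, 51, 45, 31)
y <- c(3.9, 3.9, 6.1, 4.3, 8.8, 2.1, 5.7, 3.0, 7.1, 7.3, 3.3)

# Least squares fit of Tasks on Creativity
fit <- lm(y ~ x)
coef(fit)    # intercept and slope; compare with b0 and b1

# Sample correlation; compare with Sxy / sqrt(Sxx * Syy)
cor(x, y)
```

Once `fit` exists, `abline(fit)` draws the same line on an open scatterplot as `abline(b0,b1)`.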

(6) (f) Obtain the residuals, $e = y - \hat{y}$. Calculate their sample mean to verify it is zero, and the correlation with X to verify it is also zero. (2 points for the residuals, 2 points for showing the mean of residuals is 0, 2 points for showing correlation of residuals and x is 0.)

> yhat = b0+b1*x # predicted values
> resids=y-yhat # residuals
> resids
[1] -0.3012433 -0.4463421  0.8105156 -2.3679230  1.0438359 -2.4639903
[7]  1.9340531 -1.7090891  1.7379662  2.3732627 -0.6110457
> print(mean(resids)) # 0 to round off error
[1] 4.04064e-17
> print(cor(x,resids)) # 0 to round off error
[1] -1.03577e-16

(g) Plot the residuals versus X. Do the residuals look random? (5 points for the residual plot)

> plot(x,resids,main="residual plot", xlab="x",ylab="residuals")

[Figure: residual plot, residuals versus x.]

The residuals look random, with no evidence that the mean or variance changes with x.

(6) (h) Obtain the residual, regression and total sums of squares, using the data summaries. (2 points for each of SSE, SST, SSR)

> SSE=sum(resids^2) # residual sum of squares
> SSE
[1] 29.40263
> SST=Syy # total sum of squares
> SST
[1] 44.02727
> SSR=SST-SSE # regression sum of squares
> SSR
[1] 14.62464
> b1*Sxy # another way to get the regression sum of squares
[1] 14.62464

(2) (i) What is the value of the coefficient of determination?

> R2=corrxy^2
> R2
[1] 0.3321723

6. Use the data summaries calculated for the previous question and the formulae from the book or notes to do the following questions.

(6) (a) Assess the null hypothesis that there is no relationship between task score and creativity. Use a test based on the normal assumption. State the hypotheses, show calculation of the test statistic, calculate the P value and draw a conclusion. (2 points for hypotheses, 2 points for observed test statistic, 2 points for p-value)

$H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$

> MSE=SSE/(11-2)
> tobs=b1/sqrt(MSE/Sxx)
> tobs
[1] 2.115781
> pvalue=2*(1-pt(tobs,11-2))
> pvalue
[1] 0.06347107

The P value is about 0.063, so the evidence against $H_0$ is weak: the slope is not significantly different from zero at the 5% level.

(2) (b) Calculate the 95% confidence interval for the mean task score when the creativity is 50. (2 points for CI)

> c(b0+b1*50 - qt(.975,11-2)*sqrt(MSE)*sqrt(1/11+(50-xbar)^2/Sxx),
+ b0+b1*50 + qt(.975,11-2)*sqrt(MSE)*sqrt(1/11+(50-xbar)^2/Sxx))
[1] 4.029361 6.549608

(2) (c) Calculate the 95% prediction interval for a new value for task score when the creativity is 50. (2 points for prediction interval)

> c(b0+b1*50 - qt(.975,11-2)*sqrt(MSE)*sqrt(1+1/11+(50-xbar)^2/Sxx),
+ b0+b1*50 + qt(.975,11-2)*sqrt(MSE)*sqrt(1+1/11+(50-xbar)^2/Sxx))
[1] 1.010921 9.568047
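The test statistic and both intervals in question 6 can be reproduced with `lm()`, `summary()` and `predict()` from base R. This is offered as a cross-check on the formula-based answers, not as the expected method; the object names `fit`, `new`, `ci50` and `pi50` are ours.

```r
# Data from question 5
x <- c(35, 37, 50, 69, 84, 40, 29, 42, 51, 45, 31)
y <- c(3.9, 3.9, 6.1, 4.3, 8.8, 2.1, 5.7, 3.0, 7.1, 7.3, 3.3)
fit <- lm(y ~ x)

# Row "x" holds the slope estimate, its standard error,
# the t statistic and the two-sided P value for H0: beta1 = 0
summary(fit)$coefficients

# Intervals at creativity = 50
new  <- data.frame(x = 50)
ci50 <- predict(fit, new, interval = "confidence", level = 0.95)  # mean response
pi50 <- predict(fit, new, interval = "prediction", level = 0.95)  # new observation
ci50
pi50
```

The `lwr` and `upr` columns of `ci50` and `pi50` correspond to the interval endpoints computed by hand in (b) and (c); the prediction interval is wider because of the extra 1 under the square root.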

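7. Suppose X and Y are random variables with $\mu_x = 10$, $\sigma_x = 3$, $\mu_y = 4$, $\sigma_y = 1$, and $Cov[X, Y] = 1.5$. Calculate:

(2) (a) $E[2X - Y]$

$E[2X - Y] = 2E[X] - E[Y] = 2(10) - 4 = 16$

(4) (b) $Var[2X - Y]$

$V[2X - Y] = V[2X] - 2Cov[2X, Y] + V[Y] = 4V[X] - 4Cov[X, Y] + V[Y] = 4(3^2) - 4(1.5) + 1 = 31$

(4) (c) $Cor[2X, Y]$, where Cor stands for correlation.

$Cor[2X, Y] = \dfrac{Cov[2X, Y]}{\sqrt{V[2X]V[Y]}} = \dfrac{2Cov[X, Y]}{\sqrt{4V[X]V[Y]}} = 1.5/(3(1)) = 0.5$

8. Suppose you have data $(x_i, y_i)$, $i = 1, \ldots, n$ and want to fit the line with known intercept, $y = 2 + \beta_1 x + \epsilon$, with the usual assumptions about $\epsilon$. The least squares estimate for $\beta_1$ is

$\hat\beta_1 = \dfrac{\sum x_i y_i - 2\sum x_i}{\sum x_i^2}$.

(a) Find the expected value of $\hat\beta_1$. Is $\hat\beta_1$ unbiased? (5 points for the expected value)

$E[\hat\beta_1] = \dfrac{\sum x_i E[y_i] - 2\sum x_i}{\sum x_i^2} = \dfrac{\sum x_i (2 + \beta_1 x_i) - 2\sum x_i}{\sum x_i^2} = \dfrac{\sum x_i (\beta_1 x_i)}{\sum x_i^2} = \dfrac{\beta_1 \sum x_i^2}{\sum x_i^2} = \beta_1$

So $\hat\beta_1$ is unbiased.

(b) Find the variance of $\hat\beta_1$. (5 points for the variance)

$V[\hat\beta_1] = \dfrac{V\left[\sum x_i y_i\right]}{\left(\sum x_i^2\right)^2} = \dfrac{\sum x_i^2 V[y_i]}{\left(\sum x_i^2\right)^2} = \dfrac{\sigma^2}{\sum x_i^2}$

(10) 9. Suppose you have data $(x_i, y_i)$, $i = 1, \ldots, n$ and want to fit the line with known slope equal to 1,

$y = \beta_0 + x + \epsilon$.

Derive the least squares estimator of $\beta_0$.

The error sum of squares is $S(\beta_0) = \sum_{i=1}^{n} (y_i - \beta_0 - x_i)^2$. Differentiating with respect to $\beta_0$ gives

$\dfrac{d}{d\beta_0} S(\beta_0) = -2\sum_{i=1}^{n} (y_i - \beta_0 - x_i) = -2\sum_{i=1}^{n} y_i + 2n\beta_0 + 2\sum_{i=1}^{n} x_i$.

Setting the derivative equal to zero and solving gives $\hat\beta_0 = \bar{y} - \bar{x}$.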
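The three answers to question 7 follow from the general rules $E[a'Z] = a'\mu$ and $Cov[a'Z, b'Z] = a'\Sigma b$ for a random vector $Z = (X, Y)$. The sketch below applies those rules numerically; the names `mu`, `Sigma`, `a`, `b` and `d` are ours and do not appear in the assignment.

```r
# Moments given in question 7
mu    <- c(10, 4)                        # (mu_x, mu_y)
Sigma <- matrix(c(3^2, 1.5,
                  1.5, 1^2), nrow = 2, byrow = TRUE)

a <- c(2, -1)   # coefficients of 2X - Y
b <- c(2, 0)    # coefficients of 2X
d <- c(0, 1)    # coefficients of Y

mean_a <- sum(a * mu)                    # E[2X - Y]
var_a  <- drop(t(a) %*% Sigma %*% a)     # Var[2X - Y]
cor_bd <- drop(t(b) %*% Sigma %*% d) /
  sqrt(drop(t(b) %*% Sigma %*% b) * drop(t(d) %*% Sigma %*% d))  # Cor[2X, Y]

c(mean_a, var_a, cor_bd)
```

The correlation is unchanged by the factor 2 because rescaling X by a positive constant scales the covariance and the standard deviation by the same amount.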

10. An experiment is to be run to determine the linear association between x and y. Two possible arrangements of the x values are proposed:

(a) x = (1, 1, 1, 1, 1, 10, 10, 10, 10, 10) and
(b) x = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10).

(6) i. Calculate $S_{xx}$ for each proposal. (3 points for each of the sums of squares.)

> x1=c(1,1,1,1,1,10,10,10,10,10)
> SSx1=sum((x1-mean(x1))^2)
> SSx1
[1] 202.5
> x2=c(1,2,3,4,5,6,7,8,9,10)
> SSx2=sum((x2-mean(x2))^2)
> SSx2
[1] 82.5

(4) ii. Which arrangement will lead to the most precise (i.e. smallest variance) estimator of the slope? Justify your answer. (4 points for a reasonable justification.)

The variance of $\hat\beta_1$ is given by $\sigma^2 / SS_{XX}$, where $SS_{XX}$ is the sum of squares of the x's. The first configuration has a larger value of the X sum of squares, so a smaller value of $V[\hat\beta_1]$, which means a more precise estimator.
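Because $V[\hat\beta_1] = \sigma^2 / SS_{XX}$, the two designs can be compared without knowing $\sigma^2$: it cancels in the ratio of variances. The sketch below computes that ratio; the helper `Sxx()` and the name `var_ratio` are ours.

```r
# The two proposed designs
x1 <- c(rep(1, 5), rep(10, 5))   # arrangement (a)
x2 <- 1:10                       # arrangement (b)

# Corrected sum of squares of the x values
Sxx <- function(x) sum((x - mean(x))^2)

# Var(slope under a) / Var(slope under b) = Sxx(x2) / Sxx(x1)
var_ratio <- Sxx(x2) / Sxx(x1)
c(Sxx(x1), Sxx(x2), var_ratio)
```

A ratio below 1 means arrangement (a) gives the smaller slope variance, in line with the justification above.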