STATISTICS QUESTIONS. Step by Step Solutions.

STATISTICS QUESTIONS Step by Step Solutions www.mathcracker.com 9//016

Problem 1: A researcher is interested in the effects of family size on delinquency for a group of offenders and examines families with one to four children. She obtains a sample of 16 families, four of each size, and identifies the number of arrests per child for delinquency. The data are as follows:

            Group 1       Group 2       Group 3       Group 4
            4 children    3 children    2 children    1 child
            n=4           n=4           n=4           n=4
Family 1    10            8             5
Family 2    8             8             6             5
Family 3    9             6             7
Family 4    10            9             9

a) Calculate the total sum of squares.
b) Calculate the mean square (between groups).
c) Calculate the F-ratio.
d) Use the Tukey HSD (alpha = 0.05) to test for significance between groups. Which groups differed?
e) Based on your results, write a 1-2 paragraph essay that describes your observations obtained from this sample in regard to the effects of family size on delinquency for a group of offenders.

Solution: (a) The following table with descriptive statistics is obtained from the information provided:

Obs.        Group 1   Group 2   Group 3   Group 4
            10        8         5
            8         8         6         5
            9         6         7
            10        9         9
Mean        9.25      7.75      6.75      3.25
St. Dev.    0.957     1.258     1.708     1.5

We need to test

$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$$
$$H_A: \text{Not all the means are equal}$$

With the data found in the table above, we can compute the following values, which are needed to construct the ANOVA table. We have:

$$SS_{Between} = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$$

from which we get

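$$SS_{Between} = 4(9.25 - 6.75)^2 + 4(7.75 - 6.75)^2 + 4(6.75 - 6.75)^2 + 4(3.25 - 6.75)^2 = 78$$

Now we also see that

$$SS_{Within} = \sum_{i=1}^{k} (n_i - 1) s_i^2$$

which implies

$$SS_{Within} = (4-1)\,0.957^2 + (4-1)\,1.258^2 + (4-1)\,1.708^2 + (4-1)\,1.5^2 = 23$$

Hence, $SS_{Total} = 78 + 23 = 101$.

(b) Therefore

$$MS_{Between} = \frac{SS_{Between}}{k - 1} = \frac{78}{3} = 26$$

Also, we obtain that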

$$MS_{Within} = \frac{SS_{Within}}{N - k} = \frac{23}{12} = 1.917$$

(c) Therefore, the F-statistic is computed as

$$F = \frac{MS_{Between}}{MS_{Within}} = \frac{26}{1.917} = 13.565$$

The critical value for $\alpha = 0.05$, $df_1 = 3$ and $df_2 = 12$ is given by $F_C = 3.4903$, and the corresponding p-value is

$$p = \Pr(F_{3,12} > 13.565) = 0.0004$$

Since the p-value is less than the significance level 0.05, we reject $H_0$.

(d) The HSD difference is computed as follows:

$$HSD = Q \sqrt{\frac{MSE}{n}} = 4.20 \sqrt{\frac{1.917}{4}} = 2.91$$
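The HSD comparison can be checked with a minimal Python sketch. It assumes the group means are 9.25, 7.75, 6.75 and 3.25, that MS within is 23/12, and that the studentized range value 4.20 (alpha = 0.05, 4 groups, 12 error degrees of freedom) is taken from a standard table rather than computed. The function name `tukey_hsd_pairs` is only illustrative.

```python
from itertools import combinations
from math import sqrt

# Assumed inputs: group means and error mean square from the solution.
means = {"Group 1": 9.25, "Group 2": 7.75, "Group 3": 6.75, "Group 4": 3.25}
mse = 23 / 12        # MS within
n_per_group = 4      # families per group
q_crit = 4.20        # assumed table value of q(0.05; k = 4, df = 12)


def tukey_hsd_pairs(means, mse, n, q):
    """Return the HSD threshold and the pairs of groups whose means differ by more than it."""
    hsd = q * sqrt(mse / n)
    significant = [
        (a, b)
        for a, b in combinations(means, 2)
        if abs(means[a] - means[b]) > hsd
    ]
    return hsd, significant


hsd, significant = tukey_hsd_pairs(means, mse, n_per_group, q_crit)
print(f"HSD = {hsd:.2f}")
for a, b in significant:
    print(f"{a} vs {b}: |difference| = {abs(means[a] - means[b]):.2f}")
```

Comparing absolute mean differences with the HSD threshold is equivalent to comparing the pairwise t-values with the critical value for the experimentwise error rate, so both routes should flag the same pairs.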

The following table is obtained:

Post hoc analysis
Tukey simultaneous comparison t-values (d.f. = 12)

                    Group 4   Group 3   Group 2   Group 1
                    3.3       6.8       7.8       9.3
Group 4    3.3
Group 3    6.8      3.58
Group 2    7.8      4.60      1.02
Group 1    9.3      6.13      2.55      1.53

critical values for experimentwise error rate:
    0.05    2.97
    0.01    3.89

(e) Based on the above results, we have enough evidence to reject the null hypothesis of equal means, at the 0.05 significance level. Summarizing, we have the following ANOVA table:

Source            SS     df    MS       F        p-value   Crit. F
Between Groups    78     3     26       13.565   0.0004    3.4903
Within Groups     23     12    1.917
Total             101    15
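The ANOVA quantities can also be recomputed from the summary statistics alone. The sketch below is a minimal Python check, assuming equal group sizes of 4 and group means and standard deviations of (9.25, 7.75, 6.75, 3.25) and (0.957, 1.258, 1.708, 1.5). Because the standard deviations are rounded to three decimals, the within-group sum of squares and the F-ratio come out very close to, but not exactly at, 23 and 13.565. The function name `anova_from_summary` is hypothetical.

```python
# Assumed summary statistics (equal group sizes).
means = [9.25, 7.75, 6.75, 3.25]
std_devs = [0.957, 1.258, 1.708, 1.5]
n = 4  # observations per group


def anova_from_summary(means, std_devs, n):
    """One-way ANOVA quantities from group means and standard deviations."""
    k = len(means)
    # With equal group sizes the grand mean is the mean of the group means.
    grand_mean = sum(means) / k
    ss_between = sum(n * (m - grand_mean) ** 2 for m in means)
    ss_within = sum((n - 1) * s ** 2 for s in std_devs)
    df_between = k - 1
    df_within = k * n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


result = anova_from_summary(means, std_devs, n)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```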

The pairwise differences that are significant are between Group 1 and Group 4, Group 2 and Group 4, and Group 3 and Group 4. In fact, the mean for Group 4 is significantly lower when compared to the means for Groups 1, 2 and 3, respectively.

Problem 2: Movie Success. Using the data in Table 7., make a scatter diagram for the relationship between production budget and viewer rating of movies. Estimate the correlation coefficient. Based on these data, do you think a large production budget is likely to result in a movie with a high viewer rating? Explain.

Solution: The scatter plot is shown below.

[Figure: Scatterplot of Rating vs Budget (x-axis: Budget, y-axis: Rating)]

It seems like there is a mild negative linear relationship between Budget and Rating. The actual correlation coefficient is computed as

As predicted by the visual trend, the correlation is negative, but since it is very small, the relationship is fairly weak. This means that it is not certain that a larger budget will produce a higher rating, just as it is not certain that a larger budget will produce a lower rating, but there is an inclination toward lower ratings with higher budgets.

Problem 3: Which of these models is a better representation of the relationship between students' age and starting salary? Explain your decision.

Solution: As mentioned in the previous part, the model obtained once the outlier was eliminated is relatively similar to the model with n=5 cases, as the regression coefficients don't change dramatically. Still, this relatively small difference in coefficients makes a relatively large difference in $R^2$. In fact, for the model with n = 5 we get $R^2$ = 0.33, and for the model with n = 5 we get $R^2$ = 0.7. This makes the second model (with n = ) the preferred one. The preferred model is

Starting Salary^ = -67,91.785 + 3,635.6857 * Age

Problem 4:

Compute an imprisonment rate per 1000 population for 2000. Introduce this incarceration rate as an independent variable into the model run in Part B. Test the hypothesis that the R squared = 0. Does this model fit the data better than the model in Part B above? Explain. Does each of the independent variables have a statistically significant effect on homicide? Explain. How strong is the effect of each of the independent variables? Explain. Which of the independent variables has the stronger effect on the homicide rate? Explain.

Solution: The new variable is computed as ImprPer1000 = Prison0/pop0 (recall that pop0 is already given in 1000s). The following is obtained with Excel:

Regression Analysis

R²             0.499
Adjusted R²    0.466
R              0.707
n              49
k              3

Std. Error     1.639
Dep. Var.      homrt0

ANOVA table
Source        SS          df    MS         F        p-value
Regression    120.4481    3     40.1494    14.95    6.87E-07
Residual      120.8453    45    2.6855
Total         241.2935    48

Regression output                                                  confidence interval
variables      coefficients   std. error   t (df=45)   p-value   95% lower   95% upper   std. coeff.
Intercept      -1.9451        1.1721       -1.660      .1040     -4.3059     0.4156      0.000
ImprPer1000    0.2430         0.1707       1.423       .1616     -0.1009     0.5869      0.175
sglmom80       24.3703        5.6975       4.277       .0001     12.8949     35.8458     0.539
unempl0        36.9631        27.2997      1.354       .1825     -18.0215    91.9476     0.150

The model is

Homicide Rate in 2000 = -1.9451 + 0.2430 * ImprPer1000 + 24.3703 * sglmom80 + 36.9631 * unempl0

Notice that the model is significant overall, since F(3, 45) = 14.95, p = 0.000000687 < 0.05, so $R^2$ is significantly greater than zero.
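The summary statistics of the regression are tied together by a few identities, which the following minimal Python sketch checks. The regression and residual sums of squares (120.4481 and 120.8453), n = 49 and k = 3 are assumed readings of the Excel output, and the function name `regression_summary` is hypothetical.

```python
from math import sqrt

# Assumed readings of the ANOVA table for the regression.
ss_regression = 120.4481
ss_residual = 120.8453
n = 49  # observations
k = 3   # predictors


def regression_summary(ss_reg, ss_res, n, k):
    """Recompute R^2, adjusted R^2, the overall F statistic and the standard error."""
    ss_total = ss_reg + ss_res
    df_res = n - k - 1
    r_squared = ss_reg / ss_total
    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_res
    ms_reg = ss_reg / k
    ms_res = ss_res / df_res
    return {
        "r_squared": r_squared,
        "adj_r_squared": adj_r_squared,
        "f": ms_reg / ms_res,
        "std_error": sqrt(ms_res),
    }


summary = regression_summary(ss_regression, ss_residual, n, k)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

If the assumed sums of squares are right, the four printed values should agree with the R², adjusted R², F and Std. Error entries of the output up to rounding.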

This model fits only slightly better than the previous one, since now Adj. $R^2$ = 0.466, which means that in this case the amount of variation in the response variable explained by this model is 46.6%.

Notice that in this model, the variable sglmom80 is individually significant, with t = 4.277 and p = 0.0001 < 0.05, but the variable unempl0 is not individually significant, t = 1.354, p = 0.1825 > 0.05. The variable ImprPer1000 is not significant either, since t = 1.423, p = 0.1616 > 0.05.

The effect of ImprPer1000 and unempl0 is quite moderate since the standardized coefficients associated with them are less than 0.2 (that is, an increase of one standard deviation in either of the variables brings a change of less than 0.2 standard deviations in the response variable). The variable with the strongest effect is sglmom80, with a standardized coefficient of 0.539.

Problem 5: Using the data below, answer the following questions using a table format.

X    4    6    3     7
y    5    2    -1    4

a. $\sum_{i=1}^{4} x_i$
b. $\sum_{i=1}^{4} y_i$
c. $\sum_{i=1}^{4} x_i y_i$
d. Show that $\sum_{i=1}^{4} x_i \cdot \sum_{i=1}^{4} y_i \neq \sum_{i=1}^{4} x_i y_i$
e. $\sum_{i=1}^{4} x_i^2$
f. $\sum_{i=1}^{4} y_i^2$
g. $\left(\sum_{i=1}^{4} x_i y_i\right)^2$
h. Show that $\left(\sum_{i=1}^{4} x_i\right)^2 \neq \sum_{i=1}^{4} x_i^2$
i. Show that $\sum_{i=1}^{4} (x_i - \bar{x}) = \sum_{i=1}^{4} (y_i - \bar{y}) = 0$

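Solution: We have:

         X     Y     X^2    Y^2    X*Y
         4     5     16     25     20
         6     2     36     4      12
         3     -1    9      1      -3
         7     4     49     16     28
Sum =    20    10    110    46     57

(a) $\sum_{i=1}^{4} x_i = 20$

(b) $\sum_{i=1}^{4} y_i = 10$

(c) $\sum_{i=1}^{4} x_i y_i = 57$

(d) Notice that $\sum_{i=1}^{4} x_i y_i = 57$, and $\sum_{i=1}^{4} x_i \cdot \sum_{i=1}^{4} y_i = 20 \cdot 10 = 200$, which means that $\sum_{i=1}^{4} x_i \cdot \sum_{i=1}^{4} y_i \neq \sum_{i=1}^{4} x_i y_i$ in this case.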

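(e) $\sum_{i=1}^{4} x_i^2 = 110$

(f) $\sum_{i=1}^{4} y_i^2 = 46$

(g) $\left(\sum_{i=1}^{4} x_i y_i\right)^2 = 57^2 = 3249$

(h) $\left(\sum_{i=1}^{4} x_i\right)^2 = 20^2 = 400$, and $\sum_{i=1}^{4} x_i^2 = 110$, so then $\left(\sum_{i=1}^{4} x_i\right)^2 \neq \sum_{i=1}^{4} x_i^2$.

(i) We get that $\bar{X} = 5$, $\bar{Y} = 2.5$. Observe that

         X     Y     X-Xbar    Y-Ybar
         4     5     -1        2.5
         6     2     1         -0.5
         3     -1    -2        -3.5
         7     4     2         1.5
Sum =                0         0

so then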

$$\sum_{i=1}^{4} (x_i - \bar{x}) = \sum_{i=1}^{4} (y_i - \bar{y}) = 0$$
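The sums in this problem are easy to verify numerically. The following minimal Python sketch assumes the data are X = (4, 6, 3, 7) and Y = (5, 2, -1, 4); the variable names are only illustrative.

```python
# Assumed data for Problem 5.
x = [4, 6, 3, 7]
y = [5, 2, -1, 4]

sum_x = sum(x)                               # (a)
sum_y = sum(y)                               # (b)
sum_xy = sum(a * b for a, b in zip(x, y))    # (c)
sum_x2 = sum(a * a for a in x)               # (e)
sum_y2 = sum(b * b for b in y)               # (f)

x_bar = sum_x / len(x)
y_bar = sum_y / len(y)
dev_x = sum(a - x_bar for a in x)            # (i) deviations of x from its mean
dev_y = sum(b - y_bar for b in y)            # (i) deviations of y from its mean

print("sum x =", sum_x, " sum y =", sum_y, " sum xy =", sum_xy)
print("(d) product of sums vs sum of products:", sum_x * sum_y, "vs", sum_xy)
print("sum x^2 =", sum_x2, " sum y^2 =", sum_y2)
print("(g) square of sum xy:", sum_xy ** 2)
print("(h) square of sum vs sum of squares:", sum_x ** 2, "vs", sum_x2)
print("(i) sums of deviations:", dev_x, dev_y)
```

Parts (d) and (h) illustrate that a sum of products is generally not the product of the sums, while part (i) holds for any data set because deviations from the mean always cancel.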