Statistics II Final Exam 26/6/18

Academic Year 2017/18. Solutions. Exam duration: 2 h 30 min.

1. (3 points) A town hall is conducting a study to determine the amount of leftover food produced by the restaurants in the town. In particular, it is interested in testing whether the yearly amount of unused food does not exceed a thousand kilos per restaurant. The town hall has collected information from a simple random sample of 10 restaurants, obtaining an average of 1.15 thousand kilos of leftover food per year, and a quasi-standard deviation of 150 kilos. Assume that the amount of unused food follows a normal distribution.

(a) (1 point) Help the town hall to test the preceding statement on the leftover food at restaurants, at a significance level of 5%. Build the rejection region for this test and comment on your conclusion.
(b) (0.5 points) Compute the p-value for the test and comment on your result.
(c) (1 point) Obtain the power of the test when the true value of the average yearly unused food is 1.1 thousand kilos.
(d) (0.5 points) State whether each of the following statements is true or false. Justify your answers only for the false statements:
- For a given (true) value of the unused food, and keeping everything else constant, if the number of restaurants in the sample increases then the power improves.
- If the test is conducted at a significance level of 1%, instead of 5%, the p-value obtained for the test changes.
- The probability mass in the critical (or rejection) region is $1 - \alpha$.

Solution:

(a) Let $X$ = yearly unused food in a restaurant (in thousands of kilos), and $\mu = E[X]$ its average. The test we wish to conduct is
$$H_0: \mu \le 1 \qquad H_1: \mu > 1.$$
Assume that $X$ follows a normal distribution and that we have a simple random sample. The test statistic is
$$T = \frac{\bar X - \mu}{S/\sqrt{n}} \sim t_{n-1},$$
where $S$ denotes the sample quasi-standard deviation and $n = 10$. The rejection region is given by
$$RR_\alpha = \{(x_1, \dots, x_n) : t > t_{n-1;\alpha}\}.$$
In this case $\alpha = 0.05$ and $t_{n-1;\alpha} = t_{9;0.05} = 1.833$. The value of the test statistic for this sample is
$$t = \frac{1.15 - 1}{0.15/\sqrt{10}} = 3.162.$$
This value lies within the rejection region.
Thus, we conclude that we have sufficient evidence to reject the null hypothesis, and to support the statement that the average amount of unused food per year and per restaurant is larger than a thousand kilos.

(b) The p-value for this test is given by
$$\text{p-value} = P(T > t) = P(t_9 > 3.162) \in [0.005; 0.01]$$
(a more precise value is 0.0058). This p-value is very low (smaller than 1%), providing a compelling indication to reject $H_0$.
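As a cross-check of parts (a) and (b), the following Python sketch recomputes the test statistic and approximates the p-value. It uses only the standard library. The helper names `t_pdf` and `t_upper_tail` are illustrative and not part of the exam, and the tail probability is obtained by integrating the Student t density numerically with Simpson's rule rather than by reading a table. The critical value 1.833 is the table value quoted in the solution.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=20000):
    """Approximate P(T > t) using composite Simpson's rule on [0, |t|]."""
    if t < 0:
        # Symmetry of the t distribution: P(T > t) = 1 - P(T > -t).
        return 1.0 - t_upper_tail(-t, df, steps)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Sample summary from the problem statement (thousands of kilos).
n, xbar, s, mu0 = 10, 1.15, 0.15, 1.0
t_crit = 1.833  # t_{9;0.05}, table value quoted in the solution

t_obs = (xbar - mu0) / (s / math.sqrt(n))
p_value = t_upper_tail(t_obs, n - 1)

print("t =", round(t_obs, 3), "reject H0:", t_obs > t_crit)
print("p-value =", round(p_value, 4))
```

The printed p-value should fall inside the interval [0.005; 0.01] obtained from the tables.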

(c) To obtain the power of the test, power($\mu$), for $\mu = 1.1$, we work with the definition of the power as the probability of rejecting $H_0$ when it is not true:
$$\begin{aligned}
\text{power}(1.1) &= P(\text{reject } H_0 \mid \mu = 1.1) = P(T > 1.833 \mid \mu = 1.1) \\
&= P\left(\frac{\bar X - \mu_0}{S/\sqrt{n}} > 1.833 \,\Big|\, \mu = 1.1\right) = P(\bar X > 1.087 \mid \mu = 1.1) \\
&= P\left(t_9 > \frac{1.087 - \mu}{0.15/\sqrt{10}} \,\Big|\, \mu = 1.1\right) = P\left(t_9 > \frac{1.087 - 1.1}{0.0474}\right) \\
&= P(t_9 > -0.274) \in [0.5; 0.75],
\end{aligned}$$
or with greater precision the power is 0.605. In the preceding argument we have used that, assuming the correct value for $\mu$ is 1.1, it holds that
$$\frac{\bar X - 1.1}{S/\sqrt{n}} \sim t_9.$$

(d) The correct answers are:
- True. The power increases with the sample size.
- False. The p-value does not depend on the significance level. It corresponds to the probability of the sample values under $H_0$. Once it is obtained, it is compared with the significance level of interest.
- False. The probability mass of the rejection region is $\alpha$.

2. (2.5 points) To determine the impact of the introduction of a new training model for a company, an experiment was carried out. It compared the results from the evaluations of skill improvements on two (different) random samples of 16 employees each. One of the samples followed the new training process before the evaluations were carried out, while the other one followed a traditional training process. A summary of the results obtained in the evaluations, on a scale from 0 to 25, is:
$$\sum_{i=1}^{16} x_i = 188, \quad \sum_{i=1}^{16} x_i^2 = 2390, \quad \sum_{i=1}^{16} y_i = 212, \quad \sum_{i=1}^{16} y_i^2 = 3040.$$
Based on the preceding data, you are asked to:

(a) (1 point) Formulate a hypothesis test to determine if the evidence supports rejecting that the improvement in the evaluations is the same for both training models. Obtain the rejection region for a significance level of 1% and comment on your conclusions.
(b) (0.5 points) Compute the p-value for the preceding test, and comment on your result.
(c) (1 point) Conduct the test described in question 2(a) for the same significance level, but assume that you would like to validate the belief that the new training model improves the evaluation results by at least two points. Indicate the rejection region and your conclusion for this test.
Solution:

(a) As the subjects are different in both samples, we have a test for the equality of means with independent samples. We denote by $X$ = results in the evaluations for the traditional training process, and by $Y$ = results in the evaluations for the new training process. We also denote by $\mu_X$, $\mu_Y$ the population means for the evaluation results corresponding to the traditional and new training processes respectively. The test we wish to conduct is given by
$$H_0: \mu_X = \mu_Y \qquad H_1: \mu_X \ne \mu_Y.$$
As the sample size for both samples is not large, we will assume normality and the same variance for both populations, and our test statistic will be
$$T = \frac{\bar X - \bar Y - (\mu_X - \mu_Y)}{S_P \sqrt{1/n_X + 1/n_Y}} \sim t_{n_X + n_Y - 2}.$$
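The pooled statistic defined above can be evaluated directly from the summary sums in the problem statement. The sketch below is one way to do so in Python. The function names `mean_and_quasi_var` and `pooled_t` are hypothetical helpers introduced for illustration, and the argument `d0` stands for the hypothesised value of $\mu_X - \mu_Y$.

```python
import math

def mean_and_quasi_var(sum_x, sum_x2, n):
    """Sample mean and quasi-variance (divisor n - 1) from a sum and a sum of squares."""
    mean = sum_x / n
    return mean, (sum_x2 - n * mean ** 2) / (n - 1)

n_x = n_y = 16
xbar, s2_x = mean_and_quasi_var(188, 2390, n_x)  # traditional training (X)
ybar, s2_y = mean_and_quasi_var(212, 3040, n_y)  # new training (Y)

# Pooled estimate of the common variance.
s2_p = ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / (n_x + n_y - 2)

def pooled_t(d0):
    """Value of T when the hypothesised difference mu_X - mu_Y equals d0."""
    return (xbar - ybar - d0) / math.sqrt(s2_p * (1 / n_x + 1 / n_y))

t_obs = pooled_t(0.0)  # equality of means: d0 = 0
print(round(xbar, 2), round(ybar, 2), round(s2_p, 3), round(t_obs, 3))
```

Passing a different `d0` gives the statistic for a test on a nonzero difference of means with the same pooled standard error.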

To compute the value of the statistic we will use
$$\bar x = \frac{1}{n} \sum x_i = 188/16 = 11.75, \qquad \bar y = \frac{1}{n} \sum y_i = 212/16 = 13.25,$$
$$s_X^2 = \frac{1}{n-1}\left(\sum x_i^2 - n \bar x^2\right) = 12.067, \qquad s_Y^2 = \frac{1}{n-1}\left(\sum y_i^2 - n \bar y^2\right) = 15.4,$$
and our estimate for the common variance, $s_P^2$, will be
$$s_P^2 = \frac{(n_X - 1) s_X^2 + (n_Y - 1) s_Y^2}{n_X + n_Y - 2} = \frac{15 \cdot 12.067 + 15 \cdot 15.4}{30} = 13.733.$$
The test statistic value is
$$t = \frac{11.75 - 13.25 - 0}{\sqrt{13.733\,(1/16 + 1/16)}} = -1.145.$$
The rejection region is given by
$$RR_\alpha = \{ |T| > t_{n_X + n_Y - 2;\alpha/2} \},$$
where $t_{n_X + n_Y - 2;\alpha/2} = t_{30;0.005} = 2.75$. As the value of $T$ does not lie within the critical region, we fail to reject $H_0$ for the indicated significance level. We do not have sufficient evidence to conclude that the new training model provides a different improvement in the evaluation results than the traditional one.

(b) The p-value for this test is given by
$$\text{p-value} = 2P(T < -1.145) = 2P(T > 1.145) \in [0.2; 0.3],$$
or, being more precise, p-value = 0.261. This value implies we would reject the null hypothesis for any significance level higher than 0.3 (or 0.261), and we would fail to reject for any value smaller than 0.2 (or 0.261). In particular, for $\alpha = 0.01$ we would fail to reject $H_0$.

(c) The test to conduct in this case is given by
$$H_0: \mu_Y - \mu_X \ge 2 \qquad H_1: \mu_Y - \mu_X < 2.$$
We can use the same test statistic as in the first question. The rejection region changes to
$$RR_\alpha = \{ T > t_{n_X + n_Y - 2;\alpha} \},$$
where $t_{n_X + n_Y - 2;\alpha} = t_{30;0.01} = 2.457$. Note that $T$ is defined in terms of $\bar X - \bar Y$, so the boundary of $H_0$ corresponds to $\mu_X - \mu_Y = -2$. The value of the test statistic under $H_0$ is now
$$t = \frac{11.75 - 13.25 - (-2)}{\sqrt{13.733\,(1/16 + 1/16)}} = 0.382.$$
This value lies outside of the critical region, thus we fail to reject $H_0$. We do not have sufficient evidence to believe that the new training process would improve the evaluation results by less than two points.

3. (4.5 points) You wish to analyze the impact of seniority on salaries in the banking sector. Your study relates the influence of the number of years spent working in the banking sector (variable $X$) and the yearly salaries of employees (variable $Y$), measured in thousands of euros, in a large bank.
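To conduct the study, a simple random sample of 12 employees has been selected, yielding the following (summarized) data:
$$\sum_{i=1}^{12} x_i = 149, \quad \sum_{i=1}^{12} x_i^2 = 2611, \quad \sum_{i=1}^{12} x_i y_i = 4484, \quad \sum_{i=1}^{12} y_i = 312, \quad \sum_{i=1}^{12} y_i^2 = 8776, \quad \sum_{i=1}^{12} e_i^2 = 174.98.$$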

(a) (0.5 points) Estimate the linear regression model $Y = \beta_0 + \beta_1 X + u$.
(b) (0.75 points) Conduct a test to determine if there is a linear relationship between the years working in the banking sector and the yearly salaries, for a significance level of 5%.
(c) (0.5 points) Compute the ANOVA table for this model.
(d) (0.5 points) Obtain the value of the coefficient of determination and interpret it.
(e) (0.5 points) Compute a forecast for the salary of a new employee with prior experience in the banking sector of 13 years. Obtain a confidence interval at a 90% confidence level for this forecast.

To improve the preceding model a new independent variable is taken into consideration, measuring the level of education of an employee ($X_2$). The values obtained for the resulting multiple regression model are given in the following Excel output:

(f) (0.75 points) Compute a confidence interval at a 99% level for the coefficient of variable $X_2$. Based on this interval, comment on whether this variable would be considered significant for the model.
(g) (0.5 points) Complete the ANOVA table shown below, computing the values indicated as XXXX:
(h) (0.5 points) For each of the coefficients of the model, indicate if they are locally significant. Motivate your answer. Is the model globally significant?

Solution:

(a) To estimate the model we use the following values obtained from the sample data:
$$\bar x = 149/12 = 12.42, \qquad \bar y = 312/12 = 26,$$
$$s_x^2 = (2611 - 12 \cdot 12.42^2)/11 = 69.17, \qquad s_y^2 = (8776 - 12 \cdot 26^2)/11 = 60.36,$$
$$s_{xy} = (4484 - 12 \cdot 12.42 \cdot 26)/11 = 55.45.$$
From the least-squares formulas for the two parameter estimators we obtain the estimates
$$\hat\beta_1 = \frac{s_{xy}}{s_x^2} = 0.802, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x = 16.05.$$
Thus, the estimated model is
$$\hat y = 16.05 + 0.802x.$$

(b) The test we need to conduct is
$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \ne 0.$$
The test statistic is given by
$$T = \frac{\hat\beta_1 - \beta_1}{\sqrt{\dfrac{s_R^2}{(n-1) s_x^2}}} \sim t_{n-2}.$$

The rejection region corresponds to
$$RR_\alpha = \{ (x_i, y_i)_{i=1}^{n} : |t| > t_{n-2;\alpha/2} \},$$
where $t_{n-2;\alpha/2} = t_{10;0.025} = 2.228$. The value of the test statistic for our sample is
$$s_R^2 = \frac{1}{n-2} \sum e_i^2 = \frac{174.98}{10} = 17.50, \qquad t = \frac{0.802}{\sqrt{\dfrac{17.50}{11 \cdot 69.17}}} = 5.286.$$
This value belongs to the rejection region. Thus, for the given significance level, we conclude that there is a significant linear relationship between the two variables.

(c) We compute the following values:
$$SSR = \sum_{i=1}^{n} e_i^2 = 174.98, \quad SST = (n-1) s_y^2 = 664, \quad SSM = SST - SSR = 489.02,$$
$$s_R^2 = SSR/(n-2) = 17.50, \quad F = SSM/s_R^2 = 27.95.$$
The ANOVA table will be:

Source of variability   SS       df   Mean     F ratio
Model                   489.02   1    489.02   27.95
Residuals               174.98   10   17.50
Total                   664      11

(d) The coefficient of determination is given by
$$R^2 = \frac{SSM}{SST} = \frac{489.02}{664} = 0.736.$$
This value implies that 73.6% of the variability in the dependent variable (yearly salary) can be explained from the values of the independent variable (experience in the sector), through the regression model.

(e) To forecast the salary of an employee with an experience of $x_0 = 13$ years in the banking sector, we use the linear regression model to obtain
$$\hat y_0 = 16.05 + 0.802 x_0 = 26.47.$$
The confidence interval for this forecast can be obtained from the formula
$$CI_{1-\alpha}(y_0) = \hat y_0 \pm t_{n-2;\alpha/2} \sqrt{s_R^2 \left(1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{(n-1) s_x^2}\right)}.$$
In our case, we have $\hat y_0 = 26.47$, $t_{n-2;\alpha/2} = t_{10;0.05} = 1.812$, $s_R^2 = 17.50$, $x_0 - \bar x = 13 - 12.42 = 0.58$, $s_x^2 = 69.17$. Replacing these values in the formula we obtain
$$CI_{0.9}(y_0) = 26.47 \pm 1.812 \sqrt{17.50 \left(1 + \frac{1}{12} + \frac{0.58^2}{11 \cdot 69.17}\right)} = [18.57; 34.36].$$

(f) To compute the indicated interval, we use the information in the Excel output and the quantile from the Student t distribution, $t_{n-3;\alpha/2} = t_{9;0.005} = 3.250$. We have
$$CI_{1-\alpha}(\beta_2) = \hat\beta_2 \pm t_{n-3;\alpha/2} \cdot \text{standard error} = 1.763 \pm 3.250 \cdot 0.567 = [-0.080; 3.606].$$
As the interval contains the value 0, the variable would not be significant for a significance level of 1%.
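The simple-regression results of parts (a) to (e) can be reproduced from the summary sums alone. The Python sketch below does this with unrounded intermediate values, so the last digit may differ slightly from the hand computations. The variable names are illustrative, and the critical value 1.812 is the table value $t_{10;0.05}$ used in part (e).

```python
import math

# Summary data from the problem statement (n = 12 employees).
n = 12
sum_x, sum_x2, sum_xy = 149, 2611, 4484
sum_y, sum_y2, sum_e2 = 312, 8776, 174.98

xbar, ybar = sum_x / n, sum_y / n
s2_x = (sum_x2 - n * xbar ** 2) / (n - 1)    # quasi-variance of x
s2_y = (sum_y2 - n * ybar ** 2) / (n - 1)    # quasi-variance of y
s_xy = (sum_xy - n * xbar * ybar) / (n - 1)  # quasi-covariance

# (a) Least-squares estimates.
b1 = s_xy / s2_x
b0 = ybar - b1 * xbar

# (b) t statistic for H0: beta_1 = 0.
s2_r = sum_e2 / (n - 2)                      # residual variance
t_slope = b1 / math.sqrt(s2_r / ((n - 1) * s2_x))

# (c), (d) ANOVA decomposition and coefficient of determination.
sst = (n - 1) * s2_y
ssm = sst - sum_e2
f_ratio = ssm / s2_r
r2 = ssm / sst

# (e) Forecast and 90% interval at x0 = 13.
x0, t_crit_90 = 13, 1.812                    # t_{10;0.05} from the tables
y0_hat = b0 + b1 * x0
half_width = t_crit_90 * math.sqrt(
    s2_r * (1 + 1 / n + (x0 - xbar) ** 2 / ((n - 1) * s2_x))
)
interval = (y0_hat - half_width, y0_hat + half_width)

print(round(b0, 2), round(b1, 3), round(t_slope, 3))
print(round(ssm, 2), round(f_ratio, 2), round(r2, 3))
print(round(y0_hat, 2), tuple(round(v, 2) for v in interval))
```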

(g) To complete the ANOVA table we compute
$$\text{Total df} = 9 + 2 = 11, \qquad SSR = SST - SSM = 664 - 579.62 = 84.38,$$
$$s_R^2 = SSR/(n-3) = 9.38, \qquad F = \frac{SSM/2}{s_R^2} = \frac{289.81}{9.38} = 30.91.$$
The resulting ANOVA table is

Source of variability   df   SS       Mean     F ratio
Model                   2    579.62   289.81   30.91
Residuals               9    84.38    9.38
Total                   11   664

(h) From the p-values in the Excel output for the three model parameters, we may conclude that variable $X_1$ would not be significant in general (it would only be significant for significance levels larger than 31%), while variable $X_2$ would only be significant for significance levels larger than 1.25%. The constant in the model (the $\beta_0$ coefficient) is always significant, as its p-value is very small ($7.38 \times 10^{-6}$). The model is globally significant, as the p-value for the F ratio value is close to zero ($9.29 \times 10^{-5}$).
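The table completed in part (g) follows from three inputs given in the problem: SST = 664, SSM = 579.62, and the degrees of freedom 2 and 9. A minimal Python sketch of that bookkeeping, with illustrative variable names, is:

```python
# Known entries of the multiple-regression ANOVA table.
sst, ssm = 664.0, 579.62
df_model, df_resid = 2, 9

df_total = df_model + df_resid
ssr = sst - ssm               # residual sum of squares
ms_model = ssm / df_model     # mean square of the model
ms_resid = ssr / df_resid     # residual variance s_R^2
f_ratio = ms_model / ms_resid

rows = [
    ("Model", df_model, ssm, ms_model, f_ratio),
    ("Residuals", df_resid, ssr, ms_resid, None),
    ("Total", df_total, sst, None, None),
]
for name, df, ss, ms, f in rows:
    ms_txt = "" if ms is None else f"{ms:.2f}"
    f_txt = "" if f is None else f"{f:.2f}"
    print(f"{name:<10} {df:>3} {ss:>8.2f} {ms_txt:>8} {f_txt:>7}")
```

Note that the F ratio divides the model mean square (SSM over its 2 degrees of freedom) by the residual variance, not SSM itself.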