Università di Venezia - Corso di Laurea Economics & Management
Annotated Exam of Statistics 6C - Prof. M. Romanazzi
March 17th, 2015

Full Name:
Matricola (student ID number):

Total (nominal) score: 30/30 (2/30 for each question). Pass score: 18/30. Lowest (18) and highest (29, 30) grades must be confirmed by oral discussion. Pocket calculator and portable computer are allowed, textbooks or class notes are not. Detailed solutions to questions must be given on the draft sheet (foglio di brutta copia); final answers/results must be copied on the exam sheet, beside the small squares.
Exercise 1

The stem-and-leaf display in Table 1 shows the percentage of foreign residents in northern provinces (left side, 47 provinces) and southern provinces (right side, 24 provinces) of Italy (source: Istat; data referred to 1/1/2014).

    Northern provinces | Stem | Southern provinces
                       |   1  | 8
                       |   2  | 2444469
                       |   3  | 03399
                       |   4  | 000227
                    90 |   5  | 003
                    41 |   6  |
                   973 |   7  | 58
          865554433210 |   8  |
                   753 |   9  |
               9443321 |  10  |
           99666443210 |  11  |
                       |  12  |
                543211 |  13  |
                     2 |  14  |

    Key: | 7 | 5 is read 7.5%

Table 1: Foreign residents (%) in Italian provinces. Left: northern provinces, right: southern provinces.

Q1 How many northern provinces have a percentage of foreign residents equal to or higher than 10%?
Number of provinces: 25, corresponding to a proportion of 25/47 ≈ 53.2%.

Q2 Compute the median of the percentage of foreign residents for both northern and southern provinces.
Median for northern provinces: x_(24) = 10.2%. Median for southern provinces: (x_(12) + x_(13))/2 = 3.9%.

Q3 What are the differences, if any, between the two distributions?
There is a striking difference in location: the percentage of foreign residents is much higher in northern than in southern provinces. Moreover, the southern distribution is positively skewed (with two outliers in the right tail), whereas the shape of the northern distribution appears more complex (possibly bimodal). Dispersion (compare the IQRs, for example) is higher in the northern distribution.

Exercise 2

The time (minutes) a commuter takes to go to work is a random variable X with expectation µ = 30 and standard deviation σ = 10. Let T_n be the total time to go to work in n days.

Q1 Compute E(T_n) and SD(T_n) for general n. What are your assumptions?
E(T_n) = nµ = 30n
SD(T_n) = σ√n = 10√n
Assumptions: the times X_1, ..., X_n the commuter takes to go to work each day are IID (independent and identically distributed) random variables.

Q2 Set n = 100. What is the probability that T_n is lower than 49 hours?
Here 49 hours = 2940 minutes, E(T_100) = 3000 and SD(T_100) = 100, so the standardized value is (2940 − 3000)/100 = −0.6. From the normal approximation granted by the CLT, the probability turns out to be about 0.274.
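The normal approximation used in Exercise 2, Q2 can be checked numerically. The following is a minimal sketch, not part of the exam solution: it assumes IID daily times as stated in Q1, uses `statistics.NormalDist` from the Python standard library, and the function name `total_time_prob_below` is purely illustrative.

```python
from math import sqrt
from statistics import NormalDist


def total_time_prob_below(limit_minutes, n, mu=30.0, sigma=10.0):
    """Approximate P(T_n < limit_minutes) with the CLT, assuming IID daily times."""
    mean_total = n * mu          # E(T_n) = n * mu
    sd_total = sigma * sqrt(n)   # SD(T_n) = sigma * sqrt(n)
    return NormalDist(mean_total, sd_total).cdf(limit_minutes)


# 49 hours = 2940 minutes over n = 100 days
p = total_time_prob_below(49 * 60, n=100)
print(round(p, 3))  # 0.274
```

The conversion from hours to minutes is the step most easily overlooked: the limit must be expressed in the same unit as µ and σ.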
Q3 The commuter is late for work each day with constant probability p_A = 0.1. Moreover, late arrivals on different days are stochastically independent. How many times do you expect the commuter to be late in 30 randomly selected days? What is the corresponding standard deviation?
The number of late arrivals is binomial with n = 30 and p_A = 0.1.
Expected number of late arrivals: 30 · 0.1 = 3
Standard deviation: √(30 · 0.1 · 0.9) ≈ 1.643
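The binomial mean and standard deviation in Q3 follow from np and √(np(1 − p)). A short sketch of the arithmetic is given here for checking; the helper name `binomial_mean_sd` is an assumption of this example, not something used in the exam.

```python
from math import sqrt


def binomial_mean_sd(n, p):
    """Return the mean and standard deviation of a Binomial(n, p) count."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return mean, sd


# 30 days, probability 0.1 of a late arrival on each day
mean_late, sd_late = binomial_mean_sd(30, 0.1)
print(round(mean_late, 3), round(sd_late, 3))  # 3.0 1.643
```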
Exercise 3

A random sample of n = 64 students was asked: who wrote the Italian novel Il Gattopardo? Let p_A denote the relative frequency of students in the population who know the answer.

Q1 Suppose that 39 students gave the right answer. What is the sample estimate of p_A? What is the confidence interval for p_A? (Confidence level: 0.95)
Sample estimate: 39/64 ≈ 0.609, with standard error ≈ 0.061
CI: 0.609 ± 1.96 · 0.061, that is, (0.489, 0.729)

Q2 What should the sample size n be so that the standard error of the estimate of p_A is lower than 0.02?
n > 625 (assuming the worst possible configuration of the population, p_A = 0.5, for which the standard error is 0.5/√n)

Q3 Suppose two independent samples of n_1 = 64 and n_2 = 150 students were asked the question, and suppose that the sample proportions of right answers turned out to be p̂_A,n1 = p̂_A,n2 = 0.6. Consider the null hypothesis H_0: p_A = 0.5 against the alternative H_1: p_A ≠ 0.5, with significance level α = 0.05. What is the decision about H_0 in the two cases? Explain carefully.
Same decision, reject H_0: FALSE
Same decision, do not reject H_0: FALSE
Sample 1: reject H_0, sample 2: do not reject H_0: FALSE
Sample 1: do not reject H_0, sample 2: reject H_0: TRUE
Explanation: In both cases, the test statistic is the standardized proportion with population proportion equal to 1/2, and the non-rejection region is the interval −1.96 < Z < 1.96, Z ~ N(0, 1). The observed values of the test statistic are 1.6 for sample 1 and 2.45 for sample 2, leading to non-rejection and rejection respectively. The apparently paradoxical result is due to the difference in sample size: a larger sample size implies a less tolerant treatment of discrepancies between the observed proportion and the theoretical proportion under the null hypothesis.

Exercise 4

To evaluate the effect of a training program, a test was given to a random sample of attendants before and after the training period. Let X and Y denote the test scores before and after the training period and let Z = Y − X.
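Looking back at Exercise 3, the estimate, confidence interval, sample-size bound and the two test statistics can be reproduced with a few lines of Python. This is a sketch under the same normal-approximation assumptions as the solution; the function names are illustrative, and the third decimal of the interval limits may differ slightly from the exam values because the exam rounds intermediate results.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Sample proportion, its standard error and a normal-approximation CI."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level 0.95
    return p_hat, se, (p_hat - z * se, p_hat + z * se)


def worst_case_se(n):
    """Largest possible standard error of a proportion, reached at p = 0.5."""
    return 0.5 / sqrt(n)


def z_statistic(p_hat, n, p0=0.5):
    """Standardized proportion under H0: p = p0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


p_hat, se, ci = proportion_ci(39, 64)
print(round(p_hat, 3), round(se, 3), tuple(round(x, 3) for x in ci))

print(worst_case_se(625))  # the bound 0.02 is reached at n = 625

z_crit = NormalDist().inv_cdf(0.975)  # two-sided test, alpha = 0.05
for n in (64, 150):
    z = z_statistic(0.6, n)
    decision = "reject H0" if abs(z) > z_crit else "do not reject H0"
    print(n, round(z, 2), decision)
```

The loop makes the point of Q3 explicit: the same observed proportion of 0.6 gives a larger standardized value when n is larger.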
Q1 On a random sample of n = 22 attendants we obtained Σ z_i = 123.7 and Σ z_i² = 1466.23 (sums over i = 1, ..., n). Compute the sample mean and the sample standard deviation of the data.
Sample mean: 123.7/22 ≈ 5.623
Sample standard deviation: √((1466.23 − 123.7²/22)/21) ≈ 6.058

Q2 Let µ_Z denote the expectation of Z in the reference population. We want to test the null hypothesis H_0: µ_Z ≤ 0 against the alternative H_1: µ_Z > 0. What is the rejection region for H_0, if the significance level of the test is α = 0.05? What is the observed value of the test statistic?
Rejection region: values of the test statistic higher than t_21,0.95 = 1.721
Observed value of the test statistic: 5.623/(6.058/√22) ≈ 4.354

Q3 According to the previous results, did the training improve, on average, the expertise of attendants? Explain briefly.
No, it did not: FALSE
Yes, it did: TRUE, because, according to the previous results, the null hypothesis is rejected, implying E(Z) > 0, i.e., E(Y − X) > 0, that is, E(Y) > E(X).
The result is doubtful: FALSE

Exercise 5

The scatter plot in Figure 1 shows the joint distribution of life expectancy at birth (the expected number of years a man or a woman will live) for males (X)
and females (Y) in the European Union countries (country codes are listed under Figure 1). Table 2 gives the summary statistics of the data.

    n     Σ x_i     Σ y_i     Σ x_i²      Σ y_i²      Σ x_i y_i
    28    2130.9    2302.1    162528.0    189396.8    175386.9

    x̄      ȳ      s_X     s_Y     s_X,Y    r_X,Y
    76.1   82.2   3.65    2.14    6.995    0.899

Table 2: Summary statistics.

[Figure 1: two scatter plots titled "EU Countries 2013", with male life expectancy (years) on the horizontal axis and female life expectancy (years) on the vertical axis; points are labelled by country code. Left: exam scatter plot. Right: solution scatter plot; *: centroid, red labels are countries with more recent EU membership.]

Country codes: Austria: AU, Belgium: BE, Bulgaria: BU, Cyprus: CY, Croatia: CR, Denmark: DE, Estonia: ES, Finland: FI, France: FR, Germany: GE, Greece: GR, Ireland: IR, Italy: IT, Latvia: LA, Lithuania: LI, Luxembourg: LU, Malta: MA, Netherlands: NE, Poland: PO, Portugal: PR, United Kingdom: UK, Czech Republic: CZ, Romania: RO, Slovakia: SK, Slovenia: SL, Spain: SP, Sweden: SW, Hungary: UN.

Q1 Compute sample means, standard deviations, covariance and correlation and report the results in Table 2.
See the bottom line of Table 2.

Q2 Estimate a linear prediction model y = a + bx for Y, using X as explanatory variable. What are the estimated coefficients and the goodness-of-fit of the model?
Intercept a ≈ 42.17, slope b ≈ 0.5262
Goodness-of-fit: R² ≈ 0.8075

Q3 Mark the position of the centroid of the distribution on the plot. How do you evaluate the Italian situation in the context of the European Union? Does the scatter plot suggest dependence or independence of the variables?
The Italian situation appears very good because life expectancy is very high both for males (second highest value after Sweden) and females (third highest value after Spain and France).
The scatter plot suggests strong linear dependence, as confirmed by the value of r_X,Y ≈ 0.9. The scatter plot also suggests that EU countries belong to two different groups: the countries with more recent EU membership (values of X and Y both below the average values) and the remaining countries (values of X and Y both above the average values).
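The bottom line of Table 2 and the regression results of Exercise 5, Q2 can be recomputed from the sums in the top line of the table. The sketch below assumes the n − 1 divisor for variances and covariance, which is the choice that reproduces the tabulated values; the function name `summary_regression` and the dictionary keys are illustrative only.

```python
from math import sqrt


def summary_regression(n, sx, sy, sxx, syy, sxy):
    """Means, SDs, covariance, correlation and least-squares line from raw sums."""
    mean_x, mean_y = sx / n, sy / n
    # Corrected sums of squares and cross-products
    ss_x = sxx - sx ** 2 / n
    ss_y = syy - sy ** 2 / n
    sp_xy = sxy - sx * sy / n
    r = sp_xy / sqrt(ss_x * ss_y)
    slope = sp_xy / ss_x                  # b = s_XY / s_X^2
    intercept = mean_y - slope * mean_x   # the line passes through the centroid
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "s_x": sqrt(ss_x / (n - 1)),
        "s_y": sqrt(ss_y / (n - 1)),
        "s_xy": sp_xy / (n - 1),
        "r": r,
        "slope": slope,
        "intercept": intercept,
        "r_squared": r ** 2,
    }


# Sums taken from the top line of Table 2
res = summary_regression(28, 2130.9, 2302.1, 162528.0, 189396.8, 175386.9)
for name, value in res.items():
    print(f"{name}: {value:.4f}")
```

Because the intercept is computed as ȳ − b·x̄, the fitted line passes through the centroid (x̄, ȳ) marked with * in the solution plot.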