Annotated Exam of Statistics 6C - Prof. M. Romanazzi


Università di Venezia - Corso di Laurea Economics & Management
March 17th, 2015

Full name: ______________   Matricola (student ID): ______________

Total (nominal) score: 30/30 (2/30 for each question). Pass score: 18/30. The lowest (18) and highest (29, 30) grades must be confirmed by an oral discussion. Pocket calculators and portable computers are allowed; textbooks and class notes are not. Detailed solutions must be written on the draft sheet (foglio di brutta copia); final answers/results must be copied onto the exam sheet, beside the small squares.

Exercise 1
The stem-and-leaf display in Table 1 shows the percentage of foreign residents in the northern provinces (left side, 47 provinces) and southern provinces (right side, 24 provinces) of Italy (source: Istat; data as of 1/1/2014).

Northern (47) |    | Southern (24)
              |  1 | 8
              |  2 | 2444469
              |  3 | 03399
              |  4 | 000227
           90 |  5 | 003
           41 |  6 |
          973 |  7 | 58
 865554433210 |  8 |
          753 |  9 |
      9443321 | 10 |
  99666443210 | 11 |
              | 12 |
       543211 | 13 |
            2 | 14 |
Legend: 7 | 5 is read 7.5%.

Table 1: Foreign residents (%) in Italian provinces. Left: northern provinces, right: southern provinces.

Q1 How many northern provinces have a percentage of foreign residents equal to or higher than 10%?
Number of provinces: 25, corresponding to a proportion of $25/47 \approx 53.2\%$.

Q2 Compute the median percentage of foreign residents for both northern and southern provinces.
Median for northern provinces ($n = 47$): $x_{(24)} = 10.2\%$. Median for southern provinces ($n = 24$): $(x_{(12)} + x_{(13)})/2 = (3.9 + 3.9)/2 = 3.9\%$.

Q3 What are the differences, if any, between the two distributions?
There is a striking location difference: the percentage of foreign residents is much higher in northern than in southern provinces. The shapes also differ: the southern distribution is positively skewed, with two outliers (7.5% and 7.8%) in the right tail, while the northern distribution appears more complex (possibly bimodal). Dispersion is higher in the northern distribution (compare the IQRs, for example).

Exercise 2
The time (minutes) a commuter takes to go to work is a random variable $X$ with expectation $\mu = 30$ and standard deviation $\sigma = 10$. Let $T_n$ be the total time to go to work in $n$ days.

Q1 Compute $E(T_n)$ and $SD(T_n)$ for general $n$. What are your assumptions?
$E(T_n) = n\mu = 30n$, $SD(T_n) = \sigma\sqrt{n} = 10\sqrt{n}$.
Assumptions: the times $X_1, \ldots, X_n$ the commuter takes to go to work each day are IID (independent and identically distributed) random variables.

Q2 Set $n = 100$. What is the probability that $T_n$ is lower than 49 hours?
With $n = 100$, $E(T_{100}) = 3000$ and $SD(T_{100}) = 100$ minutes, and 49 hours = 2940 minutes. From the normal approximation granted by the CLT, $P(T_{100} < 2940) \approx \Phi((2940 - 3000)/100) = \Phi(-0.6) \approx 0.274$.
Q3 The commuter is late for work each day with constant probability $p_A = 0.1$. Moreover, late arrivals on different days are stochastically independent. How many times do you expect the commuter to be late in 30 randomly selected days? What is the corresponding standard deviation?
The number of late arrivals in 30 days is Binomial$(30, 0.1)$. Expected number of late arrivals: $30 \cdot 0.1 = 3$. Standard deviation: $\sqrt{30 \cdot 0.1 \cdot 0.9} = \sqrt{2.7} \approx 1.643$.
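The answers to Exercises 1 and 2 can be checked with a short Python sketch (standard library only, Python 3.8+ for `statistics.NormalDist`; it is not part of the original exam). The dictionaries `NORTH` and `SOUTH` are a transcription of the stem-and-leaf display in Table 1, reading each leaf as tenths of a percent; `expand` and `iqr` are hypothetical helper names introduced only for this illustration. The IQRs use the default ("exclusive") method of `statistics.quantiles`, which is one of several quartile conventions, so they may differ slightly from a hand computation.

```python
from math import sqrt
from statistics import NormalDist, median, quantiles

# Exercise 1: stem-and-leaf data transcribed from Table 1.
# Stem = integer part of the percentage; each leaf digit = tenths (7 | 5 -> 7.5%).
NORTH = {5: "90", 6: "41", 7: "973", 8: "865554433210", 9: "753",
         10: "9443321", 11: "99666443210", 12: "", 13: "543211", 14: "2"}
SOUTH = {1: "8", 2: "2444469", 3: "03399", 4: "000227", 5: "003", 7: "58"}


def expand(rows):
    """Expand {stem: leaves} into a sorted list of percentages."""
    return sorted((10 * stem + int(leaf)) / 10
                  for stem, leaves in rows.items() for leaf in leaves)


def iqr(data):
    """Q3 - Q1 with the default ('exclusive') method of statistics.quantiles."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1


north, south = expand(NORTH), expand(SOUTH)
high = sum(x >= 10 for x in north)         # Q1: northern provinces with >= 10%
share = 100 * high / len(north)
med_north, med_south = median(north), median(south)  # Q2

print(len(north), len(south), high, round(share, 1))  # 47 24 25 53.2
print(med_north, med_south)                           # 10.2 3.9
print(round(iqr(north), 2), round(iqr(south), 2))     # Q3: dispersion check

# Exercise 2: T_n is the sum of n IID trip times (mu = 30, sigma = 10 minutes).
MU, SIGMA, N_DAYS = 30, 10, 100
mean_t, sd_t = MU * N_DAYS, SIGMA * sqrt(N_DAYS)     # 3000 and 100.0
p_below = NormalDist(mean_t, sd_t).cdf(49 * 60)      # CLT: 49 h = 2940 min
print(round(p_below, 3))                             # 0.274

# Q3: late arrivals in 30 independent days ~ Binomial(30, 0.1).
days, p_late = 30, 0.1
sd_late = sqrt(days * p_late * (1 - p_late))
print(round(days * p_late, 3), round(sd_late, 3))    # 3.0 1.643
```

Expanding the leaves reproduces the group sizes stated in the exercise (47 and 24 provinces), which is a useful consistency check on the transcription before reading off the order statistics.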

Exercise 3
A random sample of $n = 64$ students was asked "Who wrote the Italian novel Il Gattopardo?". Let $p_A$ denote the proportion of students in the population who know the answer.

Q1 Suppose that 39 students gave the right answer. What is the sample estimate of $p_A$? What is the confidence interval for $p_A$? (Confidence level: 0.95)
Sample estimate: $\hat p_A = 39/64 \approx 0.609$, with standard error $\sqrt{0.609 \cdot 0.391/64} \approx 0.061$.
CI: $0.609 \pm 1.96 \cdot 0.061 = (0.489, 0.729)$.

Q2 What should the sample size $n$ be so that the standard error of the estimate of $p_A$ is lower than 0.02?
$n > 625$, assuming the worst possible population configuration $p_A = 0.5$, which maximizes $p_A(1 - p_A)$: $\sqrt{0.25/n} < 0.02 \iff n > 625$.

Q3 Suppose two independent samples of $n_1 = 64$ and $n_2 = 150$ students were asked the question, and the sample proportions of right answers turned out to be $\hat p_{A,n_1} = \hat p_{A,n_2} = 0.6$. Consider the null hypothesis $H_0: p_A = 0.5$ against the alternative $H_1: p_A \neq 0.5$, with significance level $\alpha = 0.05$. What is the decision about $H_0$ in the two cases? Explain carefully.
Same decision, reject $H_0$: FALSE
Same decision, do not reject $H_0$: FALSE
Sample 1: reject $H_0$, sample 2: do not reject $H_0$: FALSE
Sample 1: do not reject $H_0$, sample 2: reject $H_0$: TRUE
Explanation: In both cases the test statistic is the standardized proportion $Z = (\hat p_A - 0.5)/\sqrt{0.5 \cdot 0.5/n}$, computed with the population proportion equal to 1/2, and the non-rejection region is $-1.96 < Z < 1.96$, with $Z \sim N(0, 1)$ under $H_0$. The observed values of the test statistic are 1.6 for sample 1 and 2.45 for sample 2, leading to non-rejection and rejection respectively. The apparently paradoxical result is due to the difference in sample size: the standard error $\sqrt{0.25/n}$ decreases as $n$ grows, so the same discrepancy of 0.1 between the observed proportion and the proportion under the null hypothesis amounts to 1.6 standard errors with $n_1 = 64$ but about 2.45 with $n_2 = 150$. A larger sample is less tolerant of a given discrepancy.

Exercise 4
To evaluate the effect of a training program, a test was given to a random sample of attendants before and after the training period. Let $X$ and $Y$ denote the test scores before and after the training period, and let $Z = Y - X$.
Q1 On a random sample of $n = 22$ attendants we obtained $\sum_{i=1}^{22} z_i = 123.7$ and $\sum_{i=1}^{22} z_i^2 = 1466.23$. Compute the sample mean and the sample standard deviation of the data.
Sample mean: $\bar z = 123.7/22 \approx 5.623$. Sample standard deviation: $s_Z = \sqrt{(1466.23 - 22\,\bar z^2)/21} \approx 6.058$.

Q2 Let $\mu_Z$ denote the expectation of $Z$ in the reference population. We want to test the null hypothesis $H_0: \mu_Z \le 0$ against the alternative $H_1: \mu_Z > 0$. What is the rejection region for $H_0$ if the significance level of the test is $\alpha = 0.05$? What is the observed value of the test statistic?
Rejection region: values of the test statistic $t = \bar z/(s_Z/\sqrt{22})$ higher than $t_{21,0.95} = 1.721$.
Observed value of the test statistic: 4.354.

Q3 According to the previous results, did the training improve the expertise of attendants on average? Explain briefly.
No, it did not: FALSE
Yes, it did: TRUE, because according to the previous results the null hypothesis is rejected, implying $E(Z) > 0$, i.e., $E(Y - X) > 0$, that is, $E(Y) > E(X)$.
The result is doubtful: FALSE

Exercise 5
The scatter plot in Figure 1 shows the joint distribution of life expectancy at birth (1) for males ($X$) and females ($Y$) in the European Union countries (2). Table 2 gives the summary statistics of the data.

(1) Life expectancy at birth is the expected number of years a man or a woman will live.
(2) Austria: AU, Belgium: BE, Bulgaria: BU, Cyprus: CY, Croatia: CR, Denmark: DE, Estonia: ES, Finland: FI, France: FR, Germany: GE, Greece: GR, Ireland: IR, Italy: IT, Latvia: LA, Lithuania: LI, Luxembourg: LU, Malta: MA, Netherlands: NE, Poland: PO, Portugal: PR, United Kingdom: UK, Czech Republic: CZ, Romania: RO, Slovakia: SK, Slovenia: SL, Spain: SP, Sweden: SW, Hungary: UN.

Table 2: Summary statistics.
n   Σx_i     Σy_i     Σx_i²      Σy_i²      Σx_i·y_i
28  2130.9   2302.1   162528.0   189396.8   175386.9

x̄      ȳ      s_X    s_Y    s_X,Y  r_X,Y
76.1   82.2   3.65   2.14   6.995  0.899

[Figure 1: Scatter plots of female life expectancy (years, Y) against male life expectancy (years, X), EU countries 2013. Left: exam scatter plot. Right: solution scatter plot; *: centroid, red labels are countries with more recent EU membership.]

Q1 Compute the sample means, standard deviations, covariance and correlation, and report the results in Table 2.
See the bottom line of Table 2 (standard deviations and covariance computed with divisor $n - 1 = 27$).

Q2 Estimate a linear prediction model $y = a + bx$ for $Y$, using $X$ as the explanatory variable. What are the estimated coefficients and the goodness of fit of the model?
Slope $b = s_{X,Y}/s_X^2 \approx 0.5262$, intercept $a = \bar y - b\,\bar x \approx 42.17$. Goodness of fit: $R^2 = r_{X,Y}^2 \approx 0.8075$.

Q3 Mark the position of the centroid of the distribution on the plot. How do you evaluate the Italian situation in the context of the European Union? Does the scatter plot suggest dependence or independence of the variables?
The centroid is the point $(\bar x, \bar y) \approx (76.1, 82.2)$, marked * in the solution plot. The Italian situation appears very good, because life expectancy is very high both for males (second highest value, after Sweden) and for females (third highest value, after Spain and France).
The scatter plot suggests strong linear dependence, as confirmed by $r_{X,Y} \approx 0.9$. It also suggests that EU countries belong to two different groups: the countries with more recent EU membership (values of $X$ and $Y$ both below the average values) and the remaining countries (values of $X$ and $Y$ both above the average values).
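The numerical answers to Exercises 3, 4 and 5 can also be reproduced from the data given in the questions. The Python sketch below is one possible check (standard library only; not part of the original exam). Because the `statistics` module has no Student t distribution, the critical value $t_{21,0.95} = 1.721$ is hardcoded from the exam; `z_stat` and `reject` are hypothetical names used only here. Standard deviations and the covariance use divisor $n - 1$, which is what reproduces the values in Table 2. Working with unrounded intermediate values shifts a few last digits: the CI is about (0.490, 0.729), the t statistic about 4.353, $s_Y$ about 2.135 and $R^2$ about 0.8076, against the exam's rounded 0.489, 4.354, 2.14 and 0.8075.

```python
from math import sqrt
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # standard normal 0.975 quantile, about 1.96

# Exercise 3: one-sample proportion (normal approximation).
n, right = 64, 39
p_hat = right / n
se = sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - Z975 * se, p_hat + Z975 * se)
# Q2: the worst case p = 0.5 maximises p(1 - p), so n must exceed 0.25 / 0.02**2.
n_bound = 0.25 / 0.02 ** 2


def z_stat(p_obs, size, p0=0.5):
    """Standardized sample proportion under H0: p = p0."""
    return (p_obs - p0) / sqrt(p0 * (1 - p0) / size)


# Q3: True means "reject H0" in a two-sided test at alpha = 0.05.
reject = {size: abs(z_stat(0.6, size)) > Z975 for size in (64, 150)}

# Exercise 4: paired differences Z = Y - X, from the two sums given.
m, sum_z, sum_z2 = 22, 123.7, 1466.23
mean_z = sum_z / m
sd_z = sqrt((sum_z2 - m * mean_z ** 2) / (m - 1))  # divisor n - 1
t_obs = mean_z / (sd_z / sqrt(m))
T_CRIT = 1.721  # t_{21,0.95}, hardcoded from a t table as in the exam

# Exercise 5: Table 2 sums for k = 28 countries (X = male, Y = female).
k = 28
sum_x, sum_y, sum_x2, sum_y2, sum_xy = 2130.9, 2302.1, 162528.0, 189396.8, 175386.9
x_bar, y_bar = sum_x / k, sum_y / k
s_x = sqrt((sum_x2 - k * x_bar ** 2) / (k - 1))
s_y = sqrt((sum_y2 - k * y_bar ** 2) / (k - 1))
s_xy = (sum_xy - k * x_bar * y_bar) / (k - 1)
r_xy = s_xy / (s_x * s_y)
b = s_xy / s_x ** 2       # least-squares slope
a = y_bar - b * x_bar     # least-squares intercept
r2 = r_xy ** 2            # R^2 of a simple linear regression

print(round(p_hat, 3), round(se, 3), [round(c, 3) for c in ci])  # 0.609 0.061 [0.49, 0.729]
print(round(n_bound), reject)                                      # 625 {64: False, 150: True}
print(round(mean_z, 3), round(sd_z, 3), round(t_obs, 3), t_obs > T_CRIT)  # 5.623 6.058 4.353 True
print(round(x_bar, 1), round(y_bar, 1), round(s_x, 3), round(s_y, 3),
      round(s_xy, 3), round(r_xy, 3))      # 76.1 82.2 3.646 2.135 6.995 0.899
print(round(a, 2), round(b, 4), round(r2, 4))  # 42.17 0.5262 0.8076
```

Computing everything from the raw sums, rather than from the rounded means and standard deviations, is what makes the small last-digit differences visible; the decisions (which samples lead to rejecting $H_0$, and whether the training effect is significant) are unchanged.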