Statistics II Final Exam 26/6/18

Academic Year 2017/18
Solutions
Exam duration: 2 h 30 min

1. (3 points) A town hall is conducting a study to determine the amount of leftover food produced by the restaurants in the town. In particular, it is interested in testing if the yearly amount of unused food does not exceed a thousand kilos per restaurant. The town hall has collected information from a simple random sample of 10 restaurants, obtaining an average of 1.15 thousand kilos of leftover food per year, and a quasi-standard deviation of 150 kilos. Assume that the amount of unused food follows a normal distribution.

(a) (1 point) Help the town hall to test the preceding statement on the leftover food at restaurants, at a significance level of 5%. Build the rejection region for this test and comment on your conclusion.

(b) (0.5 points) Compute the p-value for the test and comment on your result.

(c) (1 point) Obtain the power of the test when the true value of the average yearly unused food is 1.1 thousands of kilos.

(d) (0.5 points) Comment if the following statements are true or false. Justify your answers only for the false statements:
- For a (true) value of the unused food, and keeping everything else constant, if the number of restaurants in the sample increases then the power is improved.
- If the test is conducted at a significance level of 1%, instead of 5%, the p-value obtained for the test changes.
- The probability mass in the critical (or rejection) region is $1 - \alpha$.

Solution:

(a) Let $X$ = yearly unused food in a restaurant, and $\mu = E[X]$ its average. The test we wish to conduct is

\[ H_0 : \mu \le 1 \qquad H_1 : \mu > 1. \]

Assume that $X$ follows a normal distribution and we have a s.r.s. The test statistic is:

\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}, \]

where $S$ denotes the sample quasi-standard deviation and $n = 10$. The rejection region will be given by

\[ RR_\alpha = \{ (x_1, \dots, x_n) : t > t_{n-1;\alpha} \}. \]

In this case $\alpha = 0.05$ and $t_{n-1;\alpha} = t_{9;0.05} = 1.833$. The value of the test statistic for this sample will be

\[ t = \frac{1.15 - 1}{0.15/\sqrt{10}} = 3.162. \]

This value lies within the rejection region. Thus, we conclude that we have sufficient evidence to reject the null hypothesis, and to support the statement that the average amount of unused food per year and per restaurant is larger than a thousand kilos.

(b) The p-value for this test will be given by

\[ \text{p-value} = P(T > t) = P(t_9 > 3.162) \in [0.005; 0.01] \]

(a more precise value is 0.0058). This p-value is very low (smaller than 1%), providing a compelling indication to reject $H_0$.
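The computations in parts (a) and (b) can be cross-checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `t_pdf` and `t_upper_tail` are hypothetical, and the tail probability is approximated by Simpson's rule rather than read from a statistical table. The critical value 1.833 is the tabulated one quoted in the solution.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 with Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    # By symmetry, P(T > t) = 0.5 - P(0 < T < t)
    return 0.5 - total * h / 3

# Sample summary from the problem, in thousands of kilos
n, xbar, s, mu0 = 10, 1.15, 0.15, 1.0

t_stat = (xbar - mu0) / (s / math.sqrt(n))  # about 3.162
t_crit = 1.833                              # t_{9;0.05}, as quoted in the solution
reject_h0 = t_stat > t_crit
p_value = t_upper_tail(t_stat, n - 1)

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}, p-value = {p_value:.4f}")
```

A statistics library would normally supply the tail probability; the hand-written integration is used here only to keep the sketch self-contained.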

(c) To obtain the power of the test, $\text{power}(\mu)$, for $\mu = 1.1$, we work with the definition of the power as the probability of rejecting $H_0$ when it is not true (replacing $S$ by its observed value $s = 0.15$ in the critical value for $\bar{X}$),

\[
\begin{aligned}
\text{power}(1.1) &= P(\text{reject } H_0 \mid \mu = 1.1) = P(T > 1.833 \mid \mu = 1.1) \\
&= P\left( \frac{\bar{X} - \mu_0}{S/\sqrt{n}} > 1.833 \,\Big|\, \mu = 1.1 \right) = P(\bar{X} > 1.087 \mid \mu = 1.1) \\
&= P\left( t_9 > \frac{1.087 - \mu}{0.15/\sqrt{10}} \,\Big|\, \mu = 1.1 \right) = P\left( t_9 > \frac{1.087 - 1.1}{0.0474} \right) \\
&= P(t_9 > -0.274) \in [0.5; 0.75],
\end{aligned}
\]

or with greater precision the power is 0.605. In the preceding argument we have used that, assuming the correct value for $\mu$ is 1.1, it holds

\[ \frac{\bar{X} - 1.1}{S/\sqrt{n}} \sim t_9. \]

(d) The correct answers are:
- True. The power increases with the sample size.
- False. The p-value does not depend on the significance level. It is the probability, under $H_0$, of obtaining a value of the test statistic at least as extreme as the one observed in the sample. Once it is obtained, it is compared with the significance level of interest.
- False. The probability mass of the rejection region is $\alpha$.

2. (2.5 points) To determine the impact of the introduction of a new training model for a company, an experiment was carried out. It compared the results from the evaluations of skill improvements on two (different) random samples of 16 employees each. One of the samples followed the new training process before the evaluations were carried out, while the other one followed a traditional training process. A summary of the results obtained in the evaluations, on a scale from 0 to 25, is:

\[ \sum_{i=1}^{16} x_i = 188, \qquad \sum_{i=1}^{16} x_i^2 = 2390, \qquad \sum_{i=1}^{16} y_i = 212, \qquad \sum_{i=1}^{16} y_i^2 = 3040. \]

Based on the preceding data, you are asked to:

(a) (1 point) Formulate a hypothesis test to determine if the evidence supports rejecting that the improvement in the evaluations is the same for both training models. Obtain the rejection region for a significance level of 1% and comment on your conclusions.

(b) (0.5 points) Compute the p-value for the preceding test, and comment on your result.

(c) (1 point) Conduct the test described in question 2a) for the same significance level, but assume that you would like to validate the belief that the new training model improves the evaluation results by at least two points. Indicate the rejection region and your conclusion for this test.
Solution:

(a) As the subjects are different in both samples, we have a test for the equality of means with independent samples. We denote by $X$ = results in the evaluations for the traditional training process, and by $Y$ = results in the evaluations for the new training process. We also denote as $\mu_X$, $\mu_Y$ the population means for the evaluation results corresponding to the traditional and new training processes respectively. The test we wish to conduct is given by:

\[ H_0 : \mu_X = \mu_Y \qquad H_1 : \mu_X \ne \mu_Y. \]

As the sample size for both samples is not large, we will assume normality and the same variance for both populations, and our test statistic will be

\[ T = \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{S_P \sqrt{1/n_X + 1/n_Y}} \sim t_{n_X + n_Y - 2}. \]
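As a numerical cross-check, the pooled statistic can be evaluated directly from the summary sums given in the problem. This is a minimal sketch with the standard library only; the function name `pooled_t` and its parameter `delta0` (the hypothesised value of $\mu_X - \mu_Y$, zero for the equality test) are hypothetical names introduced for illustration.

```python
import math

def pooled_t(sum_x, sum_x2, sum_y, sum_y2, n_x, n_y, delta0=0.0):
    """Pooled two-sample t statistic computed from sums and sums of squares.

    delta0 is the value of mu_X - mu_Y assumed under the null hypothesis.
    Returns the statistic and the pooled variance estimate.
    """
    mean_x = sum_x / n_x
    mean_y = sum_y / n_y
    # Quasi-variances (divisor n - 1) of each sample
    var_x = (sum_x2 - n_x * mean_x ** 2) / (n_x - 1)
    var_y = (sum_y2 - n_y * mean_y ** 2) / (n_y - 1)
    # Pooled estimate of the common variance
    var_p = ((n_x - 1) * var_x + (n_y - 1) * var_y) / (n_x + n_y - 2)
    t = (mean_x - mean_y - delta0) / math.sqrt(var_p * (1 / n_x + 1 / n_y))
    return t, var_p

# X = traditional training, Y = new training, 16 employees in each sample
t_equal, var_p = pooled_t(188, 2390, 212, 3040, 16, 16)
print(f"pooled variance = {var_p:.3f}, t = {t_equal:.3f}")
```

Working from the sums rather than the raw scores mirrors the information actually provided in the exam statement.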

To compute the value of the statistic we will use, with $n = n_X = n_Y = 16$,

\[ \bar{x} = \frac{1}{n} \sum_i x_i = 188/16 = 11.75, \qquad \bar{y} = \frac{1}{n} \sum_i y_i = 212/16 = 13.25, \]

\[ s_X^2 = \frac{1}{n-1} \left( \sum_i x_i^2 - n \bar{x}^2 \right) = 12.067, \qquad s_Y^2 = \frac{1}{n-1} \left( \sum_i y_i^2 - n \bar{y}^2 \right) = 15.4, \]

and our estimate for the common variance, $s_P^2$, will be

\[ s_P^2 = \frac{(n_X - 1) s_X^2 + (n_Y - 1) s_Y^2}{n_X + n_Y - 2} = \frac{15 \times 12.067 + 15 \times 15.4}{30} = 13.733. \]

The test statistic value is

\[ t = \frac{11.75 - 13.25 - 0}{\sqrt{13.733 \, (1/16 + 1/16)}} = -1.145. \]

The rejection region will be given by

\[ RR_\alpha = \{ |t| > t_{n_X + n_Y - 2;\alpha/2} \}, \]

where $t_{n_X + n_Y - 2;\alpha/2} = t_{30;0.005} = 2.75$. As the value of $T$ does not lie within the critical region, we fail to reject $H_0$ for the indicated significance level. We do not have sufficient evidence to conclude that the new training model provides a different improvement in the evaluation results than the traditional one.

(b) The p-value for this test will be given by

\[ \text{p-value} = 2 P(T < -1.145) = 2 P(T > 1.145) \in [0.2; 0.3], \]

or being more precise, p-value = 0.261. This value implies we would reject the null hypothesis for any significance level higher than 0.3 (or 0.261), and we would fail to reject for any value smaller than 0.2 (or 0.261). In particular for $\alpha = 0.01$ we would fail to reject $H_0$.

(c) The test to conduct in this case is given by:

\[ H_0 : \mu_Y - \mu_X \ge 2 \qquad H_1 : \mu_Y - \mu_X < 2. \]

We can use the same test statistic as in the first question. The rejection region changes to

\[ RR_\alpha = \{ t > t_{n_X + n_Y - 2;\alpha} \}, \]

where $t_{n_X + n_Y - 2;\alpha} = t_{30;0.01} = 2.457$. Note that $T$ is defined in terms of $\bar{X} - \bar{Y}$, so $H_1$ is equivalent to $\mu_X - \mu_Y > -2$ and large values of $T$ lead to rejection. The value of the test statistic under $H_0$ is now

\[ t = \frac{11.75 - 13.25 - (-2)}{\sqrt{13.733 \, (1/16 + 1/16)}} = 0.382. \]

This value lies outside of the critical region, thus we fail to reject $H_0$. We do not have sufficient evidence to believe that the new training process would improve the evaluation results by less than two points.

3. (4.5 points) You wish to analyze the impact of seniority on salaries in the banking sector. Your study relates the influence of the number of years spent working in the banking sector (variable $X$) and the yearly salaries of employees (variable $Y$), measured in thousands of euros, in a large bank. To conduct the study, a simple random sample of 12 employees has been selected, yielding the following (summarized) data:

\[ \sum_{i=1}^{12} x_i = 149, \quad \sum_{i=1}^{12} x_i^2 = 2611, \quad \sum_{i=1}^{12} x_i y_i = 4484, \quad \sum_{i=1}^{12} y_i = 312, \quad \sum_{i=1}^{12} y_i^2 = 8776, \quad \sum_{i=1}^{12} e_i^2 = 174.98. \]

(a) (0.5 points) Estimate the linear regression model $Y = \beta_0 + \beta_1 X + u$.

(b) (0.75 points) Conduct a test to determine if there is a linear relationship between the years working in the banking sector and the yearly salaries, for a significance level of 5%.

(c) (0.5 points) Compute the ANOVA table for this model.

(d) (0.5 points) Obtain the value of the coefficient of determination and interpret it.

(e) (0.5 points) Compute a forecast for the salary of a new employee with a prior experience in the banking sector of 13 years. Obtain a confidence interval at a 90% confidence level for this forecast.

To improve the preceding model a new independent variable is taken into consideration, measuring the level of education of an employee ($X_2$). The values obtained for the resulting multiple regression model are given in the following Excel output:

(f) (0.75 points) Compute a confidence interval at a 99% level for the coefficient of variable $X_2$. Based on this interval, comment if this variable would be considered significant for the model.

(g) (0.5 points) Complete the ANOVA table shown below, computing the values indicated as XXXX:

(h) (0.5 points) For each of the coefficients of the model, indicate if they are locally significant. Motivate your answer. Is the model globally significant?

Solution:

(a) To estimate the model we use the following values obtained from the sample data (the unrounded mean $\bar{x} = 149/12$ is used in the intermediate computations):

\[ \bar{x} = 149/12 = 12.42, \qquad \bar{y} = 312/12 = 26, \]

\[ s_x^2 = (2611 - 12 \times 12.42^2)/11 = 69.17, \qquad s_y^2 = (8776 - 12 \times 26^2)/11 = 60.36, \]

\[ s_{xy} = (4484 - 12 \times 12.42 \times 26)/11 = 55.45. \]

From the least-squares formulas for the two parameter estimators we obtain the estimates

\[ \hat{\beta}_1 = \frac{s_{xy}}{s_x^2} = 0.802, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 16.05. \]

Thus, the estimated model is

\[ \hat{y} = 16.05 + 0.802 x. \]

(b) The test we need to conduct is

\[ H_0 : \beta_1 = 0 \qquad H_1 : \beta_1 \ne 0. \]

The test statistic is given by

\[ T = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\dfrac{s_R^2}{(n-1) s_x^2}}} \sim t_{n-2}. \]

The rejection region corresponds to

\[ RR_\alpha = \{ (x_i, y_i) : |t| > t_{n-2;\alpha/2} \}, \]

where $t_{n-2;\alpha/2} = t_{10;0.025} = 2.228$. The value of the test statistic for our sample is

\[ s_R^2 = \frac{1}{n-2} \sum_i e_i^2 = \frac{174.98}{10} = 17.50, \qquad t = \frac{0.802}{\sqrt{\dfrac{17.50}{11 \times 69.17}}} = 5.286. \]

This value belongs to the rejection region. Thus, for the given significance level, we conclude that there is a significant linear relationship between the two variables.

(c) We compute the following values:

\[ SSR = \sum_{i=1}^{n} e_i^2 = 174.98, \qquad SST = (n-1) s_y^2 = 664, \qquad SSM = SST - SSR = 489.02, \]

\[ s_R^2 = SSR/(n-2) = 17.50, \qquad F = SSM/s_R^2 = 27.95. \]

The ANOVA table will be:

Source of variability    SS        df    Mean      F ratio
Model                    489.02     1    489.02    27.95
Residuals                174.98    10     17.50
Total                    664       11

(d) The coefficient of determination is given by

\[ R^2 = \frac{SSM}{SST} = \frac{489.02}{664} = 0.736. \]

This value implies that 73.6% of the variability in the dependent variable (yearly salary) can be explained from the values of the independent variable (experience in the sector), through the regression model.

(e) To forecast the salary of an employee with an experience of $x_0 = 13$ years in the banking sector, we use the linear regression model to obtain

\[ \hat{y}_0 = 16.05 + 0.802 x_0 = 26.47. \]

The confidence interval for this forecast can be obtained from the formula

\[ CI_\alpha(y_0) = \hat{y}_0 \pm t_{n-2;\alpha/2} \sqrt{ s_R^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{(n-1) s_x^2} \right) }. \]

In our case, we have $\hat{y}_0 = 26.47$, $t_{n-2;\alpha/2} = t_{10;0.05} = 1.812$, $s_R^2 = 17.50$, $x_0 - \bar{x} = 13 - 12.42 = 0.58$, $s_x^2 = 69.17$. Replacing these values in the formula we obtain

\[ CI_{0.9}(y_0) = 26.47 \pm 1.812 \sqrt{ 17.50 \left( 1 + \frac{1}{12} + \frac{0.58^2}{11 \times 69.17} \right) } = [18.57; 34.36]. \]

(f) To compute the indicated interval, we use the information in the Excel output and the quantile from the Student t distribution, $t_{n-3;\alpha/2} = t_{9;0.005} = 3.250$. We have

\[ CI_\alpha(\beta_2) = \hat{\beta}_2 \pm t_{n-3;\alpha/2} \times \text{standard error} = 1.763 \pm 3.250 \times 0.567 = [-0.080; 3.606]. \]

As the interval contains the value 0, the variable would not be significant for a significance level of 1%.
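The simple-regression results in parts (a) to (e) can be reproduced from the six summary sums in the problem statement. The sketch below uses only the Python standard library; all variable names are illustrative choices, and the quantile 1.812 is the tabulated $t_{10;0.05}$ quoted in the solution rather than a value computed by the code.

```python
import math

# Summary data for the n = 12 employees (x: years, y: salary in thousand euros)
n = 12
sum_x, sum_x2, sum_xy = 149, 2611, 4484
sum_y, sum_y2, sum_e2 = 312, 8776, 174.98

mean_x, mean_y = sum_x / n, sum_y / n
sxx = sum_x2 - n * mean_x ** 2        # equals (n - 1) * s_x^2
sxy = sum_xy - n * mean_x * mean_y    # equals (n - 1) * s_xy
sst = sum_y2 - n * mean_y ** 2        # total sum of squares, (n - 1) * s_y^2

# (a) Least-squares estimates
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# (b) t statistic for H0: beta_1 = 0
s2_r = sum_e2 / (n - 2)               # residual variance
t_b1 = b1 / math.sqrt(s2_r / sxx)

# (c), (d) ANOVA quantities and coefficient of determination
ssm = sst - sum_e2
f_ratio = ssm / s2_r
r2 = ssm / sst

# (e) Forecast at x0 = 13 and 90% interval for a new observation
x0, t_quantile = 13, 1.812            # t_{10;0.05} from the solution
y0_hat = b0 + b1 * x0
half_width = t_quantile * math.sqrt(s2_r * (1 + 1 / n + (x0 - mean_x) ** 2 / sxx))
interval = (y0_hat - half_width, y0_hat + half_width)

print(f"b0 = {b0:.2f}, b1 = {b1:.3f}, t = {t_b1:.3f}, F = {f_ratio:.2f}, R2 = {r2:.3f}")
print(f"forecast = {y0_hat:.2f}, interval = [{interval[0]:.2f}; {interval[1]:.2f}]")
```

Because the code keeps full precision in the intermediate steps, the last digit of some results may differ slightly from values obtained by chaining rounded figures by hand.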

(g) To complete the ANOVA table we compute

\[ \text{Total df} = 9 + 2 = 11, \qquad SSR = SST - SSM = 664 - 579.62 = 84.38, \]

\[ s_R^2 = SSR/(n-3) = 9.38, \qquad F = \frac{SSM/2}{s_R^2} = \frac{289.81}{9.38} = 30.91. \]

The resulting ANOVA table is

Source of variability    df    SS        Mean      F ratio
Model                     2    579.62    289.81    30.91
Residuals                 9     84.38      9.38
Total                    11    664

(h) From the p-values in the Excel output for the three model parameters, we may conclude that variable $X_1$ would not be significant in general (it would only be significant for significance levels larger than 31%), while variable $X_2$ would only be significant for significance levels larger than 1.25%. The constant in the model (the $\beta_0$ coefficient) is significant at all usual levels, as its p-value is very small ($7.38 \times 10^{-6}$). The model is globally significant, as the p-value for the F ratio value is close to zero ($9.29 \times 10^{-5}$).
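The completion of the ANOVA table in part (g) follows mechanically from SST, SSM and the degrees of freedom. A minimal Python sketch, assuming the values SST = 664 and SSM = 579.62 quoted in the solution and using illustrative variable names, is:

```python
# Multiple regression with k = 2 explanatory variables and n = 12 observations
n, k = 12, 2
sst, ssm = 664.0, 579.62          # total and model sums of squares from the solution

ssr = sst - ssm                   # residual sum of squares
df_model, df_resid = k, n - k - 1
df_total = df_model + df_resid

ms_model = ssm / df_model         # mean square of the model
ms_resid = ssr / df_resid         # residual variance s_R^2
f_ratio = ms_model / ms_resid     # F = (SSM / k) / s_R^2

print(f"{'Source':<10}{'df':>4}{'SS':>10}{'Mean':>10}{'F':>8}")
print(f"{'Model':<10}{df_model:>4}{ssm:>10.2f}{ms_model:>10.2f}{f_ratio:>8.2f}")
print(f"{'Residuals':<10}{df_resid:>4}{ssr:>10.2f}{ms_resid:>10.2f}")
print(f"{'Total':<10}{df_total:>4}{sst:>10.2f}")
```

Note that the F ratio divides the model mean square (SSM over its 2 degrees of freedom), not SSM itself, by the residual variance.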