STAT5044: Regression and ANOVA. The Solution of Homework #2. Inyoung Kim


[Figure 1: The fitted line using the shipment route-number of ampules data. Axes: x (horizontal), y (vertical).]

Problem# 1. X is the number of times a shipment is transferred along its route and Y is the number of ampules found broken upon arrival. The summary of the simple linear regression fit is the following:

    Call:
    lm(formula = y ~ x)

    Residuals:
     Min   1Q Median   3Q  Max
    -2.2 -1.2    0.3  0.8  1.8

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  10.2000     0.6633  15.377 3.18e-07 ***
    x             4.0000     0.4690   8.528 2.75e-05 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.483 on 8 degrees of freedom
    Multiple R-Squared: 0.9009,     Adjusted R-squared: 0.8885
    F-statistic: 72.73 on 1 and 8 DF,  p-value: 2.749e-05

(a) The estimated regression function is $\hat{Y} = 10.2 + 4x$. A linear regression function appears to give a good fit here: the data lie close to the fitted line, the p-value for testing $\beta_1 = 0$ is 2.75e-05 (so the linear fit is statistically significant), $R^2 = 0.9$, and the Pearson correlation is 0.949158. However, X takes only four distinct values. With a wider variety of X values, the fitted line would be more informative.

(b) A point estimate of the expected number of broken ampules when X = 1 transfer is made is $\hat{Y} = 10.2 + 4(1) = 14.2$.

(c) The expected number of ampules broken when there are 2 transfers is $\hat{Y} = 10.2 + 4(2) = 18.2$. Therefore, the estimated increase is $\hat{Y}(2) - \hat{Y}(1) = 4(2 - 1) = 4$.

(d) Since $\bar{x} = 1$ and $\bar{y} = 14.2$, we have $\bar{y} = 10.2 + 4\bar{x}$, so the fitted line passes through the point $(\bar{x}, \bar{y})$.

Problem# 2. Refer to Airfreight breakage (Problem 1).

(a) Estimate $\beta_1$ with a 95% confidence interval. Interpret your interval estimate.

A.(a) The summary of the simple linear regression fit is the same as in Problem 1. The confidence intervals for the coefficients are:

    > confint(lmfit)
                   2.5 %    97.5 %
    (Intercept) 8.670370 11.729630
    x           2.918388  5.081612

With confidence coefficient .95, we estimate that the mean number of broken ampules increases by somewhere between 2.92 and 5.08 for each additional time that a carton is transferred.

NOTE: In a 95% confidence interval, 95% is NOT the probability that this particular interval contains the true parameter. It is the long-run proportion of intervals, constructed in this way from repeated samples, that would contain it.

(b) Conduct a t-test to decide whether or not there is a linear association between number of times a carton is transferred (X) and number of broken ampules (Y). Use a level of significance of 0.05. State the alternatives, decision rule, and conclusion. What is the P-value of the test?

A.(b)

Hypothesis: $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \neq 0$
Test statistic: $t^* = \dfrac{4 - 0}{0.4690} = 8.528$
Decision rule: reject $H_0$ if $|t^*| > t(1 - 0.05/2,\ 10 - 2) = 2.306$. Since $8.528 > 2.306$, we reject $H_0$.
Conclusion: There is statistical evidence of a significant linear relationship between x and y at the 0.05 level of significance.
P-value: $\Pr(|t| > |t^*|) = 2\Pr(t > |t^*|) = 2.75\mathrm{e}{-05}$, which is less than 0.05 (the level of significance). Therefore, we reach the same conclusion as with the decision rule based on the test statistic. That is, there is evidence that the linear relationship between number of transfers and number of broken ampules is statistically significant with $\alpha = 0.05$.

(c) Conduct a t-test to decide whether or not there is a POSITIVE linear association between number of times a carton is transferred (X) and number of broken ampules (Y). Use a level of significance of 0.05. State the alternatives, decision rule, and conclusion. What is the P-value of the test?

A.(c)
Hypothesis: $H_0: \beta_1 = 0$ vs $H_a: \beta_1 > 0$
Test statistic: $t^* = \dfrac{4 - 0}{0.4690} = 8.528$
Decision rule: reject $H_0$ if $t^* > t(1 - 0.05,\ 10 - 2) = 1.86$. Since $8.528 > 1.86$, we reject $H_0$.
P-value: $\Pr(t > t^*) = 2.75\mathrm{e}{-05}/2$, which is less than 0.05 (the level of significance). Therefore, we reach the same conclusion as with the decision rule.
Conclusion: There is evidence that a positive linear relationship between number of transfers and number of broken ampules is statistically significant with $\alpha = 0.05$.

(d) $\beta_0$ represents here the mean number of ampules broken when no transfers of the shipment are made, i.e., when X = 0. Obtain a 95% confidence interval for $\beta_0$ and interpret it.

A.(d) With confidence coefficient .95, we estimate that the mean number of broken ampules when no transfers of the shipment are made is somewhere between 8.67 and 11.73.

(e) A consultant has suggested, on the basis of previous experience, that the mean number of broken ampules should not exceed 9.0 when no transfers are made. Conduct an appropriate test, using $\alpha = 0.025$. State the alternatives, decision rule, and conclusion. What is the P-value of the test?

A.(e)
Hypothesis: $H_0: \beta_0 \le 9$ vs $H_a: \beta_0 > 9$
Test statistic: $t^* = \dfrac{10.2 - 9}{0.6633} = 1.809$
Decision rule: reject $H_0$ if $t^* > t(1 - 0.025,\ 8) = 2.306$. Since $1.809 < 2.306$, we do not reject $H_0$.
Conclusion: There is no statistical evidence that the mean number of broken ampules exceeds 9.0 when no transfers are made.
P-value: $\Pr(t > t^*) = 1 - \Pr(t_8 < 1.809) = 0.054$, which is not less than $\alpha = 0.025$. Hence we reach the same conclusion as with the decision rule.
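The intervals and tests in Problem 2 (a) through (e) can be checked directly from the reported estimates and standard errors. The following is a minimal sketch in base R, assuming only the summary values printed in the `lm()` output (the raw data are not used, and the variable names are illustrative):

```r
# Summary values copied from the lm() output in Problem 1
n     <- 10
b0    <- 10.2; se_b0 <- 0.6633
b1    <- 4.0;  se_b1 <- 0.4690
df    <- n - 2

# (a), (d): two-sided 95% confidence intervals, estimate +/- t * SE
t_crit <- qt(0.975, df)
ci_b1  <- b1 + c(-1, 1) * t_crit * se_b1
ci_b0  <- b0 + c(-1, 1) * t_crit * se_b0

# (b): two-sided test of beta1 = 0
t_b1  <- (b1 - 0) / se_b1
p_two <- 2 * pt(abs(t_b1), df, lower.tail = FALSE)

# (c): one-sided test of beta1 > 0 uses the upper tail only
p_one <- pt(t_b1, df, lower.tail = FALSE)

# (e): one-sided test of beta0 > 9 at alpha = 0.025
t_b0     <- (b0 - 9) / se_b0
p_b0     <- pt(t_b0, df, lower.tail = FALSE)
reject_e <- t_b0 > qt(1 - 0.025, df)

print(round(ci_b1, 2))
print(round(ci_b0, 2))
print(c(t_b1 = t_b1, p_two = p_two, p_one = p_one))
print(c(t_b0 = t_b0, p_b0 = p_b0, reject = reject_e))
```

Because the standard errors are rounded to four digits, the results agree with `confint(lmfit)` only up to rounding.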

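(f) Obtain the power of your test in part (b) if actually $\beta_1 = 2.0$. Assume $\sigma\{b_1\} = .50$.

A.1(f)
(Case 1) If $\sigma\{b_1\}$ is known, we use the normal distribution:

$$\text{Power} = \Pr(\text{conclude } H_a \mid H_a \text{ is true}) = \Pr(\text{conclude } H_a \mid \beta_1 = 2) = \Pr\!\left(\left|\frac{\hat\beta_1 - 0}{0.5}\right| > 1.96 \,\Big|\, \beta_1 = 2\right)$$
$$= \Pr\!\left(\frac{\hat\beta_1 - 2}{0.5} > 1.96 - \frac{2}{0.5}\right) + \Pr\!\left(\frac{\hat\beta_1 - 2}{0.5} < -1.96 - \frac{2}{0.5}\right) = 0.9793$$

(Case 2) If $\sigma\{b_1\}$ is not known, we use the t-distribution with $t_{1 - 0.05/2,\,8} = 2.31$:

$$\text{Power} = \Pr\!\left(\left|\frac{\hat\beta_1 - 0}{0.5}\right| > t_{1 - 0.05/2,\,8} \,\Big|\, \beta_1 = 2\right) = \Pr\!\left(\frac{\hat\beta_1 - 2}{0.5} > 2.31 - \frac{2}{0.5}\right) + \Pr\!\left(\frac{\hat\beta_1 - 2}{0.5} < -2.31 - \frac{2}{0.5}\right)$$
$$= 0.9352 + 0.0001151 = 0.9353$$

A.2(f) Also obtain the power of your test in part (e) if actually $\beta_0 = 11$. Assume $\sigma\{b_0\}$ is known (its numerical value is missing in the source; the figures below are consistent with $\sigma\{b_0\} = 0.75$).

(Case 1) If $\sigma\{b_0\}$ is known, we use the normal distribution with $z_{1-\alpha} = 1.645$:

$$\text{Power} = \Pr(\text{conclude } H_a \mid H_a \text{ is true}) = \Pr\!\left(\frac{\hat\beta_0 - 9}{\sigma\{b_0\}} > z_{1-\alpha} \,\Big|\, \beta_0 = 11\right) = \Pr\!\left(\frac{\hat\beta_0 - 11}{\sigma\{b_0\}} > 1.645 - \frac{2}{\sigma\{b_0\}}\right)$$
$$= 1 - \Pr(Z < -1.021667) = 1 - 0.1535 = 0.8465$$

(Case 2) If $\sigma\{b_0\}$ is not known, we use the t-distribution with $t_{1-\alpha,\,8} = 1.86$:

$$\text{Power} = \Pr\!\left(\frac{\hat\beta_0 - 9}{\sigma\{b_0\}} > t_{1-\alpha,\,8} \,\Big|\, \beta_0 = 11\right) = \Pr\!\left(\frac{\hat\beta_0 - 11}{\sigma\{b_0\}} > 1.86 - \frac{2}{\sigma\{b_0\}}\right)$$
$$= 1 - \Pr(t_8 < -0.8066) = 1 - 0.2216 = 0.7784$$

Problem# 3. Refer to Airfreight breakage (Problem 1).

(a) Because of changes in airline routes, shipments may have to be transferred more frequently than in the past. Estimate the mean breakage for the following numbers of transfers: X = 2, 4. Use separate 99% confidence intervals. Interpret your results.

A.(a) The 99% CI for $E(Y \mid x_{new})$ is

$$\hat\beta_0 + \hat\beta_1 x_{new} \pm t_{0.995,\,10-2}\sqrt{\hat\sigma^2\left(\frac{1}{10} + \frac{(x_{new} - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right)}.$$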
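The interval formula above can be evaluated from the summary statistics alone. The sketch below is one way to do so in base R. It assumes $\sum (x_i - \bar{x})^2 = 10$, a value not stated in the solution but implied by $s\{b_1\}^2 = MSE / \sum (x_i - \bar{x})^2$, i.e. $2.2 / 0.4690^2 \approx 10$. The helper name `ci_mean` is hypothetical.

```r
# Summary values from the fit in Problem 1
n    <- 10
b0   <- 10.2
b1   <- 4
mse  <- 2.2    # residual mean square (Residual standard error 1.483, squared)
xbar <- 1
sxx  <- 10     # assumed: implied by se(b1)^2 = MSE / Sxx

# Confidence interval for the mean response E(Y | x_new)
ci_mean <- function(x_new, level = 0.99) {
  fit  <- b0 + b1 * x_new
  se   <- sqrt(mse * (1 / n + (x_new - xbar)^2 / sxx))
  half <- qt(1 - (1 - level) / 2, n - 2) * se
  c(fit = fit, lower = fit - half, upper = fit + half)
}

print(round(ci_mean(2), 2))
print(round(ci_mean(4), 2))
```

The interval at X = 4 is wider than at X = 2 because the term $(x_{new} - \bar{x})^2$ grows as $x_{new}$ moves away from $\bar{x} = 1$.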

The 99% CI for $E(Y \mid x_{new} = 2)$ is $18.20 \pm 2.23 = (15.97, 20.43)$, and for $E(Y \mid x_{new} = 4)$ it is $26.20 \pm 4.98 = (21.22, 31.18)$. We conclude with confidence coefficient 0.99 that the mean number of broken ampules when 2 transfers are made is somewhere between 15.97 and 20.43. We conclude with confidence coefficient 0.99 that the mean number of broken ampules when 4 transfers are made is somewhere between 21.22 and 31.18.

(b) The next shipment will entail two transfers. Obtain a 99% prediction interval for the number of broken ampules for this shipment. Interpret your prediction interval.

A.(b) The 99% PI for $Y_{new}$ at a given $X_{new}$ is

$$\hat\beta_0 + \hat\beta_1 X_{new} \pm t_{0.995,\,10-2}\sqrt{\hat\sigma^2\left(1 + \frac{1}{10} + \frac{(X_{new} - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right)}.$$

The 99% PI for $Y_{new}$ when $X_{new} = 2$ is (12.75, 23.65). That is, we predict with confidence coefficient 0.99 that the number of broken ampules in the next shipment with two transfers will be somewhere between 12.75 and 23.65.

(c) In the next several days, three independent shipments will be made, each entailing two transfers. Obtain a 99% prediction interval for the mean number of ampules broken in the three shipments. Convert this interval into a 99% prediction interval for the total number of ampules broken in the three shipments.

A.(c) The 99% PI for the mean of $m = 3$ new observations at a given $X_{new}$ is

$$\hat\beta_0 + \hat\beta_1 X_{new} \pm t_{0.995,\,10-2}\sqrt{\hat\sigma^2\left(\frac{1}{3} + \frac{1}{10} + \frac{(X_{new} - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right)}.$$

The 99% PI for the mean of 3 new observations at $X_{new} = 2$ is (14.57, 21.83). Multiplying both limits by 3, the prediction interval for the total number of ampules broken in the three shipments is $(14.57 \times 3,\ 21.83 \times 3) = (43.70, 65.50)$.

(d) Determine the boundary values of the 99% confidence band for the regression line when $X_{new} = 2$ and when $X_{new} = 4$. Is your confidence band wider at these two points than the corresponding confidence intervals in part (a)? Should it be?

A.(d) The $1 - \alpha$ confidence band for the regression line is $\hat{Y}_{new} \pm W\, s\{\hat{Y}_{new}\}$, where $W^2 = 2F(1 - \alpha;\ 2,\ 8)$. The 99% confidence band when $X_{new} = 2$ is (15.44, 20.96) and when $X_{new} = 4$ is (20.03, 32.37). It is wider than the corresponding CIs in part (a), which makes sense: the band must cover the entire regression line simultaneously, whereas each interval in part (a) covers the mean response at a single value of X.
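The prediction intervals in parts (b) and (c) and the confidence band in part (d) differ only in the variance term and the multiplier. The sketch below, in base R, shows this using the summary statistics; as an assumption not stated in the solution, it takes $\sum (x_i - \bar{x})^2 = 10$ (implied by $s\{b_1\} = 0.4690$ and $MSE = 2.2$), and the helper names are illustrative.

```r
# Summary values from the fit in Problem 1
n <- 10; b0 <- 10.2; b1 <- 4; mse <- 2.2; xbar <- 1
sxx <- 10   # assumed: implied by se(b1)^2 = MSE / Sxx

# Standard error of the fitted mean response at x_new
se_mean <- function(x_new) sqrt(mse * (1 / n + (x_new - xbar)^2 / sxx))

t99  <- qt(0.995, n - 2)
fit2 <- b0 + b1 * 2

# (b) one new observation: add the full error variance MSE
pi_one <- fit2 + c(-1, 1) * t99 * sqrt(mse + se_mean(2)^2)

# (c) mean of m = 3 new observations: add MSE / m, then scale to a total
m         <- 3
pi_mean3  <- fit2 + c(-1, 1) * t99 * sqrt(mse / m + se_mean(2)^2)
pi_total3 <- m * pi_mean3

# (d) confidence band with W^2 = 2 F(0.99; 2, n - 2) in place of t
W    <- sqrt(2 * qf(0.99, 2, n - 2))
band <- function(x_new) (b0 + b1 * x_new) + c(-1, 1) * W * se_mean(x_new)

print(round(pi_one, 2))
print(round(pi_mean3, 2))
print(round(pi_total3, 2))
print(round(band(2), 2))
print(round(band(4), 2))
```

Comparing `W` with `t99` shows directly why the band is wider than the pointwise intervals: the standard error is the same, only the multiplier is larger.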
Problem# 4. Refer to Airfreight breakage (Problem 1).

(a) Set up the ANOVA table. Which elements are additive?

A.(a)

    Analysis of Variance Table

    Response: y
              Df Sum Sq Mean Sq F value    Pr(>F)
    x          1  160.0   160.0  72.727 2.749e-05 ***
    Residuals  8   17.6     2.2
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The sums of squares and the degrees of freedom are additive, that is, $SSTo = SSR + SSE$ and $df_{total} = df_{reg} + df_{error}$.

(b) Conduct an F test to decide whether or not there is a linear association between the number of times a carton is transferred and the number of broken ampules; control the $\alpha$ risk at .05. State the alternatives, decision rule, and conclusion.

A.(b)
Hypothesis: $H_0: y = \beta_0 + \epsilon$ vs $H_a: y = \beta_0 + \beta_1 x + \epsilon$, i.e., $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \neq 0$
Test statistic: $F^* = MSR/MSE = 160/2.2 = 72.73$
Decision rule: reject $H_0$ if $F^* > F_{0.95,\,1,\,8} = 5.32$. Since $72.73 > 5.32$, we reject $H_0$.
Conclusion: There is evidence that the linear association between the number of times a carton is transferred and the number of broken ampules is statistically significant with $\alpha = 0.05$.

(c) Obtain the t statistic for the test in part (b) and demonstrate numerically its equivalence to the F statistic obtained in part (b).

A.(c) $t^* = 8.528$ and $F^* = 72.727$. Hence, $F^* = (t^*)^2$, since $8.528^2 = 72.727$.

(d) Calculate $R^2$ and $r$. What proportion of the variation in Y is accounted for by introducing X into the regression model?

A.(d) Since SSR = 160, SSE = 17.6, and SSTo = 160 + 17.6 = 177.6,

$$R^2 = \frac{SSR}{SSTo} = 1 - \frac{SSE}{SSTo} = 0.9009, \qquad R_a^2 = 1 - \frac{SSE/(n-2)}{SSTo/(n-1)} = 0.8885,$$
$$r = \mathrm{cor}(x, y) = 0.949158 = \sqrt{0.9009}.$$

Thus about 90.09% of the variation in Y is accounted for by introducing X into the regression model.
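The ANOVA quantities in Problem 4 follow from the two sums of squares. The following base R sketch recomputes them under the assumption that only SSR, SSE, n, and the slope estimate are available; it recovers $\sum (x_i - \bar{x})^2$ from the identity $SSR = b_1^2 \sum (x_i - \bar{x})^2$, which is what makes $F^* = (t^*)^2$ hold exactly rather than only up to rounding.

```r
# Values from the ANOVA table and the fitted slope
ssr <- 160
sse <- 17.6
n   <- 10
b1  <- 4

# (a) additivity of sums of squares and degrees of freedom
ssto     <- ssr + sse
df_reg   <- 1
df_err   <- n - 2
df_total <- df_reg + df_err

# (b) F test for beta1 = 0
msr    <- ssr / df_reg
mse    <- sse / df_err
f_stat <- msr / mse
f_crit <- qf(0.95, df_reg, df_err)
p_val  <- pf(f_stat, df_reg, df_err, lower.tail = FALSE)

# (c) t statistic; Sxx is recovered from SSR = b1^2 * Sxx
sxx    <- ssr / b1^2
t_stat <- b1 / sqrt(mse / sxx)

# (d) coefficient of determination, adjusted version, and correlation
r2     <- ssr / ssto
r2_adj <- 1 - (sse / (n - 2)) / (ssto / (n - 1))
r      <- sign(b1) * sqrt(r2)

print(c(F = f_stat, F_crit = f_crit, p = p_val))
print(c(t = t_stat, t_squared = t_stat^2))
print(c(R2 = r2, R2_adj = r2_adj, r = r))
```

The sign of `r` is taken from the slope, because $\sqrt{R^2}$ alone does not indicate the direction of the association.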