ST505/S697R: Fall 2012, Homework 2 Solution

1. 1a; problem 1.22. Below is the summary information (edited) from the regression, using R output; the code is at the end of the solution, as are the code and output for SAS.

a) The estimated regression function is E(Y) = 168.6 + 2.03438X, where Y is hardness and X is time. The plot of the data and fitted line is below (Figure 1). There are two ways to interpret the question of whether the linear regression supplies a good fit. One is whether a straight line does a good job of modeling the expected value; the answer to that seems to be yes, from graphical inspection. A second, and different, way to view the question is how tight the fit is around the line. There is clearly variability in hardness at a fixed time. Whether this variability means the fit is good will depend on how the fit is to be used.

b) The fitted value at X = 40 is b0 + b1(40), which equals 249.975. You can get this directly or use the predicted value in the output for a case with X = 40.

c) This is just β1, which is estimated by b1 = 2.03438.

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 168.60000    2.65702   63.45  < 2e-16 ***
time          2.03437    0.09039   22.51 2.16e-12 ***
---
Residual standard error: 3.234 on 14 degrees of freedom

[Figure 1: Plot of the data and fitted line (hardness versus time).]

1b; problem 1.26.

a) In SAS you can get the residuals using the p option in proc reg; in R, using the residuals function. It is easy to see that they add to 0. Whenever we fit a regression model with an overall intercept the residuals will add to 0; this is not the case if we fit with no intercept.

   hardness time    fits  resids
1       199   16 201.150  -2.150
2       205   16 201.150   3.850
3       196   16 201.150  -5.150
4       200   16 201.150  -1.150
5       218   24 217.425   0.575
6       220   24 217.425   2.575
7       215   24 217.425  -2.425
8       223   24 217.425   5.575
9       237   32 233.700   3.300
10      234   32 233.700   0.300
11      235   32 233.700   1.300
12      230   32 233.700  -3.700
13      250   40 249.975   0.025
14      248   40 249.975  -1.975
15      253   40 249.975   3.025
16      246   40 249.975  -3.975
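For reference, the numbers above are easy to reproduce in R. A minimal sketch (the data frame and object names are assumed for illustration; the data values are those in the residual listing above):

# enter the data and fit the simple linear regression
bonding <- data.frame(
  time     = rep(c(16, 24, 32, 40), each = 4),
  hardness = c(199, 205, 196, 200, 218, 220, 215, 223,
               237, 234, 235, 230, 250, 248, 253, 246))
regout <- lm(hardness ~ time, data = bonding)
summary(regout)                          # coefficient table shown above
predict(regout, data.frame(time = 40))   # 249.975, the fitted value in b)
sum(residuals(regout))                   # 0 (up to rounding), as in 1b a)
plot(hardness ~ time, data = bonding)    # Figure 1
abline(regout)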

b) The estimate of σ² is ˆσ² = MSE = 10.45893 (10.5 in the R output). The estimate of σ is the square root of this, so ˆσ = 3.23403 (root MSE in the SAS output, Residual standard error in the R output).

1c) The (estimated) standard error of b0 is 2.657 and of b1 is .09039. The CI for b0 is computed using 168.6 ± t(.975, 14)(2.657) and for b1 using 2.034375 ± t(.975, 14)(.09039), where t(.975, 14) = 2.145. Using confint in R or clb in SAS yields 95% confidence intervals of

(Intercept), β0: [162.9013, 174.2985]
time, β1: [1.8405, 2.22825]

1d; problem 2.7 a and b.

a) This asks for a 99% confidence interval for β1, which is (1.7653, 2.30346). The interval can be computed directly using b1 ± t(.995, 14)s{b1}, or obtained in either R (via confint with level = .99) or SAS (using clb with alpha = .01). Using R:

> confint(regout,level=.99)  #99% confidence intervals
                 0.5 %     99.5 %
(Intercept) 160.69045 176.509543
time         1.765278   2.303463

b) Testing H0: β1 = 2 versus HA: β1 ≠ 2. The t-statistic is t = (2.03438 - 2)/0.09039 = .38035. With t(.975, 14) = 2.145, you do not reject H0 since .38 < 2.145. The P-value is the sum of the area to the right of .38 and the area to the left of -.38 under the t distribution with 14 degrees of freedom; this actually equals .7094. Just using the t-tables, you can see that the area to the right of .38 is somewhere between .3 and .4, so from the tables you know the P-value is between .6 and .8. Note that we also do not reject H0 at α = .01, since the P-value is > .01.
- You can also carry out the test using the 99% CI for β1, rejecting H0 if 2 is not in the interval. Since 2 is in the interval, we do not reject H0. This is equivalent to doing the t-test.
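These calculations can be checked directly in R; a short sketch, continuing with the regout object from above:

confint(regout, level = .95)   # 1c: 95% intervals for beta0 and beta1
confint(regout, level = .99)   # 1d a): 99% interval for beta1
b <- coef(summary(regout))     # estimates and standard errors
tstat <- (b["time", "Estimate"] - 2) / b["time", "Std. Error"]
tstat                          # 0.38035
2 * pt(abs(tstat), df = 14, lower.tail = FALSE)  # two-sided P-value, .7094
qt(.975, 14)                   # 2.145, the multiplier for the 95% intervals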

1e; problem 2.16. You can get a) and b) in SAS directly from the output, using the clm and cli options, if you include a new case in the data with Y missing (.) and X = 30. In R, and for the other parts, the intervals can be computed in various ways using either SAS or R as a calculator (as demonstrated for the Kishi example). See the code and output at the end of the solution.

a) [227.4569, 231.8056] = 229.6313 ± (2.624)(0.8285), where s{ˆµ(30)} = 0.8285 is the standard error associated with the estimated mean and t(.99, 14) = 2.624.

Note that s²{ˆµ(30)} = 0.8285² = 7.06 + 30²(.0082) + 2(30)(-.2288), where s²{b0} = 7.06, s²{b1} = .0082 and s{b0, b1} = -.2288 from the variance-covariance matrix of the coefficients:

> vcov(regout)  #this gives the variance-covariance
            (Intercept)         time
(Intercept)   7.059768 -0.228789063
time          -0.228789 0.008171038

b) (220.8695, 238.3930) = 229.6313 ± (2.624)(10.45893 + 0.8285²)^(1/2).

c) 229.6313 ± (2.624)((10.45893/10) + 0.8285²)^(1/2) = [226.2, 233.1].

d) The interval in c) is smaller since you are trying to predict the mean of 10 values, which has less variability (σ²/10) than one value.

e) 229.6313 ± (2(5.24))^(1/2)(0.8285) = [226.95, 232.32], where F(.98, 2, 14) = 5.24 (obtained exactly using SAS or R; you could approximate it using the entries for F(.95, 2, 12), F(.95, 2, 15), F(.99, 2, 12) and F(.99, 2, 15) from the F table).
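In R the intervals in a) and b) come straight from predict, and the standard error in the note can be verified against vcov; a sketch:

new30 <- data.frame(time = 30)
predict(regout, new30, interval = "confidence", level = .98)  # part a)
predict(regout, new30, interval = "prediction", level = .98)  # part b)
sqrt(t(c(1, 30)) %*% vcov(regout) %*% c(1, 30))  # s{muhat(30)} = 0.8285
# part c): mean of m = 10 new observations at X = 30
pr <- predict(regout, new30, se.fit = TRUE)
pr$fit + c(-1, 1) * qt(.99, 14) * sqrt(pr$residual.scale^2/10 + pr$se.fit^2)
sqrt(2 * qf(.98, 2, 14))   # part e): the multiplier (2 F(.98, 2, 14))^(1/2)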

1f. See Figure 2.

[Figure 2: Individual CIs for means and prediction intervals, plotted with the data and fitted line (hardness versus time).]

1g; problem 4.9 a) and b). Xj = 20, 30 and 40.

For Bonferroni, use b0 + b1Xj ± t(1 - .10/6, 14)s{ˆµ(Xj)}, with t(1 - .10/6, 14) = 2.360. The Working-Hotelling/Scheffé intervals use the same form but with the t value replaced by (2F(.90, 2, 14))^(1/2) = 2.3352. The Working-Hotelling intervals will be smaller and more efficient since 2.335 < 2.360, but the difference is minor.

       W-H/Scheffé intervals   Bonferroni intervals   estimate   SE
X=20   [206.755, 211.821]      [206.728, 211.847]     209.288    1.08473
X=30   [227.697, 231.566]      [227.676, 231.586]     229.631    0.82847
X=40   [246.816, 253.134]      [246.783, 253.168]     249.975    1.35289

c) Now doing prediction intervals for two future values, at X1 = 30 and X2 = 40. Use b0 + b1Xj ± t(1 - .10/4, 14)s{predj}, with t(1 - .10/4, 14) = 2.145 and s²{predj} = MSE + s²{ˆµ(Xj)}. The Scheffé intervals use the same form but with the t value replaced by (2F(.90, 2, 14))^(1/2) = 2.3352. The Bonferroni intervals should be used here since they are shorter.

       Bonferroni intervals   Scheffé intervals     s{predj}
X=30   [222.471, 236.792]     [221.836, 237.427]    3.33846
X=40   [242.456, 257.494]     [241.789, 258.161]    3.50560

2. Problem 2.

a) The estimate of β0 is .041, of β1 is 2.10983, of σ² is .06307, and of σ is .2511 = (.06307)^(1/2).

b) 2.10983 ± 1.96(.01194) = [2.08643, 2.13323].

c) Testing H0: β1 = 0. t = 2.10983/.01194 ≈ 176.7. The P-value is the area to the right of 176.7 plus the area to the left of -176.7 under the standard normal (used here since the degrees of freedom, 3164, are so large). Since the area to the right of 3.291 equals .0005 (see table), we know the P-value is less than 2(.0005) = .001. As a probability, the P-value is the probability of getting a t-value greater than or equal in absolute value to the observed one, under the null hypothesis. Here, it is the probability that |t| > 176.7, where t is distributed t with 3164 degrees of freedom, which is for practical purposes the standard normal distribution.

d) Ŷh = 3.591966, s{Ŷh} = 0.0066024 = (s²{b0} + Xh²s²{b1} + 2Xh s{b0, b1})^(1/2), and s{pred} = 0.2512242 = (MSE + s²{Ŷh})^(1/2). Use z = 1.96 in getting the intervals.
Confidence interval for E(Y) at X = 1.66: [3.588559, 3.60433].
Prediction interval for Y at X = 1.66: [3.099392, 4.084196].
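With 3164 degrees of freedom the t distribution is effectively standard normal, so parts b) and c) reduce to simple z arithmetic; a sketch in R using the printed estimates:

b1 <- 2.10983; se1 <- 0.01194
b1 + c(-1, 1) * 1.96 * se1             # part b): [2.08643, 2.13323]
z <- b1 / se1                          # part c): about 176.7
2 * pnorm(abs(z), lower.tail = FALSE)  # P-value, numerically 0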

3. Problem 3. Below are results for designs 1 and 2. Here I've given parts of the SAS output; similar results apply for R. The first part of each program gives you the true standard errors of the estimated coefficients. The expected values of b0, b1 and MSE are known to be exactly β0, β1 and σ² (0, 1 and .0225, respectively). To choose between designs, we can compare the true standard errors of the estimated coefficients. The second design is better because those standard errors are smaller. We know the estimators are unbiased under both designs, so we can just compare variances (or standard errors) to make that decision.
- The second part of the output gives summary statistics over the 1000 simulated values. The means of the three variables are not exactly the true parameters because we are running a limited number of simulations. The histograms represent the sampling distribution of these estimators (not the exact sampling distribution, because of the limited number of simulations, but they will be close).
- If you considered which design is better for estimating σ², you'd have to rely on the simulation results, since I didn't tell you the variance of MSE. From the simulation results it looks like design 2 is a little better. In fact, the two designs are equivalent from this perspective: it can be shown that the standard deviation of MSE is (2σ⁴/(n - 2))^(1/2) (= .01006 here), which depends on the design only through n. All designs with the same n are equally good for estimating σ². This may surprise you.

Homework 2, number 3: Design 1
coefficients: beta0 = 0  beta1 = 1  sigma2 = 0.0225  sigma = 0.15
number of simulations = 1000  n = 12
xvalues = .6 6.31 6.14 .0 6.39 5.95 6.53 6.55 4 5.4 4.94 .0
using normal errors

THEORETICAL VARIANCES AND STANDARD DEVIATIONS
variance of b0 = 0.1413792   sd of b0 = 0.3760042
variance of b1 = 0.0035056   sd of b1 = 0.059208

The MEANS Procedure
Variable     N        Mean      Std Dev     Minimum     Maximum
B0        1000   0.0228333   0.3734848    -1.25449   1.3339013
B1        1000   0.996695    0.0588648   0.7961849   1.194983
MSE       1000   0.022640    0.010281    0.0033132   0.0705808

And here are the results from design 2.

Homework 2, number 3: Design 2
coefficients: beta0 = 0  beta1 = 1  sigma2 = 0.0225  sigma = 0.15
number of simulations = 1000  n = 12
xvalues = 6 6
using normal errors

THEORETICAL VARIANCES AND STANDARD DEVIATIONS
variance of b0 = 0.1181024   sd of b0 = 0.3436603
variance of b1 = 0.0030981   sd of b1 = 0.0556606

The MEANS Procedure
Variable     N        Mean      Std Dev      Minimum     Maximum
B0        1000   0.0081894   0.340412   -1.0216963   1.1265641
B1        1000   0.998300    0.0549423    0.805541   1.171619
MSE       1000   0.0225139   0.0098290    0.003846   0.0690602
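The simulation is straightforward to replicate. A minimal R sketch of the same experiment (the course code was in SAS; the function and argument names here are assumed, and x would be the vector of 12 design points), followed by a check of the closing claim about the standard deviation of MSE:

simulate_design <- function(x, nsim = 1000, beta0 = 0, beta1 = 1, sigma = 0.15) {
  # simulate nsim data sets from the true line and refit each one
  sims <- replicate(nsim, {
    y <- beta0 + beta1 * x + rnorm(length(x), sd = sigma)
    fit <- lm(y ~ x)
    c(b0 = unname(coef(fit)[1]), b1 = unname(coef(fit)[2]),
      mse = summary(fit)$sigma^2)
  })
  t(apply(sims, 1, function(v) c(mean = mean(v), sd = sd(v))))
}
# sd of MSE is sqrt(2 sigma^4/(n - 2)), which depends on the design only
# through n, so any two designs with n = 12 estimate sigma^2 equally well:
sqrt(2 * 0.15^4 / (12 - 2))  # 0.01006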