Confidence Interval for the mean response


Week 3: Prediction and confidence intervals at a specified x. Testing lack of fit with replicates at some x's. Inference for the correlation. Introduction to regression with several explanatory variables (ch10) [pp34-60]: ANOVA table and the F-test.

Confidence interval for the mean response

A level C confidence interval for the mean response µ_y when x takes the value x* is

    µ̂_y ± t* SE_µ̂

where

    SE_µ̂ = s sqrt( 1/n + (x* - x̄)² / Σ_i (x_i - x̄)² )

and t* is the value for the t(n - 2) density curve with area C between -t* and t*.
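As a quick numerical check (not part of the original notes), the sketch below computes this interval in Python for the calcium/blood-pressure data used in the lack-of-fit example later in these notes; the value t* = 2.262 for t(9) at C = 0.95 is assumed from a t table.

```python
import math

# Illustration: x = calcium in diet, y = change in blood pressure,
# the 11-subject data set from the lack-of-fit example in these notes.
x = [40, 40, 50, 60, 60, 70, 80, 80, 90, 100, 100]
y = [10, 12, 8, 9, 7, 6, 6, 4, 4, 3, 4]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx         # least-squares slope
b0 = ybar - b1 * xbar  # least-squares intercept

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))  # estimate of sigma

def mean_response_ci(xstar, tstar):
    """Level-C CI for the mean response at x = xstar; tstar is the
    t(n - 2) table value with area C between -tstar and tstar."""
    fit = b0 + b1 * xstar
    se_mu = s * math.sqrt(1 / n + (xstar - xbar) ** 2 / sxx)
    return fit - tstar * se_mu, fit + tstar * se_mu

# 95% CI for the mean response at x* = 70 (t* for t(9) is about 2.262)
lo, hi = mean_response_ci(70, 2.262)
```

Note that at x* = x̄ = 70 the interval is at its narrowest; SE_µ̂ grows as x* moves away from x̄.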

Regression Analysis: Wages versus LOS

The regression equation is
Wages = 44.2 + 0.0731 LOS

Predictor    Coef      SE Coef   T      P
Constant     44.213    2.628     16.82  0.000
LOS          0.07310   0.03015   2.42   0.018

S = 11.9791   R-Sq = 9.2%   R-Sq(adj) = 7.6%

Analysis of Variance
Source          DF     SS      MS     F     P
Regression       1    843.5   843.5   5.88  0.018
Residual Error  58   8322.9   143.5
Total           59   9166.4

Unusual Observations
Obs  LOS  Wages  Fit    SE Fit  Residual  St Resid
 15   70  97.68  49.33  1.55     48.35     4.07R
 22  222  54.95  60.44  4.82     -5.50    -0.50 X
 30  150  80.59  55.18  2.85     25.41     2.18R
 42  228  67.91  60.88  4.99      7.03     0.65 X
 47  204  50.17  59.13  4.31     -8.95    -0.80 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

Predicted Values for New Observations
New Obs  Fit    SE Fit  95% CI           95% PI
1        46.84  1.86    (43.11, 50.57)   (22.58, 71.11)

Values of Predictors for New Observations
New Obs  LOS
1        36.0

Minitab commands for prediction


Prediction interval for a new observation

A level C prediction interval for a new observation when x = x* is

    ŷ ± t* SE_ŷ

where

    SE_ŷ = s sqrt( 1 + 1/n + (x* - x̄)² / Σ_i (x_i - x̄)² )

and t* is the value for the t(n - 2) density curve with area C between -t* and t*.

Note: SE_ŷ² = SE_µ̂² + MSE.

Example: Give a 95% PI for the wage of an employee with 3 years of experience (i.e., LOS = 36 months).

Example: Give a 90% PI for the wage of an employee with 3 years of experience (i.e., LOS = 36 months).
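A sketch of how these intervals follow from the Minitab output above (Fit = 46.84, SE Fit = 1.86, MSE = 143.5 on 58 error d.f.); the multipliers 2.002 and 1.672 are assumed table values for t(58).

```python
import math

# Quantities read off the Minitab "Predicted Values" output above
fit = 46.84    # fitted wage at LOS = 36
se_fit = 1.86  # SE of the estimated mean response
mse = 143.5    # MSE = s^2 from the ANOVA table (s = 11.9791)

# SE for predicting a new observation: SE_yhat^2 = SE_fit^2 + MSE
se_pred = math.sqrt(se_fit ** 2 + mse)

def prediction_interval(tstar):
    """PI for a new observation, given the t(58) table value tstar."""
    return fit - tstar * se_pred, fit + tstar * se_pred

lo95, hi95 = prediction_interval(2.002)  # t*(58 d.f.), C = 0.95
lo90, hi90 = prediction_interval(1.672)  # t*(58 d.f.), C = 0.90
```

The 95% interval reproduces Minitab's (22.58, 71.11) up to rounding of t*, and the 90% interval is narrower, as expected.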

Lack-of-fit test when there are replicated x-settings

Example: Let x = amount of calcium in the diet and y = change in blood pressure over a specified time period, for each of 11 experimental subjects.

Row    x    y
  1   40   10
  2   40   12
  3   50    8
  4   60    9
  5   60    7
  6   70    6
  7   80    6
  8   80    4
  9   90    4
 10  100    3
 11  100    4

[Scatterplot of y versus x]

Regression Analysis

The regression equation is
y = 15.2 - 0.123 x

Predictor     Coef      StDev     T      P
Constant      15.241    1.113     13.70  0.000
x            -0.12292   0.01523   -8.07  0.000

S = 1.055   R-Sq = 87.9%   R-Sq(adj) = 86.5%

Analysis of Variance
Source          DF   SS      MS      F      P
Regression       1   72.521  72.521  65.11  0.000
Residual Error   9   10.025   1.114
Total           10   82.545

Do you think you should try amending your model? (A quadratic model? A transformation?)

Testing for the lack of fit of a linear model

Step 1: Calculate Pure Error SS = 6.5 and d.f. pure error = 4.

Step 2: Calculate Lack of Fit SS = SSE - P.E. SS = 10.025 - 6.5 = 3.525 and d.f. LOF = d.f. Error - d.f. P.E. = 9 - 4 = 5.

Step 3: Calculate MSLOF = SSLOF / d.f. LOF = 3.525/5 = 0.705 and MSPE = SSPE / d.f. P.E. = 6.5/4 = 1.625.

Step 4: Calculate the test statistic F = MSLOF / MSPE = 0.705/1.625 = 0.43.

Step 5: Compare this value with the critical value from F(5, 4).

Regression Analysis: y versus x

The regression equation is
y = 15.2 - 0.123 x

Predictor     Coef      SE Coef   T      P
Constant      15.241    1.113     13.70  0.000
x            -0.12292   0.01523   -8.07  0.000

S = 1.05539   R-Sq = 87.9%   R-Sq(adj) = 86.5%

Analysis of Variance
Source          DF   SS      MS      F     P
Regression       1   72.521  72.521  65.11 0.000
Residual Error   9   10.025   1.114
  Lack of Fit    5    3.525   0.705   0.43 0.808
  Pure Error     4    6.500   1.625
Total           10   82.545

3 rows with no replicates
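The five steps can be reproduced from the raw data. Below is a minimal Python sketch (an illustration, not the course software):

```python
from collections import defaultdict

# Calcium / blood-pressure data from the example above
x = [40, 40, 50, 60, 60, 70, 80, 80, 90, 100, 100]
y = [10, 12, 8, 9, 7, 6, 6, 4, 4, 3, 4]
n = len(x)

# Least-squares fit and SSE
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Step 1: pure error pools squared deviations within each replicated x
groups = defaultdict(list)
for xi, yi in zip(x, y):
    groups[xi].append(yi)
ss_pe = sum(sum((yi - sum(g) / len(g)) ** 2 for yi in g)
            for g in groups.values())
df_pe = sum(len(g) - 1 for g in groups.values())

# Step 2: lack-of-fit SS is the rest of SSE
ss_lof = sse - ss_pe
df_lof = (n - 2) - df_pe

# Steps 3-4: mean squares and the F statistic
f_stat = (ss_lof / df_lof) / (ss_pe / df_pe)
```

Comparing F = 0.43 with the F(5, 4) distribution gives the p-value 0.808 reported by Minitab, so there is no evidence of lack of fit.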

Multiple regression

The statistical model for multiple linear regression is

    y_i = β_0 + β_1 x_{i,1} + ... + β_p x_{i,p} + ε_i,    for i = 1, 2, ..., n.

The errors ε_i are independent and normally distributed with mean 0 and standard deviation σ.

- Interpretation of the regression coefficients
- Tests and CIs for the β's
- ANOVA table
- ANOVA F-test
- R-square: R² = SSR/SST = 1 - SSE/SST
- R-sq(adj) = 1 - MSE/MST = 1 - (1 - R²)(n - 1)/(n - (k + 1)), where k is the number of explanatory variables; R-sq(adj) ≤ R-sq
- Estimate of σ²: MSE
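The two forms of R-sq(adj) can be checked numerically; the sketch below (not from the notes) plugs in the SS values read from the CS-data example that follows, with n = 224 cases and k = 5 predictors.

```python
# SS values as read from the CS-data Minitab output (n = 224, k = 5)
n, k = 224, 5
sse, sst = 106.8191, 135.4628

r_sq = 1 - sse / sst           # R-square = 1 - SSE/SST
mse = sse / (n - (k + 1))      # estimate of sigma^2
mst = sst / (n - 1)
r_sq_adj = 1 - mse / mst       # first form of R-sq(adj)

# Second form: 1 - (1 - R^2)(n - 1)/(n - (k + 1)); the two agree exactly
alt = 1 - (1 - r_sq) * (n - 1) / (n - (k + 1))
```

Here r_sq ≈ 21.1% and r_sq_adj ≈ 19.3%, matching the output, and the adjusted value is always the smaller of the two.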

Example (CS data in the data appendix on the IPS CD)

Regression Analysis

The regression equation is
gpa = 0.327 + 0.000944 satm - 0.000408 satv + 0.146 hsm + 0.0359 hss + 0.0553 hse

Predictor    Coef        StDev       T      P
Constant     0.3267      0.4000      0.82   0.415
satm         0.0009436   0.0006857   1.38   0.170
satv        -0.0004078   0.0005919  -0.69   0.492
hsm          0.14596     0.03926     3.72   0.000
hss          0.03591     0.03780     0.95   0.343
hse          0.05529     0.03957     1.40   0.164

S = 0.7000   R-Sq = 21.1%   R-Sq(adj) = 19.3%

Analysis of Variance
Source          DF    SS        MS       F      P
Regression       5    28.6436   5.7287   11.69  0.000
Residual Error 218   106.8191   0.4900
Total          223   135.4628
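As a sanity check on reading this output, each T statistic is the ratio Coef / SE Coef. A small Python sketch (values transcribed from the table, so agreement holds only up to rounding):

```python
# (coef, se_coef, printed T) for each predictor, as read from the output
rows = {
    "satm": (0.0009436, 0.0006857, 1.38),
    "satv": (-0.0004078, 0.0005919, -0.69),
    "hsm":  (0.14596, 0.03926, 3.72),
    "hss":  (0.03591, 0.03780, 0.95),
    "hse":  (0.05529, 0.03957, 1.40),
}

# Recompute each T as Coef / SE Coef
t_ratios = {name: coef / se for name, (coef, se, _t) in rows.items()}
```

None of the individual SAT coefficients is significant at the 5% level here, while hsm clearly is; this is the usual starting point for deciding which predictors to keep.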

Minitab commands for multiple regression


- Prediction

Residual Analysis

We will use residuals to examine the following six types of departures from the model:
- The regression function is nonlinear
- The error terms do not have constant variance
- The error terms are not independent
- The model fits, but there are some outliers
- The error terms are not normally distributed
- One or more important variables have been omitted from the model

Residual plots:
- Residuals vs. X or the fitted values
- Residuals vs. time (when the data are obtained in a time sequence) or other variables
- Residuals vs. normal scores
- Stemplots and boxplots of residuals

Plots of the absolute values of the residuals (or the squared residuals) against X or against the fitted values are also helpful in diagnosing nonconstancy of the error variance.
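To make the ingredients of these plots concrete, the sketch below (an illustration, not from the notes) computes fitted values and semi-studentized residuals e_i / s for the simple-regression calcium example earlier in the notes; the analogous quantities from a multiple-regression fit are what such plots display.

```python
import math

# Data and least-squares fit from the lack-of-fit example
x = [40, 40, 50, 60, 60, 70, 80, 80, 90, 100, 100]
y = [10, 12, 8, 9, 7, 6, 6, 4, 4, 3, 4]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

fitted = [b0 + b1 * xi for xi in x]          # horizontal axis of the plot
resid = [yi - fi for yi, fi in zip(y, fitted)]
s = math.sqrt(sum(r ** 2 for r in resid) / (n - 2))

# Semi-studentized residuals e_i / s: roughly the "standardized
# residuals" one plots against fitted values, against normal scores,
# or summarizes in a histogram or stemplot.
semi_stud = [r / s for r in resid]
```

Least-squares residuals always sum to zero, so the diagnostic value is in their pattern (curvature, funnel shapes, extreme points), not their average.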

Example: Residual analysis for the above example.

[Plot: standardized residuals versus the fitted values (response is gpa)]
[Plot: normal probability plot of the residuals (response is gpa)]
[Plot: histogram of the standardized residuals (response is gpa)]