AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression


11.1 Motivation

A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number of dinners y to be prepared. Data on reservations and numbers of dinners served for one day, chosen at random from each week in a 100-week period, gave the following results:

[Scatter plot of the number of meals served (y-axis, ticks at 33, 66, 100) versus the number of reservations (x-axis, ticks at 50, 100, 150, 200)]

Question: Suppose the number of reservations for a future week is 135; how many meals should be prepared?

11.2 A simple graphical representation: the scatter plot

11.3 Transformation to linearize data

11.4 The simple linear (regression) model:

y = β₀ + β₁x + ε,

where ε is a random error with mean 0 and variance σ² (unknown, but usually assumed to be constant).
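The model above is easy to simulate; a minimal sketch in Python, assuming illustrative parameter values (β₀, β₁, σ below are my own choices, not taken from the notes):

```python
import numpy as np

# Simulate n observations from the simple linear model
# y = beta0 + beta1 * x + eps, with eps ~ N(0, sigma^2).
rng = np.random.default_rng(0)
n = 100
beta0, beta1, sigma = 20.0, 0.7, 5.0    # illustrative values only
x = rng.uniform(50, 200, size=n)        # e.g. weekly reservation counts
eps = rng.normal(0.0, sigma, size=n)    # random error: mean 0, variance sigma^2
y = beta0 + beta1 * x + eps             # observed responses
```

Plotting y against x for such simulated data produces the kind of scatter plot described in Section 11.2.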

11.5 The least squares method of model fitting

Suppose the fitted line is ŷ = β̂₀ + β̂₁x. The sum of the squared distances between the fitted values ŷᵢ and the observed values yᵢ is

δ = Σ(yᵢ − ŷᵢ)² = Σ(yᵢ − β̂₀ − β̂₁xᵢ)².

The least squares estimators of the model parameters β₀ and β₁ are the values β̂₀ and β̂₁ that minimize δ; they are

β̂₁ = S_XY / S_XX and β̂₀ = ȳ − β̂₁x̄,

where

S_XY = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) = ΣXᵢYᵢ − (ΣXᵢ)(ΣYᵢ)/n = ΣXᵢYᵢ − n X̄ Ȳ,
S_XX = Σ(Xᵢ − X̄)² = ΣXᵢ² − (ΣXᵢ)²/n = ΣXᵢ² − n X̄²

(similarly, S_YY = Σ(Yᵢ − Ȳ)²).

A good estimator for the error variance σ² is the mean square error

s_ε² = Σ(yᵢ − ŷᵢ)²/(n − 2) = SSE/(n − 2), s_ε = √(SSE/(n − 2))

(s_ε is called the residual standard deviation).

11.6 Partitioning the variability

yᵢ − ȳ = (yᵢ − ŷᵢ) + (ŷᵢ − ȳ)
Σ(yᵢ − ȳ)² = Σ(yᵢ − ŷᵢ)² + Σ(ŷᵢ − ȳ)²
SSTotal = SSError + SSREG

(Note: SS stands for "Sum of Squares".)
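The estimators above can be computed directly from their formulas; a sketch (the function name `fit_slr` is my own, not from the notes):

```python
import numpy as np

def fit_slr(x, y):
    """Least-squares fit of y = b0 + b1*x using the S_XY/S_XX formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))   # S_XY
    sxx = np.sum((x - x.mean()) ** 2)               # S_XX
    b1 = sxy / sxx                                  # slope estimate
    b0 = y.mean() - b1 * x.mean()                   # intercept estimate
    resid = y - (b0 + b1 * x)
    s2 = np.sum(resid ** 2) / (n - 2)               # mean square error SSE/(n-2)
    return b0, b1, s2

# Points lying exactly on y = 1 + 2x: slope 2, intercept 1, zero residual variance.
b0, b1, s2 = fit_slr([1, 2, 3, 4], [3, 5, 7, 9])
```

On data with no noise the fitted line reproduces the exact coefficients and SSE is zero, which is a quick sanity check of the formulas.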

A useful measure of model fit is the coefficient of determination (R²):

R² = SSREG / SSTotal, 0 ≤ R² ≤ 1.

The larger the R², the closer the fit.

The sample correlation coefficient between X and Y is

r_{X,Y} = S_XY / √(S_XX S_YY), −1 ≤ r_{X,Y} ≤ 1.

It measures the linear relationship between X and Y:

r_{X,Y} = +1 ⟺ Y = a + bX with b > 0; r_{X,Y} = −1 ⟺ Y = a − bX with b > 0.

For the simple linear regression ŷ = β̂₀ + β̂₁x we have (prove!)

r²_{X,Y} = r²_{Y,Ŷ} = R².

11.7 Distributions of the estimated model parameters

In order to construct confidence intervals for the unknown parameters β₀ and β₁, or to do a hypothesis test such as H₀: β₁ = 0 versus H₁: β₁ ≠ 0, we need to know the distributions of β̂₀ and β̂₁. To do this, we assume the distribution of the random error ε to be normal, i.e. ε ~ N(0, σ²). Under this normality assumption,

T₁ = (β̂₁ − β₁) / (s_ε/√S_XX) ~ t_{n−2};
T₀ = (β̂₀ − β₀) / (s_ε √(ΣXᵢ² / (n S_XX))) ~ t_{n−2}.

Under H₀: β₁ = 0,

T₁ = (β̂₁ − 0) / (s_ε/√S_XX).
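The identity r²_{X,Y} = R² can be checked numerically; a sketch (the helper name `r2_and_corr` is my own):

```python
import numpy as np

def r2_and_corr(x, y):
    """Return (R^2 via SSREG/SSTotal, sample correlation r_{X,Y})."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    sxx = np.sum((x - x.mean()) ** 2)
    syy = np.sum((y - y.mean()) ** 2)
    b1 = sxy / sxx
    b0 = y.mean() - b1 * x.mean()
    yhat = b0 + b1 * x
    ss_reg = np.sum((yhat - y.mean()) ** 2)   # SSREG
    r = sxy / np.sqrt(sxx * syy)              # sample correlation
    return ss_reg / syy, r

r2, r = r2_and_corr([1, 2, 3, 4, 5], [2, 1, 4, 3, 6])
# For simple linear regression, R^2 equals the squared correlation: r2 == r**2.
```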

Moreover,

(T₁)² = (β̂₁)² S_XX / s_ε² = SSREG / s_ε² ~ F_{1,n−2}.

11.8 Checking the model assumptions

The constant variance assumption can be checked via a scatter plot of the residuals (yᵢ − ŷᵢ) versus xᵢ (or ŷᵢ). This plot is often called the residual plot. The normality assumption can be checked with a normal p–p plot of the standardized residuals (each residual divided by its standard error).

EXAMPLE 11.1 A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number of dinners y to be prepared. Data on reservations and number of dinners served for one day, chosen at random from each week in a 100-week period, gave the following results:

x̄ = 150, ȳ = 120,

Σ(x − x̄)² = 90,000, Σ(y − ȳ)² = 70,000, Σ(x − x̄)(y − ȳ) = 60,000.

a. Find the least squares estimates β̂₀ and β̂₁ for the linear regression line ŷ = β̂₀ + β̂₁x.

b. Predict the number of meals to be prepared if the number of reservations is 135.

c. Construct a 90% confidence interval for the slope. Does information on x (number of advance reservations) help in predicting y (number of dinners prepared)?

Solution:

a. The least squares estimates are given by

β̂₁ = S_XY / S_XX = 60,000 / 90,000 ≈ .67 and β̂₀ = ȳ − β̂₁x̄ = 120 − .67(150) = 19.50.

b. The predicted number of meals for 135 advance reservations is

ŷ = 19.50 + .67(135) = 109.95, or about 110 meals.

c. The 90% confidence interval for β₁ uses the formula β̂₁ ± t(standard error), where the standard error is s_ε/√S_XX. Although Table 4 in the Appendix does not list a t-value for α = .05 and df = 98, we will use the t-value for the next higher df listed (df = 120); this value is 1.658. The residual standard deviation s_ε can be computed from the summary data using s_ε² = SSE/(n − 2), where

SSE = S_YY − β̂₁S_XY = 70,000 − 0.67(60,000) = 29,800.

Thus,

s_ε = √(29,800/98) = √304.08 = 17.44,

and the 90% confidence interval for β₁ is

0.67 ± 1.658(17.44)/√90,000, or 0.67 ± .10.

Since we are 90% confident that the true value of β₁ lies somewhere in the interval .57 ≤ β₁ ≤ .77, we are thus confident that the increase in y (number of dinners prepared) for every increase of one advance reservation is between .57 and .77. Also, since the interval for β₁ does not include 0 as a possible value for the slope, it appears that the number of advance reservations is a useful predictor of the number of meals to be prepared in the context of the linear regression model y = β₀ + β₁x + ε.

EXAMPLE 11.2 Refer to the data of Example 11.1. Confirm the conclusion we reached concerning β₁ by conducting a test of H₀: β₁ = 0 versus Hₐ: β₁ ≠ 0. Use α = .10.

Solution: The parts of the statistical test are given here:

H₀: β₁ = 0
Hₐ: β₁ ≠ 0
T.S.: t = β̂₁√S_XX / s_ε = 0.67(√90,000)/17.44 = 11.53
R.R.: For a two-tailed test with α = .10 and df = 98, we will reject H₀ if |t| > 1.645.

Conclusion: Since t = 11.53 is greater than 1.645, we have sufficient evidence to reject H₀. It does appear that x is useful in predicting y.
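The arithmetic in Examples 11.1 and 11.2 can be reproduced from the summary statistics alone; a sketch using the unrounded slope 2/3, so the results differ slightly from the rounded values in the worked solutions:

```python
import math

# Summary statistics from Example 11.1.
sxx, syy, sxy = 90_000.0, 70_000.0, 60_000.0
xbar, ybar, n = 150.0, 120.0, 100

b1 = sxy / sxx                          # slope: 2/3 (rounded to .67 in the notes)
b0 = ybar - b1 * xbar                   # intercept: 20.0 unrounded
yhat_135 = b0 + b1 * 135                # prediction for 135 reservations: 110.0
sse = syy - b1 * sxy                    # SSE = S_YY - b1 * S_XY = 30,000 unrounded
s_eps = math.sqrt(sse / (n - 2))        # residual standard deviation, ~17.5
half_width = 1.658 * s_eps / math.sqrt(sxx)   # 90% CI half-width, ~0.10
t_stat = b1 * math.sqrt(sxx) / s_eps    # test statistic for H0: beta1 = 0, ~11.4
```

The test statistic comfortably exceeds the critical value 1.645 either way, so the rounding does not change the conclusion of Example 11.2.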


The simple linear regression model y = β₀ + β₁x + ε: x is the independent variable, y is the dependent variable, ε is the random error, and β₀ and β₁ are the unknown model parameters (β₀: intercept; β₁: slope).