Simple Linear Regression

Introduction

ST 430/514. Recall: a regression model describes how a dependent variable (or response) $Y$ is affected, on average, by one or more independent variables (or factors, or covariates) $x_1, x_2, \ldots, x_k$.

The Straight-Line Probabilistic Model

The simplest case of a regression model: one independent variable, $k = 1$, with $x_1 \equiv x$; linear dependence; model equation
$$E(Y) = \beta_0 + \beta_1 x,$$
or equivalently
$$Y = \beta_0 + \beta_1 x + \epsilon.$$

Interpreting the parameters: $\beta_0$ is the intercept (so called because it is where the graph of $y = \beta_0 + \beta_1 x$ meets the y-axis, at $x = 0$); $\beta_1$ is the slope, that is, the change in $E(Y)$ as $x$ is changed to $x + 1$. Note: if $\beta_1 = 0$, then $x$ has no effect on $Y$; that will often be an interesting hypothesis to test.

Advertising and Sales example

Here $x$ = monthly advertising expenditure, in hundreds of dollars; $y$ = monthly sales revenue, in thousands of dollars; $\beta_0$ = expected revenue with no advertising; $\beta_1$ = expected revenue increase per $100 increase in advertising, in thousands of dollars. Sample data for five months:

    Advertising:  1  2  3  4  5
    Revenue:      1  1  2  2  4

What do these data tell us about $\beta_0$ and $\beta_1$?

[Figure: scatterplot of advertising ($x$, 0 to 5) against revenue ($y$, 0 to 4).]

We could try various values of $\beta_0$ and $\beta_1$. For given values of $\beta_0$ and $\beta_1$, we get predictions $p_i = \beta_0 + \beta_1 x_i$, $i = 1, 2, 3, 4, 5$. The difference between the observed value $y_i$ and the prediction $p_i$ is the residual $r_i = y_i - p_i$, $i = 1, 2, 3, 4, 5$. A good choice of $\beta_0$ and $\beta_1$ gives accurate predictions, and generally small residuals.

One candidate line is $\beta_0 = 0.1$, $\beta_1 = 0.7$:

[Figure: the advertising and revenue scatterplot with this candidate line drawn through it.]
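As a quick numerical check (a sketch of mine, not part of the original slides), here is minimal Python that computes the predictions, residuals, and residual sum of squares for this candidate line on the five-month data:

```python
# Candidate line (beta0 = 0.1, beta1 = 0.7) applied to the advertising data.
xs = [1, 2, 3, 4, 5]   # advertising, hundreds of dollars
ys = [1, 1, 2, 2, 4]   # revenue, thousands of dollars

beta0, beta1 = 0.1, 0.7

preds = [beta0 + beta1 * x for x in xs]        # p_i = beta0 + beta1 * x_i
resids = [y - p for y, p in zip(ys, preds)]    # r_i = y_i - p_i
sse = sum(r ** 2 for r in resids)              # S(beta0, beta1)

print(resids)  # approximately [0.2, -0.5, -0.2, -0.9, 0.4]
print(sse)     # approximately 1.30
```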

Fitting the Model

How to measure the overall size of the residuals? The most common measure (but not the only possibility) is the sum of squares of the residuals:
$$\sum r_i^2 = \sum (y_i - p_i)^2 = \sum \{y_i - (\beta_0 + \beta_1 x_i)\}^2 = S(\beta_0, \beta_1).$$
The least squares line is the one with the smallest sum of squares. Note: the least squares line has the property that $\sum r_i = 0$; Definition 3.1 (page 95) does not need to impose that as a constraint.

The least squares estimates of $\beta_0$ and $\beta_1$ are the coefficients of the least squares line. Some algebra shows that the least squares estimates are
$$\hat\beta_1 = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2} = \frac{\sum x_i y_i - n \bar x \bar y}{\sum x_i^2 - n \bar x^2}$$
and
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x.$$
With a little luck, you will never need to use these formulæ.
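As an illustration (again mine, not the slides'), a minimal Python sketch applies these formulæ to the advertising data:

```python
# Closed-form least squares estimates for the advertising data.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
n = len(xs)

xbar = sum(xs) / n   # sample mean of x
ybar = sum(ys) / n   # sample mean of y

sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # numerator
sxx = sum((x - xbar) ** 2 for x in xs)                      # denominator

beta1_hat = sxy / sxx                # 7 / 10 = 0.7
beta0_hat = ybar - beta1_hat * xbar  # 2 - 0.7 * 3 = -0.1

print(beta0_hat, beta1_hat)  # approximately -0.1, 0.7
```

So the least squares line for these data is $\hat y = -0.1 + 0.7x$, slightly below the candidate line considered earlier, with a smaller sum of squared residuals.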

Other criteria

Why square the residuals? We could use least absolute deviations estimates, minimizing
$$S_1(\beta_0, \beta_1) = \sum \left| y_i - (\beta_0 + \beta_1 x_i) \right|.$$
Convenience: we have equations for the least squares estimates, but to find the least absolute deviations estimates we have to solve a linear programming problem. Optimality: least squares estimates are BLUE if the errors $\epsilon$ are uncorrelated with constant variance, and MVUE if additionally $\epsilon$ is normal.
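The slides do not give a least absolute deviations algorithm; as a hedged illustration, the sketch below simply hands $S_1$ to a general-purpose optimizer rather than setting up the linear program the slides mention:

```python
# Least absolute deviations via a generic optimizer -- a sketch only;
# a proper LAD fit would use the linear-programming formulation.
import numpy as np
from scipy.optimize import minimize

xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ys = np.array([1.0, 1.0, 2.0, 2.0, 4.0])

def s1(beta):
    beta0, beta1 = beta
    return np.sum(np.abs(ys - (beta0 + beta1 * xs)))  # S_1(beta0, beta1)

# Nelder-Mead copes reasonably with the non-smooth absolute-value objective.
res = minimize(s1, x0=[0.0, 0.5], method="Nelder-Mead")
print(res.x)  # LAD estimates of (beta0, beta1)
```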

Model Assumptions

The least squares line gives point estimates of $\beta_0$ and $\beta_1$, and these estimates are always unbiased. To use the other forms of statistical inference, such as interval estimates (confidence intervals) and hypothesis tests, we need some assumptions about the random errors $\epsilon$.

1. Zero mean: $E(\epsilon_i) = 0$; as noted earlier, this is not really an assumption, but a consequence of the definition $\epsilon = Y - E(Y)$.
2. Constant variance: $V(\epsilon_i) = \sigma^2$; this is a nontrivial assumption, often violated in practice.
3. Normality: $\epsilon_i \sim N(0, \sigma^2)$; this is also a nontrivial assumption, always violated in practice, but sometimes a useful approximation.
4. Independence: $\epsilon_i$ and $\epsilon_j$ are statistically independent; another nontrivial assumption, often true in practice, but typically violated with time series and spatial data.

Notes: Assumptions 2 and 4 are the conditions under which least squares estimates are BLUE (Best Linear Unbiased Estimators); Assumptions 2, 3, and 4 are the conditions under which least squares estimates are MVUE (Minimum Variance Unbiased Estimators).
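One way to internalize the assumptions is to simulate data that satisfies all four; the parameter values below are illustrative choices of mine, not from the slides:

```python
# Simulate data satisfying Assumptions 1-4: independent N(0, sigma^2)
# errors with zero mean and constant variance.
import numpy as np

rng = np.random.default_rng(2024)      # seed is arbitrary
beta0, beta1, sigma = -0.1, 0.7, 0.6   # illustrative values (assumption)
x = np.arange(1.0, 6.0)
eps = rng.normal(0.0, sigma, size=x.size)  # i.i.d. normal errors
y = beta0 + beta1 * x + eps                # straight-line model
```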

Estimating σ²

Recall that $\sigma^2$ is the variance of $\epsilon_i$, which we have assumed to be the same for all $i$. That is,
$$\sigma^2 = V(\epsilon_i) = V[Y_i - E(Y_i)] = V[Y_i - (\beta_0 + \beta_1 x_i)], \quad i = 1, 2, \ldots, n.$$
We observe $Y_i = y_i$ and $x_i$; if we knew $\beta_0$ and $\beta_1$, we would estimate $\sigma^2$ by
$$\frac{1}{n} \sum \{y_i - (\beta_0 + \beta_1 x_i)\}^2 = \frac{1}{n} S(\beta_0, \beta_1).$$

We do not know $\beta_0$ and $\beta_1$, but we have the least squares estimates $\hat\beta_0$ and $\hat\beta_1$. So we could use $S(\hat\beta_0, \hat\beta_1)$ as an approximation to $S(\beta_0, \beta_1)$. But we know that $S(\hat\beta_0, \hat\beta_1) < S(\beta_0, \beta_1)$, so $\frac{1}{n} S(\hat\beta_0, \hat\beta_1)$ would be a biased estimate of $\sigma^2$.

We can show that, under Assumptions 2 and 4,
$$E[S(\hat\beta_0, \hat\beta_1)] = (n - 2)\sigma^2.$$
So
$$s^2 = \frac{1}{n - 2} S(\hat\beta_0, \hat\beta_1) = \frac{1}{n - 2} \sum (y_i - \hat y_i)^2,$$
where $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$, is an unbiased estimate of $\sigma^2$. This is sometimes written
$$s^2 = \text{Mean Square for Error} = MS_E = \frac{\text{Sum of Squares for Error}}{\text{degrees of freedom for Error}} = \frac{SS_E}{df_E}.$$
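Continuing the advertising example (my sketch, not the slides'), $s^2$ follows directly from the least squares fit found earlier:

```python
# s^2 = SS_E / (n - 2) for the advertising data, using the least
# squares fit (beta0_hat = -0.1, beta1_hat = 0.7) computed earlier.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
n = len(xs)

beta0_hat, beta1_hat = -0.1, 0.7
fits = [beta0_hat + beta1_hat * x for x in xs]      # y-hat_i
sse = sum((y - f) ** 2 for y, f in zip(ys, fits))   # SS_E, approx 1.10
s2 = sse / (n - 2)                                  # MS_E, approx 0.367

print(sse, s2)
```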

Inferences about the Line

We are often interested in the question of whether $x$ has any effect on $E(Y)$. Since $E(Y) = \beta_0 + \beta_1 x$, the independent variable $x$ has some effect whenever $\beta_1 \neq 0$. So we need to test the null hypothesis $H_0: \beta_1 = 0$.

We also need to construct a confidence interval for $\beta_1$, to indicate how precisely we know its value. For both purposes, we need the standard error
$$\sigma_{\hat\beta_1} = \frac{\sigma}{\sqrt{SS_{xx}}}, \quad \text{where } SS_{xx} = \sum (x_i - \bar x)^2.$$
As always, since $\sigma$ is unknown, we replace it by its estimate $s$, to get the estimated standard error
$$\hat\sigma_{\hat\beta_1} = \frac{s}{\sqrt{SS_{xx}}}.$$

A confidence interval for $\beta_1$ is
$$\hat\beta_1 \pm t_{\alpha/2,\, n-2} \, \hat\sigma_{\hat\beta_1}.$$
Note that we use the t-distribution with $n - 2$ degrees of freedom, because that is the degrees of freedom associated with $s^2$. To test $H_0: \beta_1 = 0$, we use the test statistic
$$t = \frac{\hat\beta_1}{\hat\sigma_{\hat\beta_1}},$$
and reject $H_0$ at significance level $\alpha$ if $|t| > t_{\alpha/2,\, n-2}$.
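Putting the pieces together for the advertising data (again a sketch under the same assumptions, using scipy for the t quantile):

```python
# 95% confidence interval and t-test for the slope in the
# advertising example (s^2 approx 0.367, SS_xx = 10).
from math import sqrt
from scipy.stats import t as t_dist

n, alpha = 5, 0.05
beta1_hat = 0.7
s2 = 1.1 / (n - 2)    # SS_E / (n - 2)
ssxx = 10.0           # sum of (x_i - xbar)^2
se = sqrt(s2 / ssxx)  # estimated standard error of beta1_hat

t_crit = t_dist.ppf(1 - alpha / 2, df=n - 2)  # t_{alpha/2, n-2}, approx 3.182
ci = (beta1_hat - t_crit * se, beta1_hat + t_crit * se)
t_stat = beta1_hat / se                       # approx 3.66

print(ci)      # approximately (0.09, 1.31): excludes 0
print(t_stat)  # |t| > 3.182, so reject H0 at alpha = 0.05
```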

Compare the Confidence Interval and the Hypothesis Test

Note that we reject $H_0$ if and only if the corresponding confidence interval does not include 0.