Linear Regression

The simple linear regression model describes the relationship between one dependent variable (y) and one independent variable (x). A dependent variable is a random variable whose variation is predicted by an independent variable. The relationship is specified as:

y = b_0 + b_1 x

where b_0 and b_1 are fixed numbers. The value of b_0 determines the point at which the straight line crosses the y-axis: the y-intercept.

The value of b_1 determines the slope of the line: the amount by which y changes for every unit change in x.

If b_1 < 0, the slope is negative.
If b_1 > 0, the slope is positive.
If b_1 = 0, the slope is zero and the line is horizontal (parallel to the x-axis).

Notation Used

S_xx = Σx² − (Σx)²/n = Σ(x − x̄)²
S_xy = Σxy − (Σx)(Σy)/n = Σ(x − x̄)(y − ȳ)
S_yy = Σy² − (Σy)²/n = Σ(y − ȳ)²

b_1 = S_xy / S_xx
b_0 = (1/n)(Σy − b_1 Σx)

The Least-Squares Criterion
The straight line that best fits a set of data points is the one for which the sum of squared errors is smallest.

Regression Line
The straight line that fits a set of data points best according to the least-squares criterion is the regression line.

The Regression Equation
The equation of the regression line is stated as:
y = b_0 + b_1 x

Estimation & Prediction
Estimation or prediction is simply using the regression equation at sample x-values to provide an estimate of the corresponding y-value. That is,
ŷ = b_0 + b_1 x
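
As a minimal illustration (made-up sample data, not from the handout), the following Python sketch computes S_xx and S_xy and then the slope b_1 and intercept b_0 exactly as defined above:

```python
def fit_simple_linear(x, y):
    """Least-squares slope and intercept via the shortcut formulas S_xx and S_xy."""
    n = len(x)
    sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    b1 = sxy / sxx                      # slope
    b0 = (sum(y) - b1 * sum(x)) / n     # intercept
    return b0, b1

# Hypothetical sample data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_linear(x, y)
print(f"regression equation: yhat = {b0:.3f} + {b1:.3f} x")
```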

The Total Sum of Squares
The total squared variation in y is called the total sum of squares:
SST = S_yy = Σ(y − ȳ)²

Error Sum of Squares
The total squared error is called the sum of squares error:
SSE = S_yy − (S_xy)² / S_xx = Σ(y − ŷ)²

Sum of Squares Due to Regression
The amount of variation explained by the regression line is called the regression sum of squares:
SSR = (S_xy)² / S_xx = Σ(ŷ − ȳ)²

The Regression Identity
SST = SSR + SSE

The Coefficient of Determination
The r-square, or coefficient of determination, is the percentage reduction obtained in the total squared error by using the regression equation instead of the sample mean to predict the observed y-values. In other words, it is the proportion of variation in the dependent variable that is explained by the independent variable. Equivalently:

r² = SSR / SST
r² = 1 − (SSE / SST)
r² = (S_xy)² / (S_xx S_yy)
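
To make the decomposition concrete, this sketch (same made-up data as above) computes SST, SSR, and SSE from their defining sums, checks the regression identity, and confirms that the three r² formulas agree:

```python
import math

# Hypothetical sample data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
syy = sum(yi * yi for yi in y) - sum(y) ** 2 / n
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
b1 = sxy / sxx
b0 = (sum(y) - b1 * sum(x)) / n

ybar = sum(y) / n
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)               # total sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # error sum of squares
ssr = sum((yh - ybar) ** 2 for yh in yhat)            # regression sum of squares

assert math.isclose(sst, ssr + sse)                   # the regression identity
r2 = ssr / sst
assert math.isclose(r2, 1 - sse / sst)
assert math.isclose(r2, sxy ** 2 / (sxx * syy))
print(f"SST={sst:.3f}  SSR={ssr:.3f}  SSE={sse:.3f}  r^2={r2:.4f}")
```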

Linear Correlation
The square root of r², taken with the sign of the slope b_1, gives the linear correlation coefficient (r) between x and y. r ranges from −1 to +1. Positive values indicate a positive correlation; negative values indicate a negative correlation. Note that correlation does not imply causation: even if x and y are strongly correlated, it does not follow that one causes the other.

Other Terms

Extrapolation
Using the regression equation to make predictions for x-values outside the range of x-values in the sample data.

Outliers & Influential Observations
Recall: an outlier is an observation that lies outside the overall pattern of the data. In the regression context, an outlier is a data point that lies far from the regression line. An influential observation is a data point whose removal causes the regression equation to change considerably.

Scatter Plots
A plot of x and y used to visualize the pattern of the sample data. If the plot shows a non-linear relationship between x (the predictor variable) and y (the response variable), DO NOT use linear regression methods.

Four Data Sets Having the Same Value of Summary Statistics (Source: Anscombe, 1973)

      Data Set 1      Data Set 2      Data Set 3      Data Set 4
      x1     y1       x2     y2       x3     y3       x4     y4
      4      4.26     4      3.10     4      5.39     8      6.58
      5      5.68     5      4.74     5      5.73     8      5.76
      6      7.24     6      6.13     6      6.04     8      7.71
      7      4.82     7      7.26     7      6.42     8      8.84
      8      6.95     8      8.14     8      6.77     8      8.47
      9      8.81     9      8.77     9      7.11     8      7.04
      10     8.04     10     9.14     10     7.46     8      5.25
      11     8.33     11     9.26     11     7.81     8      5.56
      12     10.84    12     9.13     12     8.15     8      7.91
      13     7.58     13     8.74     13     12.74    8      6.89
      14     9.96     14     8.10     14     8.84     19     12.50

Mean:      x = 9.00, y = 7.50 in every data set
Std. dev.: x = 3.32, y = 2.03 in every data set
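
The table can be verified in a few lines of Python; this sketch recomputes each set's means, standard deviations, and fitted slope, all of which match even though the scatter plots differ sharply:

```python
from statistics import mean, stdev

# Anscombe's four data sets, transcribed from the table above
x123 = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
anscombe = {
    1: (x123, [4.26, 5.68, 7.24, 4.82, 6.95, 8.81, 8.04, 8.33, 10.84, 7.58, 9.96]),
    2: (x123, [3.10, 4.74, 6.13, 7.26, 8.14, 8.77, 9.14, 9.26, 9.13, 8.74, 8.10]),
    3: (x123, [5.39, 5.73, 6.04, 6.42, 6.77, 7.11, 7.46, 7.81, 8.15, 12.74, 8.84]),
    4: ([8] * 10 + [19], [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]),
}

for k, (x, y) in anscombe.items():
    n = len(x)
    sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    print(f"set {k}: mean x={mean(x):.2f}, mean y={mean(y):.2f}, "
          f"sd x={stdev(x):.2f}, sd y={stdev(y):.2f}, slope={sxy / sxx:.2f}")
```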

[Scatter plots of Data Set 1 through Data Set 4]

Inferential Methods

Assumptions for Regression Inferences
1. Population Regression Line (Assumption I): There is a straight line, y = β_0 + β_1 x, such that for each x-value, the mean of the corresponding population of y-values lies on that straight line.

2. Equal Standard Deviations (Assumption II): The standard deviation, σ, of the population of y-values corresponding to a particular x-value is the same, regardless of the x-value.
3. Normality (Assumption III): For each x-value, the corresponding population of y-values is normally distributed.

Standard Error of the Estimate
The standard error of the estimate is defined by:
s_e = √(SSE / (n − 2))
It provides an estimate of the common population standard deviation, σ. The s_e indicates how far, on average, the observed y-values are from the predicted y-values.

Residual Analysis for the Regression Model
If the assumptions for regression inferences are met, then the following two conditions should hold.
1. A plot of the residuals against the x-values should fall roughly in a horizontal band centered and symmetric about the x-axis. If a pattern is present, you probably need an analytical method other than simple linear regression. For example, the graph below shows a quadratic relationship in the residuals.

[Residual plot ("Linearity Test", dependent variable: Min): residuals plotted against units, showing a quadratic pattern]

2. A normal probability plot of the residuals should be roughly linear. For example, the graph below shows that the normality assumption is violated.

[Normal probability plot (dependent variable: Min): sorted residuals plotted against expected residuals]
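
Both diagnostic plots are straightforward to produce; here is a minimal sketch assuming matplotlib and scipy are available (the data are made up for illustration):

```python
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical data
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 4.1, 5.8, 8.2, 9.9, 12.1, 13.8, 16.2]
slope, intercept, *_ = stats.linregress(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# 1. Residuals vs. x: look for a patternless horizontal band about zero
ax1.scatter(x, residuals)
ax1.axhline(0, linestyle="--")
ax1.set_xlabel("x")
ax1.set_ylabel("residual")
ax1.set_title("Residuals vs. x")

# 2. Normal probability plot: points should fall roughly on a straight line
stats.probplot(residuals, plot=ax2)

plt.tight_layout()
plt.show()
```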

Hypothesis Tests for the Slope
H_0: β_1 = 0
H_a: β_1 ≠ 0
The test statistic has a t-distribution with df = n − 2:
t = (b_1 − β_1) / (s_e / √S_xx)
If the value of the test statistic falls in the rejection region, then reject the null; otherwise do not reject the null.

Confidence Intervals for the Slope
The endpoints of the confidence interval for β_1 are:
b_1 ± t_{α/2} · (s_e / √S_xx), with df = n − 2

Confidence Intervals for Means in Regression
ŷ_p ± t_{α/2} · s_e · √(1/n + (x_p − x̄)² / S_xx), with df = n − 2

Confidence Intervals for a Population y-value Given an x-value
ŷ_p ± t_{α/2} · s_e · √(1 + 1/n + (x_p − x̄)² / S_xx), with df = n − 2
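
The test and all three intervals can be computed directly; the sketch below uses made-up data and assumes scipy is available for the t critical value:

```python
import math
from scipy import stats

# Hypothetical data
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 4.1, 5.8, 8.2, 9.9, 12.1, 13.8, 16.2]
n = len(x)

sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
b1 = sxy / sxx
b0 = (sum(y) - b1 * sum(x)) / n
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se = math.sqrt(sse / (n - 2))            # standard error of the estimate

alpha = 0.05
tcrit = stats.t.ppf(1 - alpha / 2, df=n - 2)

# Hypothesis test: H0: beta_1 = 0 vs. Ha: beta_1 != 0
t_stat = b1 / (se / math.sqrt(sxx))
print(f"t = {t_stat:.3f}, reject H0: {abs(t_stat) > tcrit}")

# Confidence interval for the slope
half = tcrit * se / math.sqrt(sxx)
print(f"95% CI for beta_1: ({b1 - half:.3f}, {b1 + half:.3f})")

# Interval for the mean of y, and prediction interval, at x_p
xp, xbar = 4.5, sum(x) / n
yhat_p = b0 + b1 * xp
mean_half = tcrit * se * math.sqrt(1 / n + (xp - xbar) ** 2 / sxx)
pred_half = tcrit * se * math.sqrt(1 + 1 / n + (xp - xbar) ** 2 / sxx)
print(f"mean of y at x={xp}: ({yhat_p - mean_half:.3f}, {yhat_p + mean_half:.3f})")
print(f"new y at x={xp}:     ({yhat_p - pred_half:.3f}, {yhat_p + pred_half:.3f})")
```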

Aside: The F-Distribution
The F-distribution is not symmetric and is not centered at zero.

Using the F-Table
You need a significance level, the numerator degrees of freedom (n_1 − 1), and the denominator degrees of freedom (n_2 − 1). To find a left-tail critical value, take the reciprocal of the right-tail value with the degrees of freedom reversed.

For example: if n_1 = 10, n_2 = 7, and α = 0.05 is split between the two tails (0.025 in each), the right-tail value (df = 9, 6) is 5.5234 (from the table). The left-tail value is 1/4.317 = 0.2315; that is, 4.317 is the right-tail critical value for 6 and 9 degrees of freedom.
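
The reciprocal rule is easy to verify with scipy (assumed available); f.ppf returns critical values directly, so no table lookup is needed:

```python
from scipy.stats import f

df1, df2 = 9, 6    # numerator and denominator degrees of freedom
tail = 0.025       # area in each tail

right = f.ppf(1 - tail, df1, df2)   # right-tail critical value, about 5.5234
left = f.ppf(tail, df1, df2)        # left-tail critical value, about 0.2315
print(f"right tail: {right:.4f}")
print(f"left tail:  {left:.4f}")

# Reciprocal rule: the left-tail value equals 1 over the right-tail
# value with the degrees of freedom reversed.
print(f"1 / F(6, 9 df): {1 / f.ppf(1 - tail, df2, df1):.4f}")
```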

Regression Example << Regrsam.xls >>

Age (x)   Price (y)   xy     x²     y²
6         125         750    36     15625
6         115         690    36     13225
6         130         780    36     16900
2         260         520    4      67600
2         219         438    4      47961
5         150         750    25     22500
4         190         760    16     36100
5         163         815    25     26569
1         260         260    1      67600
4         160         640    16     25600
TOTALS    41  1772    6403   199    339680

S_yy = 339680 − (1772)²/10 = 25681.60
S_xx = 199 − (41)²/10 = 30.90
S_xy = 6403 − (41)(1772)/10 = −862.20

b_1 = S_xy / S_xx = −27.90
b_0 = (1/10)(1772 − b_1(41)) = 291.60

SST = S_yy = 25681.600
SSR = (S_xy)² / S_xx = 24057.891
SSE = SST − SSR = 1623.709
s_e = √(SSE / (n − 2)) = 14.247
r² = 1 − (SSE / SST) = 0.9368 (93.68%)
r = −√(r²) = −0.97 (negative, matching the sign of b_1)
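
The hand calculations above can be reproduced in a few lines of Python (values agree up to rounding):

```python
import math

# Corvette data from the example: age (x) and price (y)
age = [6, 6, 6, 2, 2, 5, 4, 5, 1, 4]
price = [125, 115, 130, 260, 219, 150, 190, 163, 260, 160]
n = len(age)

sxx = sum(x * x for x in age) - sum(age) ** 2 / n
syy = sum(y * y for y in price) - sum(price) ** 2 / n
sxy = sum(x * y for x, y in zip(age, price)) - sum(age) * sum(price) / n

b1 = sxy / sxx                          # -27.90
b0 = (sum(price) - b1 * sum(age)) / n   # 291.60

sst = syy                               # 25681.600
ssr = sxy ** 2 / sxx                    # 24057.891
sse = sst - ssr                         # 1623.709
se = math.sqrt(sse / (n - 2))           # 14.247
r2 = 1 - sse / sst                      # 0.9368
r = -math.sqrt(r2)                      # -0.97 (sign follows the slope)

print(f"yhat = {b0:.2f} + ({b1:.2f})x, s_e = {se:.3f}, r^2 = {r2:.4f}, r = {r:.2f}")
```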

Hypothesis Testing
H_0: β_1 = 0
H_a: β_1 ≠ 0

t = b_1 / (s_e / √S_xx) = −10.887
Critical t at 95% confidence with df = 8: ±2.306

Since the calculated value falls in the rejection region, we reject the null hypothesis. There is enough evidence to conclude that the age of a Corvette is useful for predicting its price.

Confidence Interval at 95%
−27.9 + 2.306 · (14.247 / √30.90) = −21.99
−27.9 − 2.306 · (14.247 / √30.90) = −33.81
CI = (−33.81, −21.99)
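
As a cross-check (scipy assumed available), linregress reproduces the slope, the test statistic, and the interval:

```python
from scipy import stats

age = [6, 6, 6, 2, 2, 5, 4, 5, 1, 4]
price = [125, 115, 130, 260, 219, 150, 190, 163, 260, 160]
n = len(age)

res = stats.linregress(age, price)
t_stat = res.slope / res.stderr          # about -10.887
tcrit = stats.t.ppf(0.975, df=n - 2)     # 2.306

print(f"slope = {res.slope:.2f}, t = {t_stat:.3f}, p = {res.pvalue:.2e}")
print(f"95% CI for the slope: ({res.slope - tcrit * res.stderr:.2f}, "
      f"{res.slope + tcrit * res.stderr:.2f})")
```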