Applied Regression Modeling: A Business Approach
Chapter 2: Simple Linear Regression, Sections 2.1–2.3
by Iain Pardoe (© 2006)


Contents

2.1 Probability model for Y and X
    Simple linear regression model for Y and X
    Possible relationships between Y and X
    Straight-line model
    HOMES2 data
    Simple linear regression model equation
2.2 Least squares criterion
    Least squares criterion
    Estimating the model
    Estimated equation
    Computer output
2.3 Model evaluation
    Model evaluation
    Evaluating fit numerically
    Regression standard error, s
    Regression standard error interpretation
    Coefficient of determination, R²
    Calculating R²
    Interpreting R²
    R² examples
    Correlation
    Correlation examples
    Slope parameter, b_1
    Hypothesis test for b_1
    Computer output and slope test illustration
    Slope confidence interval

2.1 Probability model for Y and X

Simple linear regression model for Y and X

Y is a quantitative response variable (a.k.a. dependent, outcome, or output variable).
X is a quantitative predictor variable (a.k.a. independent or input variable, or covariate).
The two variables play different roles, so it is important to identify which is which and to define each carefully, e.g.:
    Y is sale price, in $ thousands; X is floor size, in thousands of square feet.
How much do we expect Y to change when we change the value of X?
What do we expect the value of Y to be when we set the value of X at 2?
Note: association (observational data), not causation (experimental data).

Possible relationships between Y and X

Which factors might lead to the different relationships?

[Figure: four scatterplots of sale price versus floor size, illustrating different possible relationships between Y and X.]

Straight-line model

Simple linear regression models straight-line relationships (like the upper two plots on the previous slide).
Suppose sale price (in $ thousands) is, on average, 190.3 plus 40.8 times floor size (in thousands of square feet):

    E(Y_i) = 190.3 + 40.8 X_i,

where E(Y_i) means the expected value of Y given that X is equal to X_i.
Individual sale prices can deviate from this expected value by an amount e_i (called a random error):

    Y_i = 190.3 + 40.8 X_i + e_i   (i = 1, ..., n),

i.e., Y_i = deterministic part + random error.
The error, e_i, represents variation in Y due to factors other than X which we haven't measured, e.g., lot size, number of beds/baths, age, garage, schools.

HOMES2 data

    Y = sale price (in thousands of dollars):       259.9   259.9   269.9   270.0   285.0
    X = floor size (in thousands of square feet):   1.683   1.708   1.922   2.053   2.269

[Figure: scatterplot of the five homes with the line E(Y) = 190.3 + 40.8 X superimposed.]
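For instance, the question posed earlier (what do we expect Y to be when we set X at 2?) can be answered directly from this equation. A quick check in Python, using the model from the slides:

    # Expected sale price for a home with 2000 square feet of floor size
    b0, b1 = 190.3, 40.8   # intercept and slope from the slides
    x = 2.0                # floor size in thousands of square feet
    print(b0 + b1 * x)     # 271.9, i.e., an expected sale price of $271,900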

Simple linear regression model equation

Population: E(Y) = b_0 + b_1 X, where
    b_0 = intercept (the value of E(Y) at X = 0);
    b_1 = slope (change in E(Y) divided by change in X).

[Figure: graph of the line E(Y) = b_0 + b_1 X with the intercept b_0 and the slope b_1 annotated.]

2.2 Least squares criterion

Least squares criterion

Which line fits the data best?

[Figure: scatterplot of Y = sale price versus X = floor size.]

Estimating the model

Population: E(Y) = b_0 + b_1 X.
Sample: Ŷ = b̂_0 + b̂_1 X (estimated model).
Obtain b̂_0 and b̂_1 by finding the best-fitting line (the least squares line).
Mathematically, minimize the sum of squared errors:

    SSE = Σ_{i=1}^{n} ê_i²
        = Σ_{i=1}^{n} (Y_i − Ŷ_i)²
        = Σ_{i=1}^{n} (Y_i − b̂_0 − b̂_1 X_i)².

We could use calculus (partial derivatives), but we'll use computer software to find b̂_0 and b̂_1.

Estimated equation

Sample: Ŷ = b̂_0 + b̂_1 X.

[Figure: scatterplot of Y = sale price versus X = floor size with the least squares line; for one observation, the fitted value Ŷ = b̂_0 + b̂_1 X, the estimated error ê, and the observed value Y = b̂_0 + b̂_1 X + ê are annotated.]
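As a concrete check, the least squares estimates can be computed directly from the HOMES2 data using the standard closed-form formulas (a minimal sketch; the variable names are our own):

    import numpy as np

    # HOMES2 data from the slides: five homes
    x = np.array([1.683, 1.708, 1.922, 2.053, 2.269])  # floor size, thousands of sq. feet
    y = np.array([259.9, 259.9, 269.9, 270.0, 285.0])  # sale price, $ thousands

    # b1_hat = Sxy / Sxx and b0_hat = ybar - b1_hat * xbar minimize SSE
    b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0_hat = y.mean() - b1_hat * x.mean()

    print(b0_hat, b1_hat)  # approx. 190.3 and 40.8, matching the slides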

Computer output

    Parameters (a)
    Model              Estimate   Std. Error   t-stat   Pr(>|t|)
    1   (Intercept)    190.318    11.023       17.266   0.000
        X               40.800     5.684        7.179   0.006
    (a) Response variable: Y.

Estimated equation: Ŷ = b̂_0 + b̂_1 X = 190.3 + 40.8 X.
We expect Y = b̂_0 when X = 0, but only if this makes sense and we have data close to X = 0 (not the case here).
We expect Y to change by b̂_1 when X increases by one unit, i.e., we expect sale price to increase by $40,800 when floor size increases by 1000 square feet.
For this example, it is more meaningful to say that we expect sale price to increase by $4080 when floor size increases by 100 square feet.

2.3 Model evaluation

Model evaluation

How well does the model fit each dataset?

[Figure: several scatterplots with fitted lines, showing fits of varying quality.]

Evaluating fit numerically

Three methods:
- How close are the actual observed Y-values to the model-based fitted values, Ŷ? Calculate the regression standard error, s.
- How much of the variability in Y have we been able to explain with our model? Calculate the coefficient of determination, R².
- How strong is the evidence of a straight-line relationship between Y and X? Estimate and test the slope parameter, b_1.

Regression standard error, s

    Model Summary
    Model   Multiple R   R Squared   Adjusted R Squared   Regression Std. Error
    1       0.972 (a)    0.945       0.927                2.7865
    (a) Predictors: (Intercept), X.

The regression standard error, s, estimates the standard deviation of the simple linear regression random errors:

    s = √(SSE / (n − 2)).

The unit of measurement for s is the same as the unit of measurement for Y.
Approximately 95% of the observed Y-values lie within plus or minus 2s of their fitted Ŷ-values.
Since 2s = 5.57, we can expect to predict an unobserved sale price from a particular floor size to within approximately ±$5570 (at a 95% confidence level).
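A minimal sketch of computing s from the HOMES2 data (residuals about the fitted line, with divisor n − 2 because two parameters were estimated):

    import numpy as np

    x = np.array([1.683, 1.708, 1.922, 2.053, 2.269])
    y = np.array([259.9, 259.9, 269.9, 270.0, 285.0])
    n = len(x)

    y_hat = 190.318 + 40.8 * x      # fitted values from the estimated equation
    sse = np.sum((y - y_hat) ** 2)  # approx. 23.3
    s = np.sqrt(sse / (n - 2))      # approx. 2.7865, matching the output above

    print(s, 2 * s)                 # 2s is approx. 5.57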

Regression standard error interpretation

CLT: approximately 95% of Y-values lie within ±2s of the regression line.

[Figure: scatterplot with the regression line and two parallel lines at the regression line plus 2s and minus 2s, bracketing nearly all of the data.]

Coefficient of determination, R²

Measures of variation for simple linear regression:

    SSE = Σ_{i=1}^{n} (Y_i − Ŷ_i)²
    TSS = Σ_{i=1}^{n} (Y_i − m_Y)²

[Figure: scatterplot of Y = sale price versus X = floor size illustrating the deviations about the fitted line (SSE) and about the sample mean m_Y (TSS).]

Calculating R²

Without the model, we estimate Y with the sample mean, m_Y.
With the model, we estimate Y using its fitted Ŷ-value.
How much do we reduce our error when we do this?
Total error without the model: TSS = Σ_{i=1}^{n} (Y_i − m_Y)², the variation in Y about m_Y.
Remaining error with the model: SSE = Σ_{i=1}^{n} (Y_i − Ŷ_i)², the unexplained variation.
Proportional reduction in error:

    R² = (TSS − SSE) / TSS.

Home prices example:

    R² = (423.4 − 23.3) / 423.4 = 0.945.

94.5% of the variation in sale price (about its mean) can be explained by a straight-line relationship between sale price and floor size.

Interpreting R²

    Model Summary
    Model   Multiple R   R Squared   Adjusted R Squared   Regression Std. Error
    1       0.972 (a)    0.945       0.927                2.7865
    (a) Predictors: (Intercept), X.

R² measures the proportion of variation in Y (about its mean) that can be explained by a straight-line relationship between Y and X.
If TSS = SSE, then R² = 0: using X to predict Y hasn't helped, and we might as well use m_Y to predict Y regardless of the value of X.
If SSE = 0, then R² = 1: using X allows us to predict Y perfectly (with no random errors).
Such extremes rarely occur; usually R² lies between zero and one, with higher values corresponding to better-fitting models.
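A sketch of the same calculation from the HOMES2 data (TSS measures variation about the sample mean, SSE variation about the fitted line):

    import numpy as np

    x = np.array([1.683, 1.708, 1.922, 2.053, 2.269])
    y = np.array([259.9, 259.9, 269.9, 270.0, 285.0])

    y_hat = 190.318 + 40.8 * x         # fitted values
    tss = np.sum((y - y.mean()) ** 2)  # approx. 423.4
    sse = np.sum((y - y_hat) ** 2)     # approx. 23.3

    print((tss - sse) / tss)           # R squared, approx. 0.945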

R² examples

[Figure: nine scatterplots with fitted lines, with R² = 0.05, 0.13, 0.3, 0.62, 0.74, 0.84, 0.94, 0.97, and 1.]

Correlation

    Model Summary
    Model   Multiple R   R Squared   Adjusted R Squared   Regression Std. Error
    1       0.972 (a)    0.945       0.927                2.7865
    (a) Predictors: (Intercept), X.

The correlation coefficient, r, measures the strength and direction of linear association between Y and X:
- r close to −1 indicates a negative linear relationship;
- r close to +1 indicates a positive linear relationship;
- r close to 0 indicates no linear relationship.
Simple linear regression: √R² equals the absolute value of r (the "multiple R" above).
But correlation is less useful than R² in multiple linear regression.
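The relationship between r and R² is easy to verify numerically; a quick sketch using NumPy's correlation function:

    import numpy as np

    x = np.array([1.683, 1.708, 1.922, 2.053, 2.269])
    y = np.array([259.9, 259.9, 269.9, 270.0, 285.0])

    r = np.corrcoef(x, y)[0, 1]   # approx. 0.972, the "multiple R" in the output
    print(r ** 2)                 # approx. 0.945, matching R squared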

Correlation examples

[Figure: nine scatterplots with (r, R²) pairs of approximately (−1, 1), (−0.97, 0.94), (−0.81, 0.66), (−0.57, 0.32), (0.01, 0), (0.62, 0.38), (0.87, 0.76), (0.97, 0.95), and (1, 1).]

Slope parameter, b_1

Infer from the sample slope b̂_1 about the population slope b_1.

    E(Y) = b_0 + b_1 X       (expected value)
    Y = b_0 + b_1 X + e      (population value = expected value + random error e)

[Figure: scatterplot with the population line E(Y) = b_0 + b_1 X and the random error e annotated for one observation.]

Hypothesis test for b_1

Recall the univariate t-statistic:

    t = (m_Y − E(Y)) / (s_Y / √n) ~ t_{n−1}.

Here, the slope t-statistic is

    t = (b̂_1 − b_1) / s_{b̂_1} ~ t_{n−2}.

NH: b_1 = 0 versus AH: b_1 ≠ 0.

    t-statistic = (b̂_1 − b_1) / s_{b̂_1} = (40.8 − 0) / 5.684 = 7.18.

Significance level = 5%. The critical value is 3.182 (the 97.5th percentile of t_3).
Since the t-statistic (7.18) is between 5.841 and 10.215, the p-value is between 0.01 and 0.002.
Since the t-statistic (7.18) > the critical value (3.182), and the p-value < the significance level, reject NH in favor of AH.
In other words, the sample data favor a nonzero slope (at a significance level of 5%).
Exercise: do an upper-tail test for this example.

Computer output and slope test illustration

    Parameters (a)
    Model              Estimate   Std. Error   t-stat   Pr(>|t|)
    1   (Intercept)    190.318    11.023       17.266   0.000
        X               40.800     5.684        7.179   0.006
    (a) Response variable: Y.
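A sketch of the same two-tailed test in Python, using SciPy's t distribution for the exact critical value and p-value (numbers taken from the slides):

    from scipy import stats

    b1_hat, se_b1, n = 40.8, 5.684, 5
    df = n - 2                                 # 3 degrees of freedom

    t_stat = (b1_hat - 0) / se_b1              # approx. 7.18
    crit = stats.t.ppf(0.975, df)              # approx. 3.182 for a 5% two-tailed test
    p_value = 2 * stats.t.sf(abs(t_stat), df)  # approx. 0.006, matching Pr(>|t|)

    print(t_stat, crit, p_value)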

Slope confidence interval

Calculate a 95% confidence interval for b_1. The 97.5th percentile of t_3 is 3.182.

    b̂_1 ± (97.5th percentile of t_3) × s_{b̂_1} = 40.8 ± 3.182 × 5.684 = 40.8 ± 18.1 = (22.7, 58.9).

Loosely speaking: based on this dataset, we are 95% confident that the population slope, b_1, is between 22.7 and 58.9.
More precisely: if we were to take a large number of random samples of size 5 from our population of homes and calculate a 95% confidence interval for each, then 95% of those confidence intervals would contain the (unknown) population slope.
Exercise: calculate a 90% confidence interval for b_1.
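The same interval, sketched in Python with SciPy (the 90% interval in the exercise only needs a different percentile):

    from scipy import stats

    b1_hat, se_b1, df = 40.8, 5.684, 3

    t_crit = stats.t.ppf(0.975, df)                  # approx. 3.182
    half_width = t_crit * se_b1                      # approx. 18.1
    print(b1_hat - half_width, b1_hat + half_width)  # approx. (22.7, 58.9)

    # For the 90% interval, use stats.t.ppf(0.95, df) instead.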