SIMPLE REGRESSION ANALYSIS. Business Statistics


CONTENTS
- Ordinary least squares (recap for some)
- Statistical formulation of the regression model
- Assessing the regression model
- Testing the regression coefficients
- The ANOVA table
- Old exam question
- Further study

ORDINARY LEAST SQUARES The idea: curve fitting in a scatterplot with a linear fit y = a + bx (x = floor area of a house, y = price of the house)

ORDINARY LEAST SQUARES You find the best line by minimizing the misfit e_i between the observed value y_i and the modelled/estimated value ŷ_i = a + b·x_i, so e_i = y_i − ŷ_i. In fact, OLS regression minimizes the sum of squared misfits Σ_{i=1}^n e_i². The hat (^) is our symbol for an estimate.
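As a minimal sketch of this idea (with made-up data, not the slides' house prices), the OLS solution for one predictor can be computed directly from the textbook formulas b₁ = Sxy/Sxx and b₀ = ȳ − b₁x̄:

```python
# Minimal OLS fit for a single predictor (illustrative sketch).
# Slope: b1 = Sxy / Sxx, intercept: b0 = ybar - b1 * xbar.
def ols_fit(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx           # estimated slope
    b0 = ybar - b1 * xbar    # estimated intercept
    return b0, b1

# Toy data lying exactly on the line y = 2x, so every misfit e_i is 0
b0, b1 = ols_fit([1, 2, 3, 4], [2, 4, 6, 8])
```

Minimizing Σ e_i² leads to exactly these closed-form expressions for a and b, which is why no iterative search is needed.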

STATISTICAL FORMULATION OF THE REGRESSION MODEL Rephrasing the model y = a + bx as a statistical model. Assumptions and notation: we assume a linear relation of the form of the population regression model Y_i = β₀ + β₁X_i + ε_i, or Y = β₀ + β₁X + ε, where β₀ is the intercept or constant and β₁ is the slope or slope coefficient. We prefer to use β₀ instead of a for the constant, and β₁ instead of b for the slope. The random variable ε_i is the error or residual: the unexplained part.

STATISTICAL FORMULATION OF THE REGRESSION MODEL Estimation of the model coefficients: we assume that ε_i ~ N(0, σ²). Based on a sample of n paired data points (x_i, y_i), i = 1, …, n, use OLS to estimate the best line through the estimated regression model Y = b₀ + b₁X, or ŷ_i = b₀ + b₁x_i. The estimated coefficients (b₀ for β₀ and b₁ for β₁) and the estimated error (e_i for ε_i) correspond to y_i = b₀ + b₁x_i + e_i.

STATISTICAL FORMULATION OF THE REGRESSION MODEL [Figure: scatterplot of the data points (x_i, y_i) with the fitted line Y = b₀ + b₁X; the residual e_i is the vertical distance from an observed point to the line, b₀ the intercept and b₁ the slope]

STATISTICAL FORMULATION OF THE REGRESSION MODEL So: b₀ is the estimated value of β₀, the intercept or constant of the regression line; b₁ is the estimated value of β₁, the slope or slope coefficient of the regression line; e_i is the estimated residual or error for observation i: the misfit.

EXERCISE 1 Look back at the house prices, where we found the line ŷ = 264700 + 6152x. a. Give the theoretical model. b. Give the estimated model.

ASSESSING THE REGRESSION MODEL OLS will always give an estimate for β₀ and β₁: the line of best fit. But is "best" also good enough to make good predictions? Can we do a statistical test on the quality of the model? We have minimized the sum of squares (SS) of the error: SSE = Σ_{i=1}^n e_i² = Σ_{i=1}^n (y_i − ŷ_i)². We would like to compare this with: the total sum of squares SST, and the explained sum of squares SSR (R stands for regression).

ASSESSING THE REGRESSION MODEL Total sum of squares: SST = Σ_{i=1}^n (y_i − ȳ)². So SST is the total variation around the mean ȳ.

ASSESSING THE REGRESSION MODEL Regression sum of squares: SSR = Σ_{i=1}^n (ŷ_i − ȳ)². So the data has a total variability SST, the regression model explains a variability SSR, and the residual variability is SSE, with SST = SSR + SSE. Coefficient of determination ("R-square"): R² = SSR/SST = 1 − SSE/SST. So SSR is the variation around the mean ȳ that is explained by the model.
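The decomposition can be made concrete with a short sketch (toy data and fitted coefficients assumed for illustration, not the slides' house prices) that computes SST, SSR, SSE and R²:

```python
# Variance decomposition for a fitted line yhat_i = b0 + b1*x_i (sketch).
x = [1, 2, 3, 4]
y = [2, 3, 5, 6]           # toy data, not from the slides
b0, b1 = 0.5, 1.4          # OLS estimates for this toy data

ybar = sum(y) / len(y)
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)               # total variation
ssr = sum((yh - ybar) ** 2 for yh in yhat)            # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained variation
r2 = ssr / sst             # coefficient of determination
```

For this toy sample the identity SST = SSR + SSE holds exactly (10 = 9.8 + 0.2), giving R² = 0.98.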

ASSESSING THE REGRESSION MODEL R² is a measure of the usefulness of the model. Properties: 0 ≤ R² ≤ 1; R² = 0 means the model doesn't explain anything; R² = 1 means the model explains everything; in between, the model explains R²·100% of the variance of Y. (Reminder: R² = 1 − SSE/SST.)

ASSESSING THE REGRESSION MODEL If R² > 0, the regression model explains something, but in a random sample R² may be non-zero due to chance. When is R² significantly different from 0? Finding a test statistic: look at the variances associated with SSR and SSE, so define the mean sums of squares (MS) (variances!): MST = SST/(n−1); MSR = SSR/1; MSE = SSE/(n−2). Use MSR/MSE = (SSR/1)/(SSE/(n−2)) as a ratio of two variances.

ASSESSING THE REGRESSION MODEL Statistical test: H₀: the independent variable (X) does not explain the variation in the dependent variable (Y), i.e., H₀: β₁ = 0 versus H₁: β₁ ≠ 0. Sample statistic: F = MSR/MSE; reject for large values. Under H₀: F ~ F(1, n−2); assumptions: see model. Compare F_calc = MSR/MSE with F_crit = F_{1,n−2;α}, or compute the p-value as the probability of obtaining F_calc or more extreme if H₀ is true.
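A minimal sketch of the F statistic (the sums of squares are toy numbers assumed for illustration, not the slides' output):

```python
# F statistic for H0: beta1 = 0 in simple regression (sketch, toy numbers).
n = 4                      # sample size (toy)
ssr, sse = 9.8, 0.2        # explained and unexplained sums of squares (toy)

msr = ssr / 1              # regression mean square: 1 numerator df
mse = sse / (n - 2)        # error mean square: n - 2 denominator df
f_calc = msr / mse         # compare with F_crit = F_{1, n-2; alpha}
```

Here F_calc = 9.8 / 0.1 = 98, which would then be compared against the F(1, n−2) critical value or converted to a p-value.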

ASSESSING THE REGRESSION MODEL Using SPSS, three types of output: the model summary (R²); the variance decomposition (ANOVA): SSR, SSE, SST, MSR, MSE, F_calc, and the p-value; and the regression coefficients b₀ and b₁.

ASSESSING THE REGRESSION MODEL The model is Y = β₀ + β₁X + ε. OLS extracts estimates from the data: b₀ and b₁. But how accurate are these estimates? We can also find the distributions of B₀ and B₁, so we can find confidence intervals and perform hypothesis tests. B₀ and B₁ are t-distributed: (B₀ − β₀)/S_{B₀} ~ t_{n−2} and (B₁ − β₁)/S_{B₁} ~ t_{n−2}.
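For example, the t-distribution of B₁ gives a 95% confidence interval b₁ ± t_{α/2;n−2}·s_{B₁}. A sketch with toy numbers (the sample values and the table constant t_{0.025;2} = 4.303 are illustrative assumptions, not the slides' data):

```python
import math

# 95% CI for the slope: b1 +/- t_{alpha/2; n-2} * s_B1 (sketch, toy numbers).
n = 4
b1 = 1.4                   # estimated slope (toy)
mse, sxx = 0.1, 5.0        # error mean square and Sxx for the toy sample
s_b1 = math.sqrt(mse / sxx)            # standard error of B1
t_crit = 4.303                         # t_{0.025; n-2=2}, from a t-table
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)
```

The interval is wide here because n − 2 = 2 degrees of freedom make the t critical value large; with realistic sample sizes it shrinks considerably.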

ASSESSING THE REGRESSION MODEL Mind the notation, like before: mean: population value μ_X, sample estimate x̄, sampling distribution of the random variable X̄; regression slope: population value β₁, sample estimate b₁, sampling distribution of the random variable B₁. When you're careless with this, it all gets mixed up in one big abracadabra trickery!

EXERCISE 2 a. Is the model significant? b. Does the model have practical relevance?

TESTING THE REGRESSION COEFFICIENTS Testing β₀ is usually not interesting, but testing β₁ is! In particular, the hypothesis β₁ = 0 is often interesting: i.e., the hypothesis that there is no relation between X and Y, or that knowledge of X doesn't tell you anything about Y. This test requires the standard deviation of B₁; it is calculated from the data (see computer output): here s_{B₁} = 347.578.

TESTING THE REGRESSION COEFFICIENTS So: t_calc = (b₁ − β₁)/s_{B₁} = (6151.670 − 0)/347.578 = 17.699, which has to be compared to t_crit = ±t_{0.025;69}. Reject H₀: β₁ = 0, because |t_calc| > t_crit, or with the p-value: p = 0.000 ≤ 0.05. Conclude that the slope differs significantly from zero; post-hoc conclusion: it is larger than zero.

TESTING THE REGRESSION COEFFICIENTS Testing the regression model on the basis of MSR/MSE ~ F(1, n−2); testing the regression coefficient b₁ on the basis of (B₁ − 0)/S_{B₁} ~ t_{n−2}. The two approaches are equivalent: they have the same null hypothesis H₀: β₁ = 0; they lead to the same conclusion (rejection or no rejection); and they lead to the same p-value. When we do multiple regression with several explanatory variables, this is not the case! See later.
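The equivalence can be checked numerically: in simple regression the squared t statistic for the slope equals the F statistic. A sketch with toy numbers (assumed for illustration; the quantities are internally consistent, with SSR = b₁²·Sxx):

```python
import math

# In simple regression with one predictor, t_calc^2 == F_calc. Toy numbers.
n = 4
b1 = 1.4                   # estimated slope (toy)
ssr, sse = 9.8, 0.2        # toy sums of squares (ssr = b1**2 * sxx)
sxx = 5.0                  # sum of squared x deviations (toy)

mse = sse / (n - 2)
f_calc = (ssr / 1) / mse               # F statistic of the model test
t_calc = b1 / math.sqrt(mse / sxx)     # t statistic for H0: beta1 = 0
```

Both routes give the same test: F_calc = t_calc², and the F(1, n−2) distribution is exactly the distribution of a squared t_{n−2} variable.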

TESTING THE REGRESSION COEFFICIENTS We can also perform other tests than H₀: β₁ = 0. Case 1: different test values for β₁, for example H₀: β₁ = 2; then t_calc = (b₁ − 2)/s_{B₁}: not in SPSS, but easily calculated using s_{B₁}. Case 2: one-sided tests, for example H₀: β₁ ≤ 0; t_calc as before, but now tested with a different t_crit: not in SPSS, but also easily calculated using the two-sided p-value. Case 3: a combination of case 1 and case 2, for example H₀: β₁ ≤ 2. Try all! (see tutorials)

TESTING THE REGRESSION COEFFICIENTS Example of case 3: is there evidence that the price per square meter is larger than 5500? H₀: β₁ ≤ 5500; H₁: β₁ > 5500; α = 0.05. t_calc = (6151.670 − 5500)/347.578 = 1.875 > t_crit ≈ 1.7 (one-sided critical value, with α, not α/2). Reject H₀ and conclude that the price per m² is significantly larger than 5500.
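The slide's computation takes only a few lines. The estimate 6151.670 and standard error 347.578 come from the slides' SPSS output; the critical value ≈ 1.7 is the one-sided t_{0.05;69} value quoted on the slide:

```python
# One-sided test of H0: beta1 <= 5500 vs H1: beta1 > 5500 (slide numbers).
b1 = 6151.670              # estimated slope (from the SPSS output)
s_b1 = 347.578             # standard error of the slope
beta1_0 = 5500             # hypothesized value under H0

t_calc = (b1 - beta1_0) / s_b1   # about 1.875
t_crit = 1.7                     # approx. one-sided t_{0.05; 69} (slide value)
reject_h0 = t_calc > t_crit      # True: the price per m^2 exceeds 5500
```

Because the alternative is one-sided, the full α = 0.05 sits in the upper tail, which is why t_crit is smaller than the two-sided value used earlier.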

TESTING THE REGRESSION COEFFICIENTS One may also test β₀ in exactly the same way; however, this is hardly ever useful. The overall significance of the F-test only depends on B₁/S_{B₁}, not on B₀/S_{B₀}. That is because the slope explains variation, while the intercept is only a vertical shift.

THE ANOVA TABLE One of the regression results is the ANOVA table (ANOVA = analysis of variance). [Screenshots: ANOVA table output in Excel and in SPSS]

THE ANOVA TABLE What was ANOVA? ANOVA: Y numerical, X categorical; regression: Y numerical, X numerical. So ANOVA is really different from regression. Why then an ANOVA table in regression? Because ANOVA and regression both decompose the total variance (SST) into an explained part (SSA in ANOVA, for factor "A"; SSR in regression, for "Regression") and an unexplained part (SSW in ANOVA, "Within"; SSE in regression, "Error").

THE ANOVA TABLE The ANOVA table for regression: MS_x = SS_x/df_x, with x = R, E; F_calc = MSR/MSE and the associated p-value.

OLD EXAM QUESTION 21 May 2015, Q2c

FURTHER STUDY Doane & Seward 5/E: 12.1-12.6. Tutorial exercises week 4: regression analysis.