Regression Analysis II


Measures of Goodness of Fit
Two measures of goodness of fit:
1. Standard error of the estimate: a measure of the absolute fit of the sample points to the sample regression line.
2. Coefficient of determination: an index of the relative goodness of fit of the sample regression line.

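Standard error of the estimate
$$SEE = \sqrt{\frac{\sum e_i^2}{n - 2}}, \qquad \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$$
where SEE denotes the standard error of estimate and $\hat{y}_i$ is the value computed from the regression line (written $y_c$ below). SEE is the standard deviation of the errors about the sample regression line.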

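[Figure: decomposition of the total deviation at a sample point. Axes: x (horizontal), y (vertical); the mean line is marked $\bar{y} = 3.10$. Unexplained deviation: $y_i - y_c$. Explained deviation: $y_c - \bar{y}$. Total deviation: $y_i - \bar{y}$.]
Total deviation = Unexplained deviation + Explained deviation
$$(y_i - \bar{y}) = (y_i - y_c) + (y_c - \bar{y})$$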

Total variation = Unexplained variation + Explained variation
$$\sum (y_i - \bar{y})^2 = \sum (y_i - y_c)^2 + \sum (y_c - \bar{y})^2$$
SST = SSE + SSR

Computational formula for the correlation coefficient r
$$r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}$$

The Coefficient of Determination
The relative amount of the variation that has been explained by the sample regression line.
$$r^2 = \frac{SSR}{SST} = \frac{\text{variation explained}}{\text{total variation}} = \frac{\sum (y_c - \bar{y})^2}{\sum (y_i - \bar{y})^2}$$
$r^2$ is the proportion of total variation that is explained by the regression line. If the regression line fitted all the sample points perfectly, all residuals would be zero. Then SSE = 0, SSR = SST, and
$$r^2 = \frac{SSR}{SST} = 1.0$$

[Figure: two scatter plots of y against x.]
Perfect fit: $r^2 = 1$, $y_c = y_i$, $e_i = 0$.
No systematic relation between y and x: $r^2 = 0$, $y_c = \bar{y}$, $b = 0$.

Using the information given in the previous example:
a) What is the degree of correlation r between fuel sales and temperature?
b) Test the statistical significance of this value of r at the 5 per cent level of significance.
c) What proportion of the variation in fuel oil sales is explained by variations in temperature?

a) The appropriate formula to compute r:
$$r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}$$
r = -0.927

X_i = sales amount of fuel, Y_i = temperature

  X_i    Y_i    X_i Y_i    X_i^2    Y_i^2
    4     26       104        16      676
   10     17       170       100      289
   14      7        98       196       49
   12     12       144       144      144
    4     30       120        16      900
    5     40       200        25     1600
    8     20       160        64      400
   11     15       165       121      225
   13     10       130       169      100
   15      5        75       225       25
   96    182      1366      1076     4408   (column totals)

n = 10, $\sum x_i = 96$, $\sum y_i = 182$, $\sum x_i y_i = 1366$, $\sum x_i^2 = 1076$, $\sum y_i^2 = 4408$, $\left(\sum x_i\right)^2 = 96^2 = 9216$, $\left(\sum y_i\right)^2 = 182^2 = 33124$.
$$r = \frac{10(1366) - (96)(182)}{\sqrt{10(1076) - 9216}\,\sqrt{10(4408) - 33124}} = -0.927$$
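As a cross-check on the column totals, the computational formula can be evaluated directly from the ten (x, y) pairs of the worksheet. This is a minimal Python sketch; the function name `pearson_r` is an illustrative choice and does not come from the slides.

```python
import math

# (x, y) pairs from the worksheet: x = fuel sales, y = temperature
x = [4, 10, 14, 12, 4, 5, 8, 11, 13, 15]
y = [26, 17, 7, 12, 30, 40, 20, 15, 10, 5]

def pearson_r(xs, ys):
    """Correlation coefficient via the computational (raw sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(x, y)
print(round(r, 3))  # -0.927
```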

b) Test the statistical significance of this value of r at the 5 per cent level of significance.
Hypotheses: $H_0: \rho = 0$, $H_1: \rho < 0$ (a one-tailed test)
Significance level: $\alpha = 0.05$
Standard error:
$$s_r = \sqrt{\frac{1 - r^2}{n - 2}} = \sqrt{\frac{1 - (-0.927)^2}{10 - 2}} = 0.1326$$
Test statistic:
critical $t = t_\alpha$ with n - 2 = 8 df = -1.86
$$\text{actual } t = \frac{r - \rho}{s_r} = \frac{-0.927 - 0}{0.1326} = -6.99$$

Conclusion: $t < t_\alpha$, i.e. -6.99 < -1.86. We reject $H_0$ and conclude that $\rho < 0$, i.e. that there is a significant negative correlation between fuel sales and temperature. Note that, in a simple regression involving only one dependent variable y and one explanatory variable x, a significance test on r is equivalent to a significance test on the regression coefficient b.
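The arithmetic of the test can be retraced in a few lines. The sketch below takes r, n, and the critical value -1.86 as given in the example rather than deriving them; the variable names are illustrative.

```python
import math

r = -0.927          # sample correlation from part (a)
n = 10              # number of observations
t_critical = -1.86  # lower-tail critical value quoted in the example (alpha = 0.05, 8 df)

s_r = math.sqrt((1 - r ** 2) / (n - 2))  # standard error of r
t_actual = (r - 0) / s_r                 # test statistic under H0: rho = 0
reject_h0 = t_actual < t_critical        # one-tailed (lower-tail) decision rule

print(round(s_r, 4), round(t_actual, 2), reject_h0)  # 0.1326 -6.99 True
```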

c) What proportion of the variation in fuel oil sales is explained by variations in temperature?
The proportion of the variation in fuel oil sales that is explained by variations in temperature is given by the coefficient of determination $r^2$.
$$r^2 = (-0.927)^2 = 0.859$$
That is, 85.9 per cent of the variation in fuel sales is explained by variations in temperature.
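The same proportion can be obtained from the variance decomposition SST = SSE + SSR instead of squaring r. The sketch below fits a least-squares line of y on x, following the worksheet's own labelling of the ten data points (an assumption about the direction of the fit), and then computes the sums of squares, r², and the standard error of estimate. Because r² is symmetric in x and y, the explained proportion does not depend on which variable is treated as dependent; the SEE does, and is expressed in units of y.

```python
import math

# Worksheet data: x = fuel sales, y = temperature (labelling as in the worksheet)
x = [4, 10, 14, 12, 4, 5, 8, 11, 13, 15]
y = [26, 17, 7, 12, 30, 40, 20, 15, 10, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept for the line y_c = a + b * x
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = s_xy / s_xx
a = y_bar - b * x_bar
y_c = [a + b * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)              # total variation
sse = sum((yi - yc) ** 2 for yi, yc in zip(y, y_c))   # unexplained variation
ssr = sum((yc - y_bar) ** 2 for yc in y_c)            # explained variation

r_squared = ssr / sst
see = math.sqrt(sse / (n - 2))  # standard error of estimate

print(round(r_squared, 3))  # 0.859
```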

1. The Correlation Coefficient: A single summary number that tells you whether a relationship exists between two variables, how strong that relationship is and whether the relationship is positive or negative.
2. The Coefficient of Determination: A single summary number that tells you how much variation in one variable is directly related to variation in another variable.
3. Linear Regression: A process that allows you to make predictions about variable Y based on knowledge you have about variable X.
4. The Standard Error of Estimate: A single summary number that allows you to tell how accurate your predictions are likely to be when you perform Linear Regression.

Multiple Regression Analysis
Simple regression analysis: one independent variable (X) is used to predict the value of a dependent variable (Y).
Multiple regression analysis: several independent variables are used to predict the value of a dependent variable.
Multiple regression is a statistical tool that allows you to examine how multiple independent variables are related to a dependent variable; the process is called multiple regression analysis.

Multiple Regression Analysis
Population multiple regression model:
$$y_i = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \cdots + \beta_m X_{im} + \varepsilon_i$$
Population multiple linear regression equation:
$$\mu_{y \cdot x_1 x_2 \ldots x_m} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_m X_m$$
$\beta_1, \beta_2, \beta_3, \ldots, \beta_m$ are the partial regression coefficients.

Sample multiple regression equation:
$$\hat{y}_i = a + b_1 X_{i1} + b_2 X_{i2} + b_3 X_{i3} + \cdots + b_m X_{im}$$
$\hat{y}$ = estimate of $\mu_{y \cdot x_1 x_2 \ldots x_m}$
a = estimate of the intercept $\alpha$ (Y intercept)
$b_1, b_2, b_3, \ldots, b_m$ = estimates of the partial regression coefficients $\beta_1, \beta_2, \beta_3, \ldots, \beta_m$
$b_i$ = slope of Y with variable $X_i$, holding the other variables constant; e.g. $b_1$ = slope of Y with variable $X_1$, holding variables $X_2, X_3, \ldots, X_m$ constant.

Multiple regression model with two independent variables:
$$y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$
$\alpha$ = Y intercept
$\beta_1$ = slope of Y with variable $X_1$, holding variable $X_2$ constant
$\beta_2$ = slope of Y with variable $X_2$, holding variable $X_1$ constant
$\varepsilon_i$ = random error in Y for observation i

Multiple Regression Analysis
The analysis covers four aspects:
1. Interpretation of the individual regression coefficients
2. Statistical significance of the regression coefficients
3. Overall explanatory power of the estimated equation
4. Statistical significance of the overall explanatory power

Interpretation of the individual regression coefficients
a (intercept term) = the estimated value of Y ($\hat{y}$) when the values of all independent variables are zero: $\hat{y} = a$ when $X_1 = X_2 = \cdots = X_m = 0$.
Interpretation of any $b_i$ coefficient: $b_i$ represents the change in $\hat{y}$ corresponding to a unit change in $x_i$ when all other independent variables are held constant.

Statistical significance of the regression coefficients
Set up the null hypothesis, which states that the variable associated with $b_1$ ($X_1$) does not influence the dependent variable. Here $B_1$ denotes the population coefficient that $b_1$ estimates.
$H_0: B_1 = 0$, $H_1: B_1 \neq 0$ (a two-tailed test)
or
$H_0: B_1 = 0$, $H_1: B_1 < 0$ (a one-tailed test)
or
$H_0: B_1 = 0$, $H_1: B_1 > 0$ (a one-tailed test)
The hypothesis may be tested using a t test.

Test statistic:
$$t = \frac{b_i - B_i}{S_{b_i}}$$
where $S_{b_i}$ = standard error of $b_i$. Under the null hypothesis,
$$t = \frac{b_i - 0}{S_{b_i}} = \frac{b_i}{S_{b_i}}$$
Degrees of freedom: n - k - 1, where n = number of observations and k = number of independent variables.

Overall explanatory power of the estimated regression equation
Multiple coefficient of determination ($R^2$):
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
Coefficient of multiple correlation (R): R is a measure of the degree of association between Y and all the explanatory variables jointly.
Adjusted multiple coefficient of determination ($\bar{R}^2$):
$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$$

Statistical significance of overall explanatory power
F statistic:
$$F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}, \qquad SST = SSR + SSE$$

  Sum of squares    Degrees of freedom
  SSR               k
  SSE               n - k - 1
  SST               n - 1
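The F statistic and the adjusted coefficient of determination follow mechanically from the sums of squares and their degrees of freedom. The sketch below wraps the two formulas in small helper functions; the function names and the numerical inputs are hypothetical and chosen only for illustration.

```python
def f_statistic(ssr, sse, n, k):
    """F = MSR / MSE with k and n - k - 1 degrees of freedom."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse

def adjusted_r_squared(r_squared, n, k):
    """Adjust R^2 for the number of independent variables k."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical sums of squares (illustration only, not from the example)
ssr, sse, n, k = 80.0, 20.0, 24, 3
r_squared = ssr / (ssr + sse)   # R^2 = SSR / SST
f_value = f_statistic(ssr, sse, n, k)
adj_r_squared = adjusted_r_squared(r_squared, n, k)

print(round(r_squared, 2), round(f_value, 3), round(adj_r_squared, 3))
```

Note that F can equivalently be written in terms of R² alone, as (R²/k) / ((1 - R²)/(n - k - 1)), which is why a high R² and a significant F tend to appear together.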

Example. The data below show the monthly sales of heating fuel for a firm over the past 12 months, together with the average price charged per unit in each month, the advertising expenditure (Rs 000s) per month and the mean daily temperature (°C) recorded during each month. Using these data, compute the regression equation which can be used to estimate the influence of the three explanatory variables (price, advertising, temperature) on heating fuel sales. Comment on the results.

  Month       Sales      Price                 Advertising             Temperature
              (liters)   (Rs 00s per liter)    expenditure (Rs 000s)   (°C)
  January     450        0.60                  25                      27.50
  February    380        1.20                  17                      25.00
  March       298        1.80                  14                      29.00
  April       350        1.50                  18                      30.00
  May         201        3.00                  10                      31.00
  June        215        2.70                  11                      35.00
  July        220        2.70                  12                      32.00
  August      240        2.10                  11                      30.00
  September   192        3.00                   7                      29.00
  October     201        2.40                   7                      24.00
  November    202        2.40                   8                      22.00
  December    235        2.30                  10                      20.00

Regression equation (general form): Sales = a + b_1 (Price) + b_2 (Advertising expenditure) + b_3 (Temperature)
Estimated regression equation: Sales = 281.357 - 53.344 (Price) + 8.853 (Advertising expenditure) - 0.446 (Temperature)
Interpretation (from the computer output)
1. Size and sign of coefficients: the signs show sales to be inversely related to price, positively related to advertising expenditure, and to rise as temperature falls.

Interpretation (continued)
Regression equation: Sales = 281.357 - 53.344 (Price) + 8.853 (Adexp) - 0.446 (Temp)
Holding the other variables constant, sales fall by 53.344 liters for a unit increase (Rs 100) in price, rise by 8.853 liters for a unit increase (Rs 1000) in advertising expenditure and fall by 0.446 liters for a unit rise (1 °C) in temperature.
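To make the "holding other variables constant" reading concrete, the estimated equation can be evaluated at the January values from the data table (price 0.60, advertising 25, temperature 27.50) and then again with price raised by one unit. This is a small sketch using the coefficients as reported; the function name `predict_sales` is illustrative.

```python
# Coefficients as reported in the estimated regression equation
a, b_price, b_adexp, b_temp = 281.357, -53.344, 8.853, -0.446

def predict_sales(price, adexp, temp):
    """Estimated sales (liters) from the fitted equation."""
    return a + b_price * price + b_adexp * adexp + b_temp * temp

january = predict_sales(0.60, 25, 27.50)
# Raise price by one unit (Rs 100), everything else unchanged
price_effect = predict_sales(1.60, 25, 27.50) - january

print(round(january, 1))       # 458.4 (actual January sales: 450)
print(round(price_effect, 3))  # -53.344
```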

Statistical significance of the coefficients
The t ratio for price is -2.842 with a probability value of 0.022, indicating that the coefficient b_1 is different from zero at the 0.05 level of significance. The coefficient on price is statistically significant.
The t ratio for advertising expenditure is 3.410 with a probability value of 0.009, indicating that the coefficient b_2 is different from zero at the 0.05 level of significance. The coefficient on advertising expenditure is statistically significant.
The coefficient on temperature, with a t ratio of -0.307 and a probability value of 0.767, is not significant at the 5 per cent level.

Statistical significance of the regression as a whole
The F statistic (131.704) has a probability value of 0.000, indicating that the regression as a whole is very highly significant.
Overall explanatory power
The R^2 value indicates that 98 per cent of the variation in sales is explained by the regression as a whole (i.e. by the joint variation in the three independent variables).
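The slides quote results from computer output without showing how it was produced. As a hedged sketch of one way to obtain comparable quantities, the code below fits the three-variable regression to the twelve monthly observations by solving the normal equations in plain Python, then computes R², adjusted R², F, and the t ratios. The helper `solve` and all variable names are illustrative assumptions, and the printed values should be compared against the reported output rather than taken as a reproduction of it.

```python
import math

# (sales, price, advertising, temperature) for January..December
data = [
    (450, 0.60, 25, 27.5), (380, 1.20, 17, 25.0), (298, 1.80, 14, 29.0),
    (350, 1.50, 18, 30.0), (201, 3.00, 10, 31.0), (215, 2.70, 11, 35.0),
    (220, 2.70, 12, 32.0), (240, 2.10, 11, 30.0), (192, 3.00, 7, 29.0),
    (201, 2.40, 7, 24.0), (202, 2.40, 8, 22.0), (235, 2.30, 10, 20.0),
]

def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    size = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]

y = [float(row[0]) for row in data]
X = [[1.0, row[1], float(row[2]), row[3]] for row in data]  # leading 1 = intercept
n, k = len(y), 3
p = k + 1

# Normal equations: (X'X) coef = X'y
xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
coef = solve(xtx, xty)  # [a, b1, b2, b3]

fitted = [sum(c * v for c, v in zip(coef, r)) for r in X]
y_bar = sum(y) / n
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ssr = sum((fi - y_bar) ** 2 for fi in fitted)

r_squared = ssr / sst
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
mse = sse / (n - k - 1)
f_value = (ssr / k) / mse

# Standard errors: sqrt(MSE * j-th diagonal element of (X'X)^-1)
inv_diag = [solve(xtx, [1.0 if i == j else 0.0 for i in range(p)])[j]
            for j in range(p)]
std_err = [math.sqrt(mse * d) for d in inv_diag]
t_ratios = [c / s for c, s in zip(coef, std_err)]

print("coefficients:", [round(c, 3) for c in coef])
print("t ratios:", [round(t, 3) for t in t_ratios])
print("R^2:", round(r_squared, 3), "adj R^2:", round(adj_r_squared, 3),
      "F:", round(f_value, 3))
```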