Section 11: Quantitative analyses: Linear relationships among variables

© Australian Catholic University 2014. ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced or used in any form or by any means (graphic, electronic, or mechanical, including photocopying, recording, taping, Web distribution, or information storage and retrieval systems) without the written permission of the publishers.

Disclaimer: No person should rely on the contents of this publication without first obtaining advice from a qualified professional person. This publication is distributed on the terms and understanding that: (1) the authors, consultants and editors are not responsible for the results of any actions taken on the basis of information in this publication, nor for any error in or omission from this publication; and (2) the publisher is not engaged in rendering legal, accounting, professional or other advice or services. The publisher, and the authors, consultants and editors, expressly disclaim all and any liability and responsibility to any person, whether a purchaser or a reader of this publication or not, in respect of anything, and of the consequences of anything, done or omitted to be done by any such person in reliance, whether wholly or partially, upon the whole or any part of the contents of this publication. Without limiting the generality of the above, no author, consultant or editor shall have any responsibility for any act or omission of any other author, consultant or editor.

11.1 Linear relationship between two variables

Prescribed reading 11.1

Zikmund, WG, Babin, BJ, Carr, JC & Griffin, M 2013, Business research methods, 9th edn, South-Western, Cengage Learning, Mason OH. Chapter 23.

Recall the use of a scatter plot as a guide to the linear relationship between two variables. Examples are as follows.

Example 1: No clear relationship
[Scatter plot]

Example 2: Approximate linear relationship
[Scatter plot]

Example 3: Non-linear relationship
[Scatter plot]

Correlation coefficients

A linear relationship between two variables is assessed using correlation coefficients. If we have samples of n observations of variables X and Y, the sample correlation between them is calculated using the formula

r_xy = Σ(x_i − x̄)(y_i − ȳ) / ((n − 1) s_x s_y),

where the sum runs from i = 1 to n, x̄ and ȳ are the sample means, and s_x and s_y are the sample standard deviations of X and Y.

This is only sample data, so we need to test whether it gives evidence for a nonzero population correlation ρ. We test the null hypothesis H₀: ρ = 0 against the alternative hypothesis Ha: ρ ≠ 0, using the t statistic

t = r / √((1 − r²)/(n − 2)),

which is distributed as T with n − 2 degrees of freedom.
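This calculation is easy to script. Below is a minimal sketch in Python; the paired observations are invented purely for illustration, and in practice x and y would hold your own data.

```python
import math

# Invented paired observations (x_i, y_i); any two numeric lists of equal length work.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 3.4, 4.2, 4.8, 5.3, 6.1, 6.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample standard deviations (divisor n - 1).
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Sample correlation: r = sum((x_i - x_bar)(y_i - y_bar)) / ((n - 1) s_x s_y).
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

# t statistic for H0: rho = 0 versus Ha: rho != 0, with n - 2 degrees of freedom.
t = r / math.sqrt((1 - r ** 2) / (n - 2))
print(f"r = {r:.4f}, t = {t:.4f}, df = {n - 2}")
```

The resulting t value is compared against the T distribution with n − 2 degrees of freedom in the usual way.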

Simple linear regression

If the scatter plot from two samples gives us reason to believe that there is a linear relationship between two variables, we can do a calculation that fits a straight line to the observations that is the best possible such line, in the sense that it minimises the sum of the squared deviations of the actual observation points in the scatter plot from the line. This is what Zikmund et al. (p. 571) call the Ordinary Least Squares (OLS) method. To use the method, we first need to know what its assumptions are. The basic assumption is that the two variables are related by a formula of the form

Y = β₀ + β₁X + error,

where at each point the errors are independent, with zero mean and all their standard deviations equal.

It is worth giving the formulae we use to produce the least squares regression line. The equation is

ŷ = b₀ + b₁x, where b₁ = r_xy s_y / s_x and b₀ = ȳ − b₁x̄,

in which x̄ and ȳ are the sample means, s_x and s_y the sample standard deviations, and r_xy the sample correlation between X and Y.
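Before turning to Excel, here is a sketch of the same calculation in Python, applying the formulae above directly. The 12 (x, y) pairs are invented stand-ins, not the actual Example 2 data.

```python
import statistics as stats

# Invented stand-in data; substitute the real paired observations.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [1.8, 2.6, 2.4, 3.5, 3.2, 4.4, 4.1, 5.3, 5.0, 6.2, 5.9, 6.7]

n = len(x)
x_bar, y_bar = stats.mean(x), stats.mean(y)
s_x, s_y = stats.stdev(x), stats.stdev(y)

# Sample correlation r_xy, as defined in 11.1.
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

# Least squares coefficients: b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar.
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar

print(f"regression line: y-hat = {b0:.5f} + {b1:.5f} x")
print(f"R Square = {r ** 2:.5f}")  # proportion of variation in Y explained by X
```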

Checking the assumptions

The assumption that the observations are independent can be checked against the sampling method. A simple random sample, for example, would satisfy this. One should also check the deviations from the line for normality and equal variances. A rough check can be made using a plot, which should show no real pattern, and deviations of approximately the same size over the different x values, that is, not systematically larger or smaller over the x values.

Testing for the significance of the linear relationship

We want to test H₀: β₁ = 0 against the usual Ha: β₁ ≠ 0. One usually does the calculation using a computer package. Excel can be used to get the regression line, and the output contains the information needed for the significance test. We look at the following example of output from Excel. Take the data set from Example 2 above, where the scatter plot indicated that there was a linear relationship. The output contains a first block, as follows.

Regression Statistics
Multiple R          0.774391
R Square            0.599681
Adjusted R Square   0.559649
Standard Error      0.88428
Observations        12

It contains Multiple R, which is actually the simple correlation coefficient between X and Y. The square of this is 0.599681, and this means about 59.97% of the variation in Y is explained by the variation in X. The next block is an analysis of variance table for the assessment of the overall significance of the result.

ANOVA
             df    SS          MS          F        Significance F
Regression    1    11.71177    11.71177    14.987   0.00317
Residual     10     7.818233    0.781823
Total        11    19.53

The value of F is highly significant. Information about the coefficients b₀ and b₁ is in a section below, as follows.

                Coefficients   Standard Error   t Stat     P-value
Intercept       1.55981        0.67335          2.313742   0.043231
X Variable 1    0.427125       0.11357          3.8749     0.00317

This tells us that the regression line has equation

ŷ = 1.55981 + 0.427125x

The coefficient of x is followed by the information from the t-test for the given hypotheses H₀: β₁ = 0 versus Ha: β₁ ≠ 0. We get t = 3.8749, and P(T > 3.8749 or T < −3.8749) = 0.00317. So the result is highly significant and we reject the null hypothesis.

The Excel program can also give a guide to the rest of the assumptions. The residuals are the values y − ŷ, that is, the deviations of the observed value y from the fitted value ŷ on the line. The program can plot these as follows.

[Residual plot: residuals versus X Variable 1]
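For readers replicating these diagnostics outside Excel, here is a sketch that produces both this residual plot and the normal quantile plot shown further below. It assumes matplotlib and scipy are installed, and reuses the invented data and (rounded) coefficients from the earlier sketch.

```python
import matplotlib.pyplot as plt
from scipy import stats as sps

# Invented data and rounded coefficients from the earlier sketch.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [1.8, 2.6, 2.4, 3.5, 3.2, 4.4, 4.1, 5.3, 5.0, 6.2, 5.9, 6.7]
b0, b1 = 1.56, 0.43

# Residuals: observed y minus the fitted value on the line.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residual plot: should show no pattern and roughly constant spread across x.
ax1.scatter(x, residuals)
ax1.axhline(0, linestyle="--")
ax1.set_title("Residual plot")
ax1.set_xlabel("X Variable 1")
ax1.set_ylabel("Residuals")

# Normal quantile (Q-Q) plot: points near a straight line support normality.
sps.probplot(residuals, plot=ax2)
ax2.set_title("Normal quantile plot")

plt.tight_layout()
plt.show()
```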

The residuals should be randomly positive or negative, and there should be no systematic variation in size as x increases. The plot is consistent with this requirement. The residuals are also required to be normally distributed. A normal quantile plot for them is copied below.

[Normal quantile plot of the residuals]

This is approximately a straight line. So overall one can conclude that the regression method can be validly used.

11.2 Multiple regression

Prescribed reading 11.2

Zikmund, WG, Babin, BJ, Carr, JC & Griffin, M 2013, Business research methods, 9th edn, South-Western, Cengage Learning, Mason OH. Chapter 24, pp. 582-590.

Multiple regression can also be done in Excel.

Assume that there are several independent variables x₁, x₂, ..., x_n, each of which makes a linear contribution to the dependent variable y. So we fit an equation of the following form:

ŷ = b₀ + b₁x₁ + b₂x₂ + ... + b_n x_n.

The assumptions are similar to those for simple regression. That is, the relation with each x_i is assumed to be linear, and the residuals are assumed independent, normally distributed, and with equal variances. The calculation is also the same, using the least squares criterion to give the coefficients. In general, there is a choice of methods for doing multiple regression, but here one just needs to be aware of the fact that the contribution of each variable is assessed in the context of contributions from the other variables. What this means is that related independent variables may not contribute together much more than one of them does alone.

Example

The variables are X1, X2 and Y, on 25 observations. We look first at the correlations among the three variables. These are in the table below.

      X2     Y
X1    0.72   0.56
X2           0.67

Note that X1 and X2 correlate quite strongly with each other, as well as with Y. We now do a multiple regression of Y on the X variables. The output is below.

Regression Statistics
Multiple R          0.682181
R Square            0.465371
Adjusted R Square   0.416769
Standard Error      1.329628
Observations        25

ANOVA
             df    SS          MS          F         Significance F
Regression    2    33.85558    16.92779    9.57529   0.00102
Residual     22    38.89402     1.76791
Total        24    72.7496

                Coefficients   Standard Error   t Stat     P-value
Intercept       2.233293654    0.694954746      3.213581   0.0042
X Variable 1    0.19961128     0.14746839       0.747797   0.46252
X Variable 2    0.4219633      0.17718337       2.471359   0.021679

The analysis of variance table gives the significance of the whole regression, which is very high. If we look at the table with coefficients, we see that the coefficient for X2 is significantly different from zero, but the coefficient for X1 is not. This means that, in the presence of X2, the variable X1 does not contribute significantly more.

Other methods

In Chapter 24, Zikmund et al. describe some other methods that can be used with multiple variables, but for these one would need a package designed primarily for statistics.
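As a pointer beyond Excel, the sketch below shows how a multiple regression of this kind could be run in Python with statsmodels, one such statistics-oriented package. The data are simulated stand-ins for the 25 observations above, with X1 and X2 deliberately built to be correlated, as in the example.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated stand-in data: X1 and X2 are constructed to be correlated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=25)
x2 = 0.7 * x1 + 0.7 * rng.normal(size=25)
y = 2.2 + 0.4 * x2 + rng.normal(size=25)

# Fit y = b0 + b1*X1 + b2*X2 by ordinary least squares.
X = sm.add_constant(pd.DataFrame({"X1": x1, "X2": x2}))
model = sm.OLS(y, X).fit()

# The summary reports R Square, the overall F test, and a coefficient table
# (standard errors, t statistics, p-values), mirroring the Excel output above.
print(model.summary())
```

As in the Excel example, the coefficient table from the summary is what shows whether each variable contributes significantly in the presence of the others.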