Business Statistics. Lecture 10: Correlation and Linear Regression

Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form (linear or nonlinear?), Direction (positive or negative?), and Strength (none, weak, or strong?) of the relationship.

Scatterplot Example scatterplots (Statistics for Managers, 4th Edition, Prentice-Hall, 2004): linear vs. nonlinear form, positive vs. negative direction, strong vs. weak strength, and no relationship.

Scatterplot A scatterplot is a graphical display of the relationship between two quantitative variables. Judging the relationship by eye alone is often not satisfactory, so we need a numerical measure to supplement the graph. Correlation is the measure we use.

Correlation The correlation measures the direction and strength of the linear relationship between two quantitative variables. The sample correlation coefficient is denoted r; the population correlation coefficient is denoted ρ (rho).

Correlation: Concept & Computation The Pearson correlation coefficient is a standardized covariance: $r = \frac{\mathrm{Cov}(X,Y)}{s_X \, s_Y}$. Covariance: $\mathrm{Cov}(X,Y) = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{N-1}$. Covariance indicates the degree to which X and Y vary together.
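A minimal sketch of these two formulas in Python with NumPy; the income and purchase-spending numbers are hypothetical and serve only to illustrate the computation:

```python
import numpy as np

def covariance(x, y):
    """Cov(X, Y) = sum((x_i - x_bar) * (y_i - y_bar)) / (N - 1)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

def pearson_r(x, y):
    """Pearson r = Cov(X, Y) / (s_X * s_Y), i.e. a standardized covariance."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return covariance(x, y) / (x.std(ddof=1) * y.std(ddof=1))

# Hypothetical data: income (in $1000s) and purchase spending
income   = [25, 32, 41, 48, 55, 63, 70, 82]
spending = [110, 118, 131, 140, 150, 158, 166, 182]

print(covariance(income, spending))  # positive: the two variables move together
print(pearson_r(income, spending))   # near +1: strong positive linear relationship
```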

Interpreting Covariance Covariance between two variables: Cov(X,Y) > 0: X and Y tend to move in the same direction. Cov(X,Y) < 0: X and Y tend to move in opposite directions. Cov(X,Y) = 0: X and Y have no linear association (they are uncorrelated, though not necessarily independent).

Correlation −1 ≤ r ≤ +1. Unit free. The closer r is to −1, the stronger the negative linear relationship. The closer r is to +1, the stronger the positive linear relationship. The closer r is to 0, the weaker any linear relationship. It is NOT an indicator of a causal relationship between the variables!
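As a quick illustration that r is unit free and bounded, here is a hypothetical check using scipy.stats.pearsonr: rescaling either variable (for example, changing its units) leaves r unchanged.

```python
from scipy.stats import pearsonr

# Hypothetical income (in $1000s) and purchase spending
income   = [25, 32, 41, 48, 55, 63, 70, 82]
spending = [110, 118, 131, 140, 150, 158, 166, 182]

r, _ = pearsonr(income, spending)
r_rescaled, _ = pearsonr([x * 1000 for x in income],     # income in dollars instead of $1000s
                         [y * 0.92 for y in spending])   # spending in a different currency

print(r, r_rescaled)  # the same value, and always between -1 and +1
```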

Scatterplots with Correlation Coefficients Example panels with r = −1, r = −.6, r = 0, r = +.3, r = +1, and r = 0 (Statistics for Managers, 4th Edition, Prentice-Hall, 2004).

Causation? A strong correlation between two variables does not mean that changes in one variable cause changes in the other. The only legitimate way to try to establish a causal connection statistically is through the use of designed experiments.

Linear Regression Correlation treats two variables X and Y as equals: it describes a symmetric linear relationship between them. In many cases, we want to study an asymmetric linear relationship between X and Y, in which one variable (X) influences (or predicts) the other variable (Y). X = IV or predictor variable; Y = DV or outcome variable.

Linear Regression Analysis Describes how the DV (Y) changes as a single independent variable (X) changes (the effect of X on Y); this is called simple linear regression analysis. It summarizes the relationship between two variables when the form of the relationship is linear, and it is often used as a mathematical model, the linear regression model, to predict the value of the DV (Y) based on a value of an IV (X).

What is Linear? Example plots contrasting linear and nonlinear relationships (Statistics for Managers, 4th Edition, Prentice-Hall, 2004).

What is Fitting a line to data? When a scatterplot displays a linear pattern, we can describe the overall pattern by drawing a straight line through the points. The equation of a line fitted to data gives a compact description of the dependency of the DV on the IV. It is a mathematical model for the straight-line relationship.

What is the equation of a line? A straight line relating Y to X has an equation of the form: Y = a + bX, where a = intercept and b = slope.

Simple Regression When we have a scatterplot with a linear relationship between the DV (Y) and a single IV (X), we are often interested in summarizing the overall pattern. We can do this by drawing a line on the graph. This type of line is called a regression line. A regression line is a straight line that describes how Y changes as X changes.

Simple Regression Model Finding a regression line that explains the relationship between two variables well: Ŷ = a + bX. a = Intercept: the mean value of the DV when the IV is zero. b = Slope: the amount by which the DV changes on average when the IV changes by one unit.

Simple regression model Example fitted line: PS = 81.54 + 1.22 × Income, shown as a scatterplot of Purchase Spending (roughly 80 to 220) against Income (roughly 20 to 90) with the fitted line drawn through the points.
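Assuming the fitted coefficients from this slide (intercept 81.54, slope 1.22), using the line for prediction is just a matter of plugging in an income value; a small hypothetical helper:

```python
def predict_spending(income, intercept=81.54, slope=1.22):
    """Predicted purchase spending from the fitted line PS = 81.54 + 1.22 * income."""
    return intercept + slope * income

print(predict_spending(50))  # 81.54 + 1.22 * 50 = 142.54
```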

How to determine the best regression line? The best regression line is the one that comes closest to the data points in the vertical direction. There are many ways to make this distance as small as possible; the method of least squares is the most common. The least-squares regression line of Y on X is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

Concept of least squares The vertical distance between a data point and the line (the residual) is $Y_i - \hat{Y}_i$. The sum of the squares of all vertical distances is $SS(\mathrm{Error}) = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2$.

Least squares method: Slope In the least-squares method, the slope b is calculated by $b = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$, the degree to which X and Y vary together (covariability) divided by the degree to which X varies separately (separate variability). The slope answers: for a one-unit change in X, how many units does Y change?

Least squares method: Intercept Once b is estimated, the intercept a is calculated by $a = \bar{Y} - b\bar{X}$. This is derived from $\hat{Y}_i = a + bX_i$ (i = 1, ..., N): averaging both sides shows that the fitted line passes through $(\bar{X}, \bar{Y})$.
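A minimal sketch of the least-squares formulas for b and a in Python (the data are hypothetical):

```python
import numpy as np

def least_squares_fit(x, y):
    """Least-squares estimates:
    b = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2), and a = y_bar - b * x_bar."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    return a, b

# Hypothetical income / purchase-spending data
income   = [25, 32, 41, 48, 55, 63, 70, 82]
spending = [110, 118, 131, 140, 150, 158, 166, 182]

a, b = least_squares_fit(income, spending)
print(a, b)  # intercept and slope of the fitted line y-hat = a + b * x
```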

Statistical test for Slope We can test the strength of the relationship between the two variables in simple linear regression. H₀: β = 0 (β = population slope): there is no linear relationship between X and Y. H₁: β ≠ 0: there is a linear relationship. What can we use for testing this? Again, the t-test for a single parameter!

Statistical test for Slope Still remember this? $t = \frac{M - \mu}{s_M}$. We can easily adapt this formula for the slope: $t = \frac{b - \beta}{s_b}$, where b = sample slope, β = population slope under H₀ (= 0), and $s_b$ = the standard error of the regression slope.

Statistical test for Slope Here, the sample standard deviation of the slope, $s_b$ (its standard error), is given by $s_b = \frac{s}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}}$, where $s = \sqrt{\frac{\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2}{N - 2}}$.

Inferences about the Slope: t Test Example
H₀: β₁ = 0; Hₐ: β₁ ≠ 0; d.f. = 10 − 2 = 8.
Regression output: Intercept: coefficient 98.24833, standard error 58.03348, t Stat 1.69296, P-value 0.12892; Square Feet: coefficient 0.10977, standard error 0.03297, t Stat 3.32938, P-value 0.01039.
Test statistic: t = b₁ / s_b₁ = 3.329.
Decision: with α/2 = .025 the critical values are ±2.3060; since 3.329 > 2.3060, reject H₀.
Conclusion: there is sufficient evidence that square footage affects house price.
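As a rough check of the numbers in this example, a short scipy sketch reproduces the test statistic, the critical value, and the p-value (to rounding):

```python
from scipy import stats

b1, se_b1, n = 0.10977, 0.03297, 10  # slope, standard error, sample size from the example
df = n - 2

t_stat = b1 / se_b1                        # (b1 - 0) / s_b1
t_crit = stats.t.ppf(1 - 0.025, df)        # two-tailed critical value at alpha = .05
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-tailed p-value

print(t_stat)   # about 3.329
print(t_crit)   # about 2.306
print(p_value)  # about 0.010 -> reject H0 at the .05 level
```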

Statistical test for Slope If the observed value of t is greater in absolute value than the critical value of t with DF = N − 2 and α = .05, we may reject the null hypothesis. This indicates that there is a significant linear relationship between X and Y.

Assessing the goodness-of-fit of the regression model: Coefficient of Determination (R²) The proportion of the total variation in Y accounted for by the regression model: $R^2 = \frac{SS(\mathrm{Regression})}{SS(\mathrm{Total})}$. R² ranges from 0 to 1; the larger R², the more of the variance of the DV is explained (0 = no explanation at all, 1 = perfect explanation). In simple regression, $r^2 = R^2$.
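A minimal sketch of R² in Python: fit a line by least squares (here via numpy.polyfit on hypothetical data), then compare explained and total variation:

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = SS(Regression) / SS(Total) = 1 - SS(Error) / SS(Total)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_total = np.sum((y - y.mean()) ** 2)
    ss_error = np.sum((y - y_hat) ** 2)
    return 1 - ss_error / ss_total

# Hypothetical data and a least-squares fit
x = np.array([25, 32, 41, 48, 55, 63, 70, 82], dtype=float)
y = np.array([110, 118, 131, 140, 150, 158, 166, 182], dtype=float)
b, a = np.polyfit(x, y, 1)  # slope, intercept (highest power first)
y_hat = a + b * x

print(r_squared(y, y_hat))  # close to 1: most of the variation in y is explained by x
```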

Examples of R² Values R² = 1: a perfect linear relationship between x and y; 100% of the variation in y is explained by variation in x.

Examples of R² Values 0 < R² < 1: a weaker linear relationship between x and y; some but not all of the variation in y is explained by variation in x.

Examples of R² Values R² = 0: no linear relationship between x and y; the value of y does not depend on x (none of the variation in y is explained by variation in x).