Lecture 8 CORRELATION AND LINEAR REGRESSION

Similar documents
Time Series and Forecasting

Midterm 2 - Solutions

ACE2013: Statistics for Marketing and Management

Psych 10 / Stats 60, Practice Problem Set 10 (Week 10 Material), Solutions

Regression Analysis II

A Plot of the Tracking Signals Calculated in Exhibit 3.9

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression

MAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik

Week Topics of study Home/Independent Learning Assessment (If in addition to homework) 7 th September 2015

6-1 Slope. Objectives 1. find the slope of a line 2. use rate of change to solve problems

Chapter 1 0+7= 1+6= 2+5= 3+4= 4+3= 5+2= 6+1= 7+0= How would you write five plus two equals seven?

Chapter 12 - Part I: Correlation Analysis

Section Linear Correlation and Regression. Copyright 2013, 2010, 2007, Pearson, Education, Inc.

Product and Inventory Management (35E00300) Forecasting Models Trend analysis

Chapter 12 - Lecture 2 Inferences about regression coefficient

Announcements. J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 8, / 45

CORRELATION AND REGRESSION

Chapter 6: Exploring Data: Relationships Lesson Plan

13.7 ANOTHER TEST FOR TREND: KENDALL S TAU

Lecture 14. Analysis of Variance * Correlation and Regression. The McGraw-Hill Companies, Inc., 2000

Lecture 14. Outline. Outline. Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA)

MATH 1150 Chapter 2 Notation and Terminology

Correlation and Regression

BIOSTATISTICS NURS 3324

Monday Tuesday Wednesday Thursday Friday Professional Learning. Simplify and Evaluate (0-3, 0-2)

BNAD 276 Lecture 10 Simple Linear Regression Model

Chapter 8. Linear Regression. Copyright 2010 Pearson Education, Inc.

Predicates and Quantifiers

Lecture # 31. Questions of Marks 3. Question: Solution:

Scatter plot of data from the study. Linear Regression

Measuring the fit of the model - SSR

Study Unit 2 : Linear functions Chapter 2 : Sections and 2.6

August 2018 ALGEBRA 1

Principles of Mathematics 12: Explained!

Correlation: basic properties.

Approximate Linear Relationships

Name: JMJ April 10, 2017 Trigonometry A2 Trimester 2 Exam 8:40 AM 10:10 AM Mr. Casalinuovo

Least Squares Regression

Simple Linear Regression

Scatter plot of data from the study. Linear Regression

Chapter 4 Describing the Relation between Two Variables

Industrial Engineering Prof. Inderdeep Singh Department of Mechanical & Industrial Engineering Indian Institute of Technology, Roorkee

Approximations - the method of least squares (1)

CURRICULUM MAP. Course/Subject: Honors Math I Grade: 10 Teacher: Davis. Month: September (19 instructional days)

MATH11400 Statistics Homepage

Scatterplots and Correlations

Correlation & Simple Regression

MODELING. Simple Linear Regression. Want More Stats??? Crickets and Temperature. Crickets and Temperature 4/16/2015. Linear Model

Data Analysis and Statistical Methods Statistics 651

appstats8.notebook October 11, 2016

Linear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation?

Announcements. Lecture 10: Relationship between Measurement Variables. Poverty vs. HS graduate rate. Response vs. explanatory

Chapter 7. Scatterplots, Association, and Correlation

Final Exam - Solutions

Predicted Y Scores. The symbol stands for a predicted Y score

STAT 111 Recitation 7

MATH 2560 C F03 Elementary Statistics I LECTURE 9: Least-Squares Regression Line and Equation

STAT5044: Regression and Anova. Inyoung Kim

Mathematics for Economics MA course

Honors Algebra 1 - Fall Final Review

Mathematics Revision Guide. Algebra. Grade C B

Chapter 10 Correlation and Regression

Linear Programming II NOT EXAMINED

Simple Linear Regression

Chapter 3: Examining Relationships

Learning Goals. 2. To be able to distinguish between a dependent and independent variable.

De-mystifying random effects models

ECON3150/4150 Spring 2016

CHS Algebra 1 Calendar of Assignments August Assignment 1.4A Worksheet 1.4A

Analysing data: regression and correlation S6 and S7

SF2930: REGRESION ANALYSIS LECTURE 1 SIMPLE LINEAR REGRESSION.

Bivariate data data from two variables e.g. Maths test results and English test results. Interpolate estimate a value between two known values.

+ Statistical Methods in

Mathematics: Pre-Algebra Eight

Related Example on Page(s) R , 148 R , 148 R , 156, 157 R3.1, R3.2. Activity on 152, , 190.

HADDONFIELD PUBLIC SCHOOLS Curriculum Map for Algebra II

Algebra I Keystone Quiz Linear Inequalities - (A ) Systems Of Inequalities, (A ) Interpret Solutions To Inequality Systems

6 th grade 7 th grade Course 2 7 th grade - Accelerated 8 th grade Pre-Algebra 8 th grade Algebra

Year 10 Mathematics Semester 2 Bivariate Data Chapter 13

Creating a Travel Brochure

Statistical View of Least Squares

Inference for Regression

Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall)

Objectives. 2.1 Scatterplots. Scatterplots Explanatory and response variables. Interpreting scatterplots Outliers

AP Statistics. Chapter 9 Re-Expressing data: Get it Straight

Section 3: Simple Linear Regression

Unit 1 - Daily Topical Map August

Operation and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras

Time series and Forecasting

Chapter 5 Friday, May 21st

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Dependence and scatter-plots. MVE-495: Lecture 4 Correlation and Regression

Colorado State University, Fort Collins, CO Weather Station Monthly Summary Report

Unit #2: Linear and Exponential Functions Lesson #13: Linear & Exponential Regression, Correlation, & Causation. Day #1

Ordinary Least Squares Regression Explained: Vartanian

Graphing Sea Ice Extent in the Arctic and Antarctic

TMA4255 Applied Statistics V2016 (5)

Lecture 11: Simple Linear Regression

ST430 Exam 1 with Answers

Label the lines below with S for the same meanings or D for different meanings.

Transcription:

Announcements CBA5 open in exam mode - deadline midnight Friday! Question 2 on this week s exercises is a prize question. The first good attempt handed in to me by 12 midday this Friday will merit a prize...

Assignment 2 The Minitab-based Assignment 2 which would have been set if there had been no disruption has been scrapped, and the material on Minitab is non-examinable. A revised assignment 2 is due to be set. Proposed deadline is Friday May 4th, 1pm, subject to confirmation (it won t be before that!)

Lecture 8 CORRELATION AND LINEAR REGRESSION

Introduction In this lecture we look at relationships between pairs of continuous variables. These might be

Introduction In this lecture we look at relationships between pairs of continuous variables. These might be height and weight

Introduction In this lecture we look at relationships between pairs of continuous variables. These might be height and weight market value and number of transactions

Introduction In this lecture we look at relationships between pairs of continuous variables. These might be height and weight market value and number of transactions temperatures and sales of ice cream

Introduction In this lecture we look at relationships between pairs of continuous variables. These might be height and weight market value and number of transactions temperatures and sales of ice cream We can examine the relationship between our two variables through a scatter diagram (plot one against the other).

Introduction In this lecture we look at relationships between pairs of continuous variables. These might be height and weight market value and number of transactions temperatures and sales of ice cream We can examine the relationship between our two variables through a scatter diagram (plot one against the other). Today we will look at how to examine such relationships more formally.

Example: ice cream sales The following data are monthly ice cream sales at Luigi Minchella s ice cream parlour.

Month Average Temp ( o C) Sales ( 000 s) January 4 73 February 4 57 March 7 81 April 8 94 May 12 110 June 15 124 July 16 134 August 17 139 September 14 124 October 11 103 November 7 81 December 5 80

Month Average Temp ( o C) Sales ( 000 s) January 4 73 February 4 57 March 7 81 April 8 94 May 12 110 June 15 124 July 16 134 August 17 139 September 14 124 October 11 103 November 7 81 December 5 80 Is there any relationship between temperature and sales?

Month Average Temp ( o C) Sales ( 000 s) January 4 73 February 4 57 March 7 81 April 8 94 May 12 110 June 15 124 July 16 134 August 17 139 September 14 124 October 11 103 November 7 81 December 5 80 Is there any relationship between temperature and sales? Can you DESCRIBE this relationship?

The scatter plot goes uphill, so we say there is a positive relationship between temperature and ice cream sales.

The scatter plot goes uphill, so we say there is a positive relationship between temperature and ice cream sales. As temperatures rise, so do ice cream sales!

The scatter plot goes uphill, so we say there is a positive relationship between temperature and ice cream sales. As temperatures rise, so do ice cream sales! If the scatter plot had shown a downhill slope, we would have a negative relationship.

The scatter plot goes uphill, so we say there is a positive relationship between temperature and ice cream sales. As temperatures rise, so do ice cream sales! If the scatter plot had shown a downhill slope, we would have a negative relationship. We could draw a straight line through the middle of most of the points. Thus, there is also a linear association present.

The scatter plot goes uphill, so we say there is a positive relationship between temperature and ice cream sales. As temperatures rise, so do ice cream sales! If the scatter plot had shown a downhill slope, we would have a negative relationship. We could draw a straight line through the middle of most of the points. Thus, there is also a linear association present. If we were to draw a line through the points, most points would lie close to this line. The closer the points lie to a line, the stronger the relationship.

Correlation We now look at how to quantify the relationship between two variables by calculating the sample correlation coefficient. This is

Correlation We now look at how to quantify the relationship between two variables by calculating the sample correlation coefficient. This is r = S XY SXX S YY, where

Correlation We now look at how to quantify the relationship between two variables by calculating the sample correlation coefficient. This is r = S XY SXX S YY, where S XY = ( xy ) n xȳ,

Correlation We now look at how to quantify the relationship between two variables by calculating the sample correlation coefficient. This is r = S XY SXX S YY, where S XY = S XX = ( ) xy n xȳ, ( ) x 2 n x 2,

Correlation We now look at how to quantify the relationship between two variables by calculating the sample correlation coefficient. This is r = S XY SXX S YY, where S XY = S XX = S YY = ( ) xy n xȳ, ( ) x 2 n x 2, ( ) y 2 nȳ 2.

r always lies between 1 and +1.

r always lies between 1 and +1. If r is close to +1, we have evidence of a strong positive (linear) association.

r always lies between 1 and +1. If r is close to +1, we have evidence of a strong positive (linear) association. If r is close to 1, we have evidence of a strong negative (linear) association.

r always lies between 1 and +1. If r is close to +1, we have evidence of a strong positive (linear) association. If r is close to 1, we have evidence of a strong negative (linear) association. If r is close to zero, there is probably no association!

r always lies between 1 and +1. If r is close to +1, we have evidence of a strong positive (linear) association. If r is close to 1, we have evidence of a strong negative (linear) association. If r is close to zero, there is probably no association! A correlation coefficient close to zero does not imply no relationship at all, just no linear relationship.

Example: ice cream sales The easiest way to calculate r is to draw up a table!

Example: ice cream sales The easiest way to calculate r is to draw up a table! Ü Ý Ü 2 Ý 2 ÜÝ 4 73

Example: ice cream sales The easiest way to calculate r is to draw up a table! Ü Ý Ü 2 Ý 2 ÜÝ 4 73 16

Example: ice cream sales The easiest way to calculate r is to draw up a table! Ü Ý Ü 2 Ý 2 ÜÝ 4 73 16 5329

Example: ice cream sales The easiest way to calculate r is to draw up a table! Ü Ý Ü 2 Ý 2 ÜÝ 4 73 16 5329 292 4 57

Example: ice cream sales The easiest way to calculate r is to draw up a table! Ü Ý Ü 2 Ý 2 ÜÝ 4 73 16 5329 292 4 57 16

Example: ice cream sales The easiest way to calculate r is to draw up a table! Ü Ý Ü 2 Ý 2 ÜÝ 4 73 16 5329 292 4 57 16 3249

Example: ice cream sales The easiest way to calculate r is to draw up a table! Ü Ý Ü 2 Ý 2 ÜÝ 4 73 16 5329 292 4 57 16 3249 228

Example: ice cream sales The easiest way to calculate r is to draw up a table! Ü Ý Ü 2 Ý 2 ÜÝ 4 73 16 5329 292 4 57 16 3249 228 7 81 49 6561 567

Example: ice cream sales The easiest way to calculate r is to draw up a table! Ü Ý Ü 2 Ý 2 ÜÝ 4 73 16 5329 292 4 57 16 3249 228 7 81 49 6561 567 8 94 64 8836 752

Example: ice cream sales The easiest way to calculate r is to draw up a table! Ü Ý Ü 2 Ý 2 ÜÝ 4 73 16 5329 292 4 57 16 3249 228 7 81 49 6561 567 8 94 64 8836 752 12 110 144 12100 1320 15 124 225 15376 1860 16 134 256 17956 2144 17 139 289 19321 2363 14 124 196 15376 1736 11 103 121 10609 1133 7 81 49 6561 567 5 80 25 6400 400 120 1200 1450 127674 13362

Notice in the formulae for S XY, S XX and S YY we need the sample means x and ȳ. Thus,

Notice in the formulae for S XY, S XX and S YY we need the sample means x and ȳ. Thus, x = 120 12 = 10 and

Notice in the formulae for S XY, S XX and S YY we need the sample means x and ȳ. Thus, x = 120 12 = 10 and ȳ = 1200 12 = 100.

Similarly,

Similarly, S XY = ( xy ) n xȳ = 13362 12 10 100 = 1362,

Similarly, S XY = ( xy ) n xȳ = 13362 12 10 100 = 1362, ( ) S XX = x 2 n x 2 = 1450 12 10 10 = 250 and

Similarly, S XY = ( xy ) n xȳ = 13362 12 10 100 = 1362, ( ) S XX = x 2 n x 2 = 1450 12 10 10 = 250 and ( ) S YY = y 2 nȳ 2 = 127674 12 100 100 = 7674.

Thus,

Thus, r = S XY SXX S YY

Thus, r = = S XY SXX S YY 1362 250 7674

Thus, r = = S XY SXX S YY 1362 250 7674 = 0.983 (to 3 decimal places).

A few points to remember...

A few points to remember... If your calculated correlation coefficient does not lie between 1 and +1, you ve done something wrong!

A few points to remember... If your calculated correlation coefficient does not lie between 1 and +1, you ve done something wrong! Check that your correlation coefficient agrees with your plot

A few points to remember... If your calculated correlation coefficient does not lie between 1 and +1, you ve done something wrong! Check that your correlation coefficient agrees with your plot an uphill slope ties in with a positive correlation coefficient;

A few points to remember... If your calculated correlation coefficient does not lie between 1 and +1, you ve done something wrong! Check that your correlation coefficient agrees with your plot an uphill slope ties in with a positive correlation coefficient; a downhill slope ties in with a negative correlation coefficient;

A few points to remember... If your calculated correlation coefficient does not lie between 1 and +1, you ve done something wrong! Check that your correlation coefficient agrees with your plot an uphill slope ties in with a positive correlation coefficient; a downhill slope ties in with a negative correlation coefficient; a random scattering of points indicates a correlation coefficient close to zero;

A few points to remember... If your calculated correlation coefficient does not lie between 1 and +1, you ve done something wrong! Check that your correlation coefficient agrees with your plot an uphill slope ties in with a positive correlation coefficient; a downhill slope ties in with a negative correlation coefficient; a random scattering of points indicates a correlation coefficient close to zero; the closer the points lie to a straight line (either uphill or downhill ), the closer to either +1 or 1 the correlation coefficient will be!

Simple linear regression A correlation analysis helps to establish whether or not there is a linear relationship between two variables. However, it doesn t allow us to use this linear relationship.

Simple linear regression A correlation analysis helps to establish whether or not there is a linear relationship between two variables. However, it doesn t allow us to use this linear relationship. Regression analysis allows us to use the linear relationship between two variables. For example, with a regression analysis, we can predict the value of one variable given the value of another.

Simple linear regression A correlation analysis helps to establish whether or not there is a linear relationship between two variables. However, it doesn t allow us to use this linear relationship. Regression analysis allows us to use the linear relationship between two variables. For example, with a regression analysis, we can predict the value of one variable given the value of another. To perform a regression analysis, we must assume that

Simple linear regression A correlation analysis helps to establish whether or not there is a linear relationship between two variables. However, it doesn t allow us to use this linear relationship. Regression analysis allows us to use the linear relationship between two variables. For example, with a regression analysis, we can predict the value of one variable given the value of another. To perform a regression analysis, we must assume that the scatter plot of the two variables (roughly) shows a straight line, and

Simple linear regression A correlation analysis helps to establish whether or not there is a linear relationship between two variables. However, it doesn t allow us to use this linear relationship. Regression analysis allows us to use the linear relationship between two variables. For example, with a regression analysis, we can predict the value of one variable given the value of another. To perform a regression analysis, we must assume that the scatter plot of the two variables (roughly) shows a straight line, and the spread in the Y direction is roughly constant with X.

Look at the scatter plot of ice cream sales against temperature.

Look at the scatter plot of ice cream sales against temperature. A line of best fit can be drawn through the data;

Look at the scatter plot of ice cream sales against temperature. A line of best fit can be drawn through the data; This line could then be used to make predictions of ice cream sales based on temperature.

Look at the scatter plot of ice cream sales against temperature. A line of best fit can be drawn through the data; This line could then be used to make predictions of ice cream sales based on temperature. However, everyone s line is bound to be slightly different!

Look at the scatter plot of ice cream sales against temperature. A line of best fit can be drawn through the data; This line could then be used to make predictions of ice cream sales based on temperature. However, everyone s line is bound to be slightly different! And so everyone s predictions will be slightly different!

Look at the scatter plot of ice cream sales against temperature. A line of best fit can be drawn through the data; This line could then be used to make predictions of ice cream sales based on temperature. However, everyone s line is bound to be slightly different! And so everyone s predictions will be slightly different! The aim of regression analysis is to find the best line which goes through the data.

Look at the scatter plot of ice cream sales against temperature. A line of best fit can be drawn through the data; This line could then be used to make predictions of ice cream sales based on temperature. However, everyone s line is bound to be slightly different! And so everyone s predictions will be slightly different! The aim of regression analysis is to find the best line which goes through the data. At this point, we should remind ourselves about the equation for a straight line and how to draw it!

The regression equation

The regression equation The simple linear regression model is given by Y = α+βx +ǫ, where

The regression equation The simple linear regression model is given by Y = α+βx +ǫ, where Y is the response variable and

The regression equation The simple linear regression model is given by Y = α+βx +ǫ, where Y is the response variable and X is the explanatory variable.

The regression equation The simple linear regression model is given by where Y = α+βx +ǫ, Y is the response variable and X is the explanatory variable. ǫ is the random error taken to be zero on average, with constant variance.

The regression equation The simple linear regression model is given by where Y = α+βx +ǫ, Y is the response variable and X is the explanatory variable. ǫ is the random error taken to be zero on average, with constant variance. α represents the intercept of the regression line (the point where the line cuts the Y axis),

The regression equation The simple linear regression model is given by where Y = α+βx +ǫ, Y is the response variable and X is the explanatory variable. ǫ is the random error taken to be zero on average, with constant variance. α represents the intercept of the regression line (the point where the line cuts the Y axis), β represents the slope of the regression line (i.e. how steep the line is), and

The regression equation The simple linear regression model is given by where Y = α+βx +ǫ, Y is the response variable and X is the explanatory variable. ǫ is the random error taken to be zero on average, with constant variance. α represents the intercept of the regression line (the point where the line cuts the Y axis), β represents the slope of the regression line (i.e. how steep the line is), and We need to estimate α and β from our sample data. But how?

Remember, the aim of regression analysis is to find a line which goes through the middle of the data in the scatter plot, closer to the points than any other line;

Remember, the aim of regression analysis is to find a line which goes through the middle of the data in the scatter plot, closer to the points than any other line; So the best line will minimise any gaps between the line and the data!

Remember, the aim of regression analysis is to find a line which goes through the middle of the data in the scatter plot, closer to the points than any other line; So the best line will minimise any gaps between the line and the data! It turns out that the values of α and β which give this best line are ˆβ = S XY S XX and

Remember, the aim of regression analysis is to find a line which goes through the middle of the data in the scatter plot, closer to the points than any other line; So the best line will minimise any gaps between the line and the data! It turns out that the values of α and β which give this best line are ˆβ = S XY S XX and ˆα = ȳ ˆβ x.

Example: ice cream sales For the ice cream sales data, we can find the best line using the formulae on the previous slide! Thus

Example: ice cream sales For the ice cream sales data, we can find the best line using the formulae on the previous slide! Thus and ˆβ = S XY S XX = 1362 250 = 5.448,

Example: ice cream sales For the ice cream sales data, we can find the best line using the formulae on the previous slide! Thus and ˆβ = S XY S XX = 1362 250 = 5.448, ˆα = ȳ ˆβ x = 100 5.448 10 = 45.52.

Thus, the regression equation is

Thus, the regression equation is Ý = 45.52+5.448Ü

Making predictions We can use our estimated regression equation to make predictions of ice cream sales based on average temperature.

For example: Predict monthly sales if the average temperature is 10 o C.

For example: Predict monthly sales if the average temperature is 10 o C. We can use take a reading from our graph, or, more accurately, use our regression equation! y = 45.52 + 5.448x = 45.52+5.448 10 = 45.52+54.48 = 100, i.e.

For example: Predict monthly sales if the average temperature is 10 o C. We can use take a reading from our graph, or, more accurately, use our regression equation! y = 45.52 + 5.448x = 45.52+5.448 10 = 45.52+54.48 = 100, i.e. if the average temperature is 10 o C, we can expect sales of 100,000. i.e.