CORRELATION AND REGRESSION

Walter Fleming

CORRELATION

The correlation coefficient, r, is a number between -1 and +1 which measures the strength of the linear relationship between two sets of data. The closer the correlation coefficient is to +1 or -1, the stronger the relationship and the easier it is to predict one item by using the other. For example, there is a strong relationship between the amount of daily sunshine and the sales of ice-cream, so the correlation coefficient is close to +1.

Positive and negative correlation coefficient

If the two sets of data are related in such a way that as one increases so does the other, there is a positive correlation between them and the correlation coefficient will be positive. The daily sunshine and sales of ice-cream have a positive correlation. If they are related so that as one increases the other decreases, they have a negative correlation. For example, the amount of daily sunshine and the sales of rainwear would have a negative correlation.

Typical values, as illustrated by scatter diagrams:

  Strong positive correlation     r = 0.8
  Strong negative correlation     r = -0.8
  No correlation                  r = 0
  Perfect positive correlation    r = +1
  Perfect negative correlation    r = -1

Perfect correlation is usually found only in science. In most other situations the correlation coefficient is a decimal between -1 and +1. It is calculated using the formula:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left\{ n\sum x^2 - \left(\sum x\right)^2 \right\} \left\{ n\sum y^2 - \left(\sum y\right)^2 \right\}}} $$
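As a rough illustration of the product moment formula, the following Python sketch computes r for two made-up data sets. The function name pearson_r and the numbers are assumptions chosen for illustration only; they do not come from the handout. One set rises exactly in a straight line and the other falls exactly in a straight line, so the results should be the two perfect cases, +1 and -1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product moment correlation coefficient from the column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data, not taken from the handout.
hours = [1, 2, 3, 4, 5]
rising = [2 * h + 1 for h in hours]    # increases as hours increase
falling = [20 - 3 * h for h in hours]  # decreases as hours increase

r_up = pearson_r(hours, rising)
r_down = pearson_r(hours, falling)
print(r_up, r_down)
```

Real data rarely lie exactly on a line, which is why r is normally a decimal between these two extremes.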

RANK CORRELATION

This is another method of finding the correlation coefficient. With this method both sets of data must be ranked, i.e. numbered in either ascending or descending order. Both sets must be ranked in the same way, i.e. either both ascending or both descending. Then the difference between the ranks of each item is found by subtracting one from the other. This gives us d for the formula:

$$ r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

Example

A group of 7 candidates are ranked in a written exam and in a practical exam. The following table gives the results:

  Candidate            A   B   C   D   E   F   G
  Place in Written     3   5   1   4   2   7   6
  Place in Practical   4   5   3   2   1   6   7

  Candidate   R1   R2   d   d²
  A            3    4   1    1
  B            5    5   0    0
  C            1    3   2    4
  D            4    2   2    4
  E            2    1   1    1
  F            7    6   1    1
  G            6    7   1    1
                     Σd² =  12

With n = 7:

$$ r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(12)}{7(7^2 - 1)} = 1 - 0.21 = 0.79 $$

This indicates that there is a fairly strong correlation between the written exam results and the practical results.
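The rank correlation calculation can be sketched in Python as follows. The function name spearman_from_ranks is an assumption for illustration, and the two rank lists are the candidate rankings as reconstructed from the table in the example, so they should be checked against the original handout before being relied on.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Rank correlation r = 1 - 6*sum(d^2) / (n*(n^2 - 1)).

    Assumes both lists are already ranks (1..n) with no tied ranks.
    Returns the coefficient together with the sum of squared differences.
    """
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


# Candidates A..G; ranks reconstructed from the example table (assumption).
written = [3, 5, 1, 4, 2, 7, 6]
practical = [4, 5, 3, 2, 1, 6, 7]

r_rank, sum_d2 = spearman_from_ranks(written, practical)
print(f"sum of d^2 = {sum_d2}, r = {r_rank:.2f}")
```

Because the differences are squared, it does not matter which rank is subtracted from which.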

REGRESSION

Regression analysis examines what the relationship between two sets of data is. It relates one set to the other by means of an equation, the regression equation. This is the equation of a straight line. Regression assumes that the two sets of data lie close to a straight line, known as the line of best fit or the regression line, so the stronger the correlation, the more reliable this equation is for forecasting.

The equation is Y = a + bX, where

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} $$

and

$$ a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} $$

To forecast, substitute the given X value into the equation.

The Coefficient of Determination

This is found using the formula (r² × 100)%. It gives the percentage of the variation in the dependent variable that is explained by the variation in the independent variable.

Example: The number of daily hours of sunshine and the sales of ice-cream for a particular week are given in the following table:

Hours of sunshine Ice-cream sales (000 kg) 4 5 3 6 5 9 9 10 11 10 1 10 15 16

Find the regression equation and use it to forecast the expected sales on a day on which the expected sunshine is 7 hours.

Procedure for calculating the product moment correlation coefficient

1. Arrange the two sets of data in columns: Col. 1 is X, Col. 2 is Y. If the data are labelled X and Y, follow that. If they are not, X is the data over which you have control (i.e. the independent variable). For example, price is X since you can directly control it, and sales is Y. If in doubt, take the top line as X and the second line as Y.
2. Form three more columns: Col. 3 for XY, the product of each pair, Col. 4 for X squared, and Col. 5 for Y squared.
3. Calculate the entries in each column.
4. Add up each column. This gives the Σs in the formula. n is the number of rows of data.
5. Insert the results into the formula and calculate. Be careful with the calculations; don't try to do too much at one time.

Example

To find Pearson's product moment correlation coefficient for the following data showing the cost of maintaining 10 machines of different ages (in months):

  Machine    1    2    3    4    5    6    7    8    9   10
  Age        5   10   15   20   30   30   30   50   50   60
  Cost     190  240  250  300  310  335  300  300  350  395

Put the data in vertical columns, identifying them as X and Y. Then find the values of the columns XY, X² and Y².

     X       Y       XY      X²       Y²
     5     190      950      25    36100
    10     240     2400     100    57600
    15     250     3750     225    62500
    20     300     6000     400    90000
    30     310     9300     900    96100
    30     335    10050     900   112225
    30     300     9000     900    90000
    50     300    15000    2500    90000
    50     350    17500    2500   122500
    60     395    23700    3600   156025
   300    2970    97650   12050   913050
    ΣX      ΣY      ΣXY     ΣX²      ΣY²

Put these values into the formula:

$$ r = \frac{10 \times 97650 - 300 \times 2970}{\sqrt{\left[10 \times 12050 - 300^2\right] \left[10 \times 913050 - 2970^2\right]}} = \frac{85500}{\sqrt{[30500][309600]}} = \frac{85500}{\sqrt{9442800000}} = \frac{85500}{97174.07} = 0.88 $$

A coefficient of 0.88 tells us that the link between age and cost is strong. Also, it is a positive correlation, which indicates that as age increases, cost also increases.

The coefficient of determination = r² × 100 = 77.44%

This indicates that 77.44% of the differences that occur in the cost are associated with differences in age. The remaining 22.56% of the differences are due to other factors.

A rough guide to interpreting the strength of the correlation coefficient:

  0.9 to 1.0    Very strong
  0.8 to 0.9    Strong
  0.7 to 0.8    Fairly strong
  under 0.7     Weak to none
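The column sums and the correlation coefficient for the machine example can be checked with a short Python sketch. The variable names are illustrative assumptions, and the age and cost values are those of the example as reconstructed from the table; the code follows the five-step procedure (form the XY, X² and Y² columns, total them, substitute into the formula).

```python
from math import sqrt

# Machine data from the example: X = age in months, Y = maintenance cost.
age = [5, 10, 15, 20, 30, 30, 30, 50, 50, 60]
cost = [190, 240, 250, 300, 310, 335, 300, 300, 350, 395]

n = len(age)
sum_x = sum(age)
sum_y = sum(cost)
sum_xy = sum(x * y for x, y in zip(age, cost))  # column XY
sum_x2 = sum(x * x for x in age)                # column X squared
sum_y2 = sum(y * y for y in cost)               # column Y squared

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(f"r = {r:.2f}, coefficient of determination = {100 * r ** 2:.1f}%")
```

Squaring the unrounded r gives roughly 77.4%; the figure of 77.44% in the text comes from squaring r after rounding it to 0.88.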

To find the least squares equation of the regression line (line of best fit)

The equation is of the form Y = a + bX. Using the formulae for a and b, we get

$$ b = \frac{10 \times 97650 - 300 \times 2970}{10 \times 12050 - 300^2} = \frac{85500}{30500} \approx 2.8 $$

$$ a = \frac{2970}{10} - 2.8 \times \frac{300}{10} = 297 - 84 = 213 $$

So the regression equation is Y = 213 + 2.8X.

This can be used for forecasting. To forecast the cost of maintaining a machine that is 23 months old, substitute 23 for X and find Y, the cost:

Y = 213 + 2.8 × 23 = 277.4

To draw the regression line on the scatter graph:

Find two points on the line by putting in two values for X and finding the corresponding Y. It is easier to draw if you take values for X that are near the lower end and near the higher end of the data. In this example you could take X = 5, which gives a value of Y = 227, and X = 50, which gives a value of Y = 353. Now plot the two points (5, 227) and (50, 353) on the scatter diagram, then join them with a straight line. This is the regression line for the data.
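The regression calculation above can be reproduced with a short Python sketch. The names used here (including the helper forecast_cost) are illustrative assumptions. To match the worked example, b is rounded to one decimal place before a is calculated, so the figures agree with the hand calculation rather than with an unrounded least squares fit.

```python
# Machine data from the example: X = age in months, Y = maintenance cost.
age = [5, 10, 15, 20, 30, 30, 30, 50, 50, 60]
cost = [190, 240, 250, 300, 310, 335, 300, 300, 350, 395]

n = len(age)
sum_x = sum(age)
sum_y = sum(cost)
sum_xy = sum(x * y for x, y in zip(age, cost))
sum_x2 = sum(x * x for x in age)

# Slope, rounded to 1 decimal place as in the hand calculation.
b = round((n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2), 1)
# Intercept, using the rounded slope.
a = sum_y / n - b * sum_x / n


def forecast_cost(months):
    """Forecast cost from the regression equation Y = a + bX."""
    return a + b * months


print(f"Y = {a:.0f} + {b}X")
for months in (23, 5, 50):
    print(months, round(forecast_cost(months), 1))
```

The last two forecasts are the two points suggested for drawing the regression line on the scatter diagram.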