Business Mathematics and Statistics (MATH0203) Chapter 1: Correlation & Regression
Dependent and independent variables The independent variable (x) is the one that is chosen freely or occurs naturally. The dependent variable (y) occurs as a consequence of the value of the independent variable.
Examples: The number of items produced (x) and the total cost of production (y). The time spent on promotion (x) and the level of sales volume (y). Sometimes the relationship between a dependent and an independent variable is called a causal relationship.
Definition of correlation Correlation is concerned with describing the strength of the relationship between two variables.
Scatter Diagrams Visual representation can give an immediate impression of a set of data. Do the two variables have a strong relationship, a moderate relationship, a weak relationship, or no relationship?
Independent variable? Dependent variable? Relationship?
Question 1.1 The table below presents the number of hours of training in typewriting and the speed of typing a given text for 10 randomly selected typists.

Typist                    1    2    3    4    5    6    7    8    9   10
Hours of training       120   70  100   50  150   90   30   40   80   20
Speed (words/minute)     30   18   25   14   35   21   10   15   20   10

Draw a scatter diagram.
CORRELATION To measure how well the regression line fits the actual data, we use: i. the coefficient of determination (r²) ii. the coefficient of correlation (r)
The correlation coefficient, r We need a way of measuring the value of the correlation between two variables. This is achieved through a correlation coefficient, r. Notice that: $-1 \le r \le 1$
[Figure: scatter diagrams illustrating perfect correlation, partial correlation, and no correlation]
r = 1: perfect positive relationship
r = 0.9: strong positive relationship
r = 0.5: moderate positive relationship
r = 0.2: weak positive relationship
r = 0: no correlation / no relationship
r = -0.2: weak negative relationship
r = -0.5: moderate negative relationship
r = -0.9: strong negative relationship
r = -1: perfect negative relationship
Positive correlation The two variables x and y move in the same direction, i.e. if x increases, y increases; if x decreases, y decreases. Examples: 1) Number of calls made by a salesman and number of sales obtained. 2) Age of an employee and salary.
Negative correlation The two variables x and y move in opposite directions, i.e. if x increases, y decreases; if x decreases, y increases. Examples: 1) Number of weeks of experience and number of errors made. 2) Grade obtained and number of hours spent watching television.
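We calculate the correlation coefficient (the product moment correlation coefficient) using the following formula, where n is the number of pairs of observations:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$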
Question 1.2: The following table relates the weekly maintenance cost (RM) to the age (in months) of five machines of similar type in a manufacturing company. Calculate the product moment correlation coefficient between age and cost.

Machine          1    2    3    4    5
Age (months)     5   10   15   20   30
Cost (RM)       10   20   20   30   30
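Working: tabulate the columns x, y, xy, x², y² and their totals, then substitute into

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$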
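The working for Question 1.2 can be checked with a short script. This is a minimal sketch rather than the official solution: the function name `pearson_r` is a hypothetical helper introduced here, and the data are the ages and costs given in the question.

```python
from math import sqrt

def pearson_r(x, y):
    """Product moment correlation coefficient from the sums formula."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data from Question 1.2: machine age (months) and weekly maintenance cost (RM)
age = [5, 10, 15, 20, 30]
cost = [10, 20, 20, 30, 30]

print(round(pearson_r(age, cost), 3))  # 0.901
```

The totals used are Σx = 80, Σy = 110, Σxy = 2050, Σx² = 1650 and Σy² = 2700, so r = 1450 / √(1850 × 1400) ≈ 0.901, a strong positive relationship.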
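An alternative method of measuring correlation is based on the ranks of the sizes of item values. Rank correlation coefficient:

$$ r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

where d is the difference between the two ranks of each item and n is the number of items.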
Question 1.3: Find the relationship between the mid test and final exam scores using rank correlation.

Person              A    B    C    D    E    F    G
Mid test score     50   62   85   91   74   59   84
Final exam score   67   70   80   79   68   67   81
Solution:

Person                 A    B    C    D    E    F    G
x                     50   62   85   91   74   59   84
r_x
y                     67   70   80   79   68   67   81
r_y
d² = (r_x - r_y)²

$$ r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = $$
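The ranking and the formula for Question 1.3 can be sketched in code. Two points here are assumptions rather than something the slides state: persons A and F share the same final exam score (67), and this sketch gives tied values the average of the ranks they occupy; ranks are assigned from the smallest value upward. The function names are hypothetical helpers.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # 1-based position of the first occurrence
        count = ordered.count(v)       # number of tied values
        ranks.append(first + (count - 1) / 2)
    return ranks

def rank_correlation(x, y):
    """Rank correlation coefficient r = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    rx = average_ranks(x)
    ry = average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Data from Question 1.3
mid_test = [50, 62, 85, 91, 74, 59, 84]
final_exam = [67, 70, 80, 79, 68, 67, 81]

print(average_ranks(final_exam))               # [1.5, 4.0, 6.0, 5.0, 3.0, 1.5, 7.0]
print(rank_correlation(mid_test, final_exam))  # 0.8125
```

Under these assumptions Σd² = 10.5, so r = 1 - 63/336 = 0.8125. When ties are present the formula is only an approximation, so a different tie-handling convention may give a slightly different value.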
The coefficient of determination, r² If the correlation coefficient is calculated as r = A, then the coefficient of determination is r² = A². In words, B% (where B = A² × 100) of the variation in variable y (specify) is due to variable x (specify). The other (100 - B)% of the variation is due to other factors such as ...
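As an illustration of this interpretation, the sketch below takes an assumed value r = 0.9 (chosen only as an example of a strong positive relationship) and splits the variation in y into the explained and unexplained percentages. The function name is a hypothetical helper.

```python
def describe_determination(r):
    """Return r squared, the % of variation in y explained by x, and the % left over."""
    r_squared = r ** 2
    explained = r_squared * 100
    unexplained = 100 - explained
    return r_squared, explained, unexplained

# Illustrative value only: r = 0.9
r2, explained, other = describe_determination(0.9)
print(f"r^2 = {r2:.2f}: {explained:.0f}% explained by x, {other:.0f}% due to other factors")
```

For r = 0.9 this gives r² = 0.81, so 81% of the variation in y is due to x and the remaining 19% is due to other factors.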
Definition of regression Regression is concerned with obtaining a mathematical equation, which describes the relationship between two variables. The equation can be used for comparison or estimation purposes.
Obtaining a regression line (least squares regression line) The y on x least squares regression line is y = a + bx, where

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad a = \frac{\sum y - b\sum x}{n} $$
Question 1.4: Referring back to Question 1.2, find the least squares regression line of machine maintenance cost (y) on machine age (x). Solution:
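One way to check the answer to Question 1.4 is the sketch below. It assumes the usual least squares formulas b = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and a = (Σy - bΣx) / n, and uses the data from Question 1.2. The function name is a hypothetical helper, not part of the course material.

```python
def least_squares_line(x, y):
    """Return (a, b) for the y on x least squares regression line y = a + bx."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # intercept
    return a, b

# Data from Question 1.2: machine age (months) and weekly maintenance cost (RM)
age = [5, 10, 15, 20, 30]
cost = [10, 20, 20, 30, 30]

a, b = least_squares_line(age, cost)
print(f"y = {a:.2f} + {b:.2f}x")  # y = 9.46 + 0.78x
```

Here b = 1450/1850 ≈ 0.784 and a = (110 - 0.784 × 80)/5 ≈ 9.46, so each additional month of age is associated with roughly RM 0.78 more weekly maintenance cost.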
Question 1.5: Suppose you obtain the least squares regression line y = 1.5x - 96.9, where x = temperature of the weather (°F) and y = water consumption (ounces). Predict the amount of water a person would drink when the temperature is 95 °F. Solution: Given y = 1.5x - 96.9, when x = 95, y = 1.5(95) - 96.9 = 45.6 ounces.
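The prediction step amounts to substituting x into the fitted line. A minimal sketch, with a hypothetical function name and the coefficients taken from the regression line given in Question 1.5:

```python
def predict_water_consumption(temperature_f):
    """Predicted water consumption (ounces) from the line y = 1.5x - 96.9."""
    return 1.5 * temperature_f - 96.9

print(round(predict_water_consumption(95), 1))  # 45.6
```

Such a prediction is only reliable for temperatures within the range of the data used to fit the line. For example, the same equation gives a negative consumption for temperatures below 64.6 °F, which is not meaningful.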