9-1 Chapter 9: Simple Linear Regression
9.1 Simple Linear Regression
9.2 Scatter Diagram
9.3 Graphical Method for Determining Regression
9.4 Least Square Method
9.5 Correlation Coefficient and Coefficient of Determination
9.6 Test of Significance
9-2 9.0 Introduction to Regression
Regression is a statistical procedure for establishing the relationship between two or more variables by fitting a linear equation to the observed data. The researcher then uses the fitted regression line to see the trend in the data and to predict values.
There are two types of relationship:
1) Simple (2 variables)
2) Multiple (more than 2 variables)
9-3 9.1 Simple Linear Regression
Simple linear regression analyses the relationship between two variables: one independent variable and one dependent variable. Its model is an equation that describes the dependent variable (Y) in terms of the independent variable (X) plus a random error $\varepsilon$:
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
where
$\beta_0$ = intercept of the line with the Y-axis
$\beta_1$ = slope of the line
$\varepsilon$ = random error, the difference between a data point and the deterministic value $\beta_0 + \beta_1 X$
The regression line is estimated from the collected data by fitting a straight line to the data set and taking the equation of that line:
$$\hat{Y} = b_0 + b_1 x$$
Irwin/McGraw-Hill, Andrew F. Siegel, 1997 and 2000
9-4 9.1 Simple Linear Regression
Examples of independent and dependent variables:
1) A nutritionist studying weight loss programs might want to find out whether reducing carbohydrate intake can help a person lose weight. X is the carbohydrate intake (independent variable). Y is the weight (dependent variable).
2) An entrepreneur might want to know whether increasing the cost of packaging a new product will affect the sales volume. X is the packaging cost (independent variable). Y is the sales volume (dependent variable).
9-5 9.2 Scatter Diagram
A scatter plot is a graph of ordered pairs (x, y). Its purpose is to show visually the nature of the relationship between the independent variable X and the dependent variable Y. The independent variable x is plotted on the horizontal axis and the dependent variable y on the vertical axis.
9-6 9.2 Scatter Diagram
A linear regression line can be developed from a freehand plot of the data.
Example: The table below contains values for two variables, X and Y. Plot the data and draw a freehand estimated regression line.
X   -3   -2   -1    0    1    2    3
Y    1    2    3    5    8   11   12
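A freehand line can be checked numerically once it is drawn. The sketch below is a minimal illustration, assuming two hypothetical freehand guesses (their intercepts and slopes are made up for illustration and are not read from the slides) and a helper named `sum_squared_errors` that is not part of the original material. It sums the squared vertical distances between each data point and a candidate line; the smaller total indicates the closer fit.

```python
# Data from the example table.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [1, 2, 3, 5, 8, 11, 12]


def sum_squared_errors(intercept, slope, xs, ys):
    """Sum of squared vertical distances between the points and the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(xs, ys))


# Two hypothetical freehand guesses (illustrative values only).
guess_a = (6, 2)  # y = 6 + 2x
guess_b = (5, 2)  # y = 5 + 2x

sse_a = sum_squared_errors(*guess_a, x, y)
sse_b = sum_squared_errors(*guess_b, x, y)

print("guess A:", sse_a)
print("guess B:", sse_b)
```

For this data, guess A gives the smaller total, so it lies closer to the points than guess B.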
9-9 9.4 Least Square Method
The least squares method produces the straight line that minimizes the sum of squared differences between the data points and the line. In other words, it determines the estimates of $\beta_0$ and $\beta_1$ that give the best fit of the estimated regression line to the sample data points. The method involves some calculation, but because we use SPSS output here, no hand calculation is necessary.
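For reference, the calculation that the software carries out can be written in closed form. The formulas below are the standard least squares estimates, stated for the estimated line $\hat{Y} = b_0 + b_1 x$. The shorthand $S_{xx}$ and $S_{xy}$ is introduced here for convenience and does not appear in the original slides.

```latex
\begin{aligned}
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\
b_1 &= \frac{S_{xy}}{S_{xx}}, \qquad
b_0 = \bar{y} - b_1 \bar{x}
\end{aligned}
```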
9-11 9.4 Least Square Method
Conducting simple linear regression through SPSS.
The data below are the scores obtained by ten primary school students before and after they were taken on a tour of a museum (which is supposed to increase their interest in history).
Before, x   65   63   76   46   68   72   68   57   36   96
After, y    68   66   86   48   65   66   71   57   42   87
Fit a linear regression model with the before score as the explanatory variable and the after score as the dependent variable. Predict the after score of a student who scored 60 marks before.
9-12 9.4 Least Square Method Analysis [Figure: SPSS analysis steps, numbered 1 to 3]
9-13 9.4 Least Square Method
Output
1) $\hat{Y} = 12.306 + 0.824x$
2) $\hat{Y} = 12.306 + 0.824(60) = 61.746$
A student who scored 60 marks before the tour would be predicted to score 61.746 after it.
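The reported coefficients can be cross-checked by hand. The following is a minimal sketch in plain Python, assuming the standard least squares formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The function and variable names are illustrative and are not part of the SPSS output.

```python
# Scores from the museum-tour example.
before = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]
after = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]


def fit_line(xs, ys):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    b1 = s_xy / s_xx            # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1


b0, b1 = fit_line(before, after)
prediction = b0 + b1 * 60       # predicted after score when before = 60

print(f"intercept b0 = {b0:.3f}")
print(f"slope b1 = {b1:.3f}")
print(f"predicted after score for before = 60: {prediction:.3f}")
```

Rounded to three decimals, the coefficients agree with the reported 12.306 and 0.824. Because the sketch keeps the unrounded coefficients, its prediction comes out marginally below the 61.746 obtained from the rounded equation.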
9-14 9.5 Correlation Coefficient and Coefficient of Determination
Correlation measures the strength of the linear relationship between two variables. The measure is also known as Pearson's product-moment coefficient of correlation. The symbol for the sample coefficient of correlation is r, and its values satisfy $-1 \le r \le 1$.
Value of r     Interpretation
close to 1     strong positive linear relationship between x and y
close to -1    strong negative linear relationship between x and y
close to 0     little or no linear relationship between x and y
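The slides take r from software output. For reference, the usual defining formula for the sample correlation coefficient is sketched below. The shorthand $S_{xx}$, $S_{yy}$ and $S_{xy}$ is an assumption of this note, not notation from the original.

```latex
r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}, \qquad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \quad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad
S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2
```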
9-15 9.5 Correlation Coefficient and Coefficient of Determination [Figure: regression line of E(y) against x with intercept $\beta_0$; slope $\beta_1$ is positive. Positive linear relationship]
9-16 9.5 Correlation Coefficient and Coefficient of Determination [Figure: regression line of E(y) against x with intercept $\beta_0$; slope $\beta_1$ is negative. Negative linear relationship]
9-17 9.5 Correlation Coefficient and Coefficient of Determination [Figure: regression line of E(y) against x with intercept $\beta_0$; slope $\beta_1$ is 0. No relationship]
9-18 9.5 Correlation Coefficient and Coefficient of Determination
The coefficient of determination, $r^2$, measures the proportion of the variation in the dependent variable (Y) that is explained by the regression line and the independent variable (X).
If r = 0.90, then $r^2$ = 0.81. This means that 81% of the variation in the dependent variable (Y) is accounted for by the variation in the independent variable (X).
The rest of the variation, 0.19 or 19%, is unexplained; this proportion is called the coefficient of nondetermination. The formula for the coefficient of nondetermination is $1 - r^2$.
9-19 9.5 Correlation Coefficient and Coefficient of Determination
Output
1) r = 0.942: strong positive linear relationship
2) $r^2$ = 0.887: 88.7% of the variation in Y is explained by X
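These two values can be reproduced from the museum-tour scores. The sketch below is one possible check in plain Python, assuming the usual formula r = Sxy / sqrt(Sxx * Syy). The function name `correlation` is illustrative and is not part of the SPSS output.

```python
import math

# Scores from the museum-tour example.
before = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]
after = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]


def correlation(xs, ys):
    """Sample (Pearson) correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_yy = sum((yi - y_bar) ** 2 for yi in ys)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)


r = correlation(before, after)
r_squared = r ** 2                      # coefficient of determination

print(f"r = {r:.3f}")
print(f"r^2 = {r_squared:.3f}")
print(f"1 - r^2 = {1 - r_squared:.3f}")  # coefficient of nondetermination
```

The rounded values agree with the reported output (r = 0.942 and r² = 0.887).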
9-20 9.6 Test of Significance
Simple linear regression involves two estimated parameters, $\beta_0$ and $\beta_1$. A test of hypothesis is used to find out whether the independent variable is significant to the dependent variable, that is, whether X provides information for predicting Y. Two tests are commonly used to test the significance of the regression:
1) t-test
2) F-test (ANOVA)
9-21 9.6 Test of Significance
t-test
$H_0: \beta_1 = 0$ (no relationship)
$H_1: \beta_1 \ne 0$ (there is a relationship)
Compare the P-value (refer to the Coefficients table) with α.
Reject $H_0$ if P-value < α.
If we reject $H_0$, there is a significant relationship between variables X and Y.
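The slides read the P-value directly from the output. As background, the test statistic usually reported beside that P-value is sketched below, under the standard assumptions of the simple linear regression model. The symbol $s_{b_1}$ (standard error of the slope) is notation assumed for this note and does not appear in the original.

```latex
t = \frac{b_1}{s_{b_1}}, \qquad
s_{b_1} = \sqrt{\frac{\mathrm{SSE}/(n-2)}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}, \qquad
\text{degrees of freedom} = n - 2
```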
9-22 9.6 Test of Significance
F-test (ANOVA)
$H_0: \beta_1 = 0$ (no relationship)
$H_1: \beta_1 \ne 0$ (there is a relationship)
Compare the P-value (refer to the ANOVA table) with α.
Reject $H_0$ if P-value < α.
If we reject $H_0$, there is a significant relationship between variables X and Y.
9-23 9.6 Test of Significance
Construction of the ANOVA table
Source of variation   Sum of squares   Degrees of freedom   Mean square          f test
Regression            SSR              1                    MSR = SSR/1          f = MSR/MSE
Error                 SSE              n - 2                MSE = SSE/(n - 2)
Total                 SST              n - 1
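The table can be filled in for the museum-tour scores without SPSS. The sketch below is a rough illustration in plain Python and rests on several assumptions that are not in the slides: the function names are hypothetical, α = 0.05 is chosen only for illustration, and the P-value is approximated by numerically integrating the t density with Simpson's rule rather than read from software output.

```python
import math

# Scores from the museum-tour example.
before = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]
after = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]


def anova_table(xs, ys):
    """Return the ANOVA quantities for a simple linear regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in ys)                        # total
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(xs, ys))  # error
    ssr = sst - sse                                                  # regression
    msr = ssr / 1
    mse = sse / (n - 2)
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "MSR": msr, "MSE": mse, "f": msr / mse, "df_error": n - 2}


def t_density(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """Approximate two-sided P-value using Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area = total * h / 3              # approximates P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)


table = anova_table(before, after)
t_stat = math.sqrt(table["f"])        # with one predictor, f = t squared
p_value = two_sided_p_value(t_stat, table["df_error"])
alpha = 0.05                          # assumed significance level

for name in ("SSR", "SSE", "SST", "MSR", "MSE", "f"):
    print(f"{name} = {table[name]:.3f}")
print(f"P-value = {p_value:.6f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

With a single explanatory variable the f statistic equals the square of the t statistic for the slope, so the t-test and the F-test give the same P-value and lead to the same decision.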
9-24 Exercise
The following table lists the midterm (X) and final exam (Y) scores of seven students in a statistics class.
X   79   95   81   66   87   94   59
Y   85   97   78   76   94   84   67
1) Find the least squares regression line.
2) Explain the values of r and $r^2$.
3) Predict the final exam score of a student who got 60 marks in the midterm test.
4) Do the data support the existence of a linear relationship between midterm and final exam scores? Test using α = 0.05.
9-25 Exercise
The manufacturer of Cardio Glide exercise equipment wants to study the relationship between the number of months since the glide was purchased and the length of time the equipment was used last week.
1) Determine the regression equation.
2) At α = 0.01, test whether there is a linear relationship between the variables.