Correlation and Regression (Excel 2007)

(See also Scatterplots, Regression Lines, and Time Series Charts With Excel 2007 for instructions on making a scatterplot of the data and an alternate method of finding the correlation coefficient and the equation of the regression line.)

1. Correlation Coefficient. The table below displays the heights (in inches) of a sample of 11 brother-sister pairs. We will first find just the correlation coefficient for this sample. Go to Data/Data Analysis, and choose Correlation. Click and drag both columns for the Input Range, make sure they are Grouped by Columns, and check the Labels box if appropriate:
The output below shows that the correlation coefficient is 0.55805; there is a positive linear relationship between the heights of the brothers and sisters, but this linear relationship is not very strong.

2. Regression Line. We will treat the brother's height as the independent variable, X, and the sister's height as the dependent variable, Y. To find the equation of the regression line, go to Data/Data Analysis, and choose Regression. Click and drag the data into the appropriate Input Range. Note that the Y (dependent variable) Range is entered first. Also, check the Labels box if appropriate.
The output is shown below with relevant information highlighted:

First, in the table under Regression Statistics, we have the coefficient of determination, 0.3114. This means that about 31.14% of the variation in the sisters' heights is explained by their linear relationship with the heights of their brothers. Also, in the table at the bottom of the output display, under Coefficients, we find Intercept, the y-intercept of the regression line, and, next to the name of the independent variable, Brother, we find the slope of the regression line. Here, the y-intercept is approximately 27.6, and the slope is approximately 0.527. The equation of the regression line, which we could use to predict the sister's height from that of her brother, is ŷ = 0.527x + 27.6. For each increase of one inch in the height of the brother, his sister's height is expected to increase by 0.527 inches.

The information highlighted in blue gives the results of the Linear Regression T Test: we are testing whether there is a significant linear relationship between the brothers' heights and the heights of their sisters. Specifically, for ρ, the population correlation coefficient (or β, the slope of the regression line for the population), we test the null hypothesis that ρ = 0 (β = 0) versus the alternative that ρ ≠ 0 (β ≠ 0). The test statistic is t = 2.0175, and the p-value for the test is p = 0.0744. At the level α = 0.05, we would fail to reject the null hypothesis and conclude that we don't have statistically significant evidence of a linear relationship between the heights of brothers and their sisters.

(If you need to know the standard error of estimate, it can also be found under Regression Statistics; here, the standard error is 2.247.)
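The figures quoted from the Excel output can be cross-checked by hand. The Python sketch below is not part of the Excel procedure; it assumes only the values stated above (r = 0.55805, n = 11, slope 0.527, intercept 27.6) and uses the standard identity t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom. The p-value is approximated by numerical integration (Simpson's rule) of the t density, which is a choice made here for illustration, and the brother's height of 70 inches is a hypothetical input.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Approximate two-tailed p-value using Simpson's rule on [0, |t|].

    steps must be even for Simpson's rule.
    """
    t = abs(t)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3          # P(0 < T < |t|)
    return 1 - 2 * central_half           # P(|T| > |t|)

# Values taken from the Excel output described in the text.
r, n = 0.55805, 11

r_squared = r ** 2                                     # coefficient of determination
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
p_value = two_tailed_p(t_stat, n - 2)

def predict_sister(brother_height):
    """Predicted sister's height (inches) from the fitted line in the text."""
    return 0.527 * brother_height + 27.6

print(f"r^2 = {r_squared:.4f}")
print(f"t   = {t_stat:.4f}")
print(f"p   = {p_value:.4f}")
print(f"Predicted height for a 70-inch brother: {predict_sister(70):.2f} in")
```

Because the p-value exceeds α = 0.05, the sketch leads to the same decision as the Excel output: fail to reject the null hypothesis.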
3. Multiple Regression. Suppose that several variables may be related to a person's salary. The table below lists salaries, years of employment, years of previous experience, and years of education for a sample of employees at a certain company. (This data is from Example 1 in section 9.4.) We go to Data/Data Analysis, and choose Correlation, as before. This time, we click and drag all four columns of data:

The results show correlation coefficients between all pairs of variables. For example, the correlation coefficient between employment and salary is 0.824, the correlation coefficient between experience and salary is 0.189, and the correlation coefficient between education and salary is 0.375. We see that the linear relationship between years of employment and salary is the strongest, and the linear relationship between years of previous experience and salary is the weakest.

We can also find a regression equation that could be used to predict salary (y) from information on years of employment (x₁), years of previous experience (x₂), and years of education (x₃):
Go to Data/Data Analysis, and choose Regression as before. This time, only the Salary data is put into the Y (dependent variable) Range, and the remaining three columns of data are put into the X (independent variable) Range. The output is shown below. The coefficients of the regression equation can be found under Coefficients. The regression equation is ŷ = 49764 + 364x₁ + 228x₂ + 267x₃, where x₁ is years of employment, x₂ is years of previous experience, and x₃ is years of education.
We can interpret the coefficients in the following way: The coefficient 364 for Employment means that, for each additional year of employment, the predicted salary increases by about $364, with the other two variables held fixed. Similarly, for each additional year of previous experience, the predicted salary increases by about $228, and for each additional year of education, the predicted salary increases by about $267. We also see that the coefficient of determination is 0.944. This means that about 94.4% of the variation in salary is explained by its linear relationship with years of employment, experience, and education. The remaining 5.6% is explained by other factors or chance.
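To make the interpretation of the coefficients concrete, the regression equation can be evaluated directly. The Python sketch below assumes only the rounded coefficients reported above (49764, 364, 228, 267); the function name and the sample employee (10 years of employment, 2 years of previous experience, 4 years of education) are hypothetical and are not drawn from the data set.

```python
# Rounded coefficients from the Excel regression output described in the text.
INTERCEPT = 49764
B_EMPLOYMENT = 364    # dollars per year of employment
B_EXPERIENCE = 228    # dollars per year of previous experience
B_EDUCATION = 267     # dollars per year of education

def predict_salary(employment_years, experience_years, education_years):
    """Predicted salary (dollars) from the fitted multiple regression equation."""
    return (INTERCEPT
            + B_EMPLOYMENT * employment_years
            + B_EXPERIENCE * experience_years
            + B_EDUCATION * education_years)

# Hypothetical employee, chosen only for illustration.
baseline = predict_salary(10, 2, 4)

# One more year of employment, other variables unchanged.
one_more_year = predict_salary(11, 2, 4)

print(f"Predicted salary: ${baseline}")
print(f"Change for one extra year of employment: ${one_more_year - baseline}")
```

The difference between the two predictions equals the Employment coefficient, which is what "increase of $364 per year of employment, other variables held fixed" means in practice.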