EE145L Lab 1, Linear Regression
10/4/2012

Abstract

We examined several data sets to assess whether the relationship between their variables was linear or non-linear, and studied ways of transforming non-linear data to make it linear. We then compared two linear fits to a single data set, one estimated by hand and one obtained by linear regression, and concluded that linear regression gave the more accurate fit.

Introduction

We used linear regression to analyze different sets of data. The technique finds the best-fit line through the data, which lets us judge whether the relationship can be called linear. The premise is that the best-fit line is the one closest to all of the individual data points. We do not minimize the distances themselves, however: each distance can be positive or negative, so summing them would let opposite-signed values cancel and understate the total. Instead, we minimize the sum of the squared distances from the line. We expect this technique to give best-fit lines that fit much better than hand-drawn lines, and we also experiment with transforming non-linear data sets to make them linear.

We start from the definition of a line, $y = mx + b$, and write it for each data point as

$$y_i = a_0 + a_1 x_i + e_i, \qquad e_i = y_i - (a_0 + a_1 x_i).$$

We also use the quantity

$$S = \sum e_i^2 = \sum \left[ y_i - (a_0 + a_1 x_i) \right]^2,$$

which is a function of $a_0$ and $a_1$. This is the sum of the squared differences between the individual data points and the regression line, so it is the quantity we minimize. To minimize a function, we take its first derivative and set it equal to zero; since $S$ depends on two variables, we take the partial derivatives $\partial S/\partial a_0$ and $\partial S/\partial a_1$, set both equal to zero, and solve the two equations simultaneously for the two unknowns.
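To illustrate why the squared residuals are summed rather than the residuals themselves, the following minimal sketch evaluates both quantities for a deliberately poor candidate line. The data points and the helper names `residuals` and `sum_squared_residuals` are hypothetical, chosen only for illustration.

```python
# Hypothetical data lying exactly on y = 1 + 2x (illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def residuals(a0, a1, xs, ys):
    """Return e_i = y_i - (a0 + a1 * x_i) for every data point."""
    return [y - (a0 + a1 * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(a0, a1, xs, ys):
    """Return S = sum of e_i^2, the quantity that regression minimizes."""
    return sum(e * e for e in residuals(a0, a1, xs, ys))


# A flat line through the mean of y: clearly a poor fit.
flat_signed = sum(residuals(5.0, 0.0, xs, ys))
flat_S = sum_squared_residuals(5.0, 0.0, xs, ys)

# The line the data was generated from.
true_S = sum_squared_residuals(1.0, 2.0, xs, ys)

print(flat_signed)  # 0.0  -> signed residuals cancel and hide the poor fit
print(flat_S)       # 40.0 -> S still exposes the poor fit
print(true_S)       # 0.0  -> S vanishes for the exact line
```

The flat line has a signed-residual sum of zero even though it misses four of the five points, which is the cancellation problem described above; $S$ separates the two lines clearly.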
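Setting the first partial derivative to zero gives

$$\frac{\partial S}{\partial a_0} = \sum 2\left[ y_i - (a_0 + a_1 x_i) \right](-1) = 0 \;\Rightarrow\; n a_0 + a_1 \sum x_i = \sum y_i,$$

where $n$ is the total number of data points. Likewise,

$$\frac{\partial S}{\partial a_1} = \sum 2\left[ y_i - (a_0 + a_1 x_i) \right](-x_i) = 0 \;\Rightarrow\; \sum x_i y_i = a_0 \sum x_i + a_1 \sum x_i^2.$$

Solving these two equations for $a_1$ and $a_0$ in our predicted line, we find

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}, \qquad a_0 = \frac{\sum y_i - a_1 \sum x_i}{n}.$$

Thus, to find our best-fit line $y = a_0 + a_1 x$, we simply substitute our x- and y-values into these expressions for $a_0$ and $a_1$, and that gives us the equation of the best-fit line.

Results and Analysis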
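The exercises below rely on the coefficients obtained by solving the two normal equations from the Introduction. As a minimal sketch, assuming the standard closed-form solution of those equations, the coefficients can be computed directly from the sums. The function name `fit_line` and the sample points are hypothetical and are not the lab data.

```python
def fit_line(xs, ys):
    """Least-squares line y = a0 + a1 * x from the two normal equations.

    n*a0      + a1*sum(x)   = sum(y)
    a0*sum(x) + a1*sum(x^2) = sum(x*y)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a1 = (n * sxy - sx * sy) / denom
    a0 = (sy - a1 * sx) / n
    return a0, a1


# Hypothetical points lying roughly along a straight line (not the lab data).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.4, 3.1, 3.4, 4.1]
a0, a1 = fit_line(xs, ys)
print(f"y = {a0:.3f} + {a1:.3f}x")
```

Working from the raw sums keeps the code term-by-term comparable with the normal equations; the explicit check on the denominator guards against data in which every x-value is the same.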
Exercise 0

In this exercise, we considered several given sets of data and assessed characteristics such as which variable was dependent and which independent, whether we expected the data to be linear, and how the data might be transformed to make it linear. The independent variable takes values that the experimenter can set, and the values of the dependent variable depend on the choices made for the independent one. We had three sets of data, labeled in the charts below as Part 1, Part 2, and Part 3.

For the first set of data, shown in Figure 1 below, we found the independent variable to be the [...] as a quadratic or cubic.

Figure 1, Exercise 0 Part 1

For the second set of data, shown in Figure 2 below, we found the independent variable to be the [...] data was graphed, it did indeed appear to be linearly related without any transformations necessary.
Figure 2, Exercise 0 Part 2

For the third set of data, shown below in Figure 3, we found the dependent variable to be the [...]

Figure 3, Exercise 0 Part 3

[...] not appear to be linear. However, we hypothesized that transforming the data correctly would yield a linear relationship.
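One common way to linearize data of this kind is a logarithmic transformation. The sketch below assumes, purely for illustration, an exponential relationship $y = A e^{kx}$; the report does not state which transformation was applied to the third data set, and the data and the constants `A` and `k` are hypothetical. Taking $\ln y$ turns the curve into the straight line $\ln y = \ln A + kx$.

```python
import math

# Hypothetical exponential data y = A * exp(k * x); not the lab's data set.
A, k = 2.0, 0.3
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [A * math.exp(k * x) for x in xs]


def slopes(xs, ys):
    """Slope between each pair of consecutive points."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]


# Raw data: the slope keeps growing with x, so the relationship is not linear.
raw_slopes = slopes(xs, ys)

# Transformed data: the slope of ln(y) versus x is constant and equal to k.
log_slopes = slopes(xs, [math.log(y) for y in ys])

print([round(s, 3) for s in raw_slopes])
print([round(s, 3) for s in log_slopes])
```

A constant slope between consecutive points is a quick check that the transformed data is linear; with real, noisy measurements the slopes would only be approximately equal, and a regression on the transformed values would be the more reliable test.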
Figure 4, Exercise 0 Part 4

Exercise 1

In this part of the lab, we analyzed a new set of given data: the specific heat of a chemical versus its temperature. We first plotted the data on a scatter plot, shown in Figure 5 below, and determined that a generally linear relationship appeared to exist between the variables.

Figure 5, Scatterplot of data from Exercise 1

We then fit a straight line to the data, and found the slope (m) to be approximately [...] This appeared to fit in well with the linear appearance of this data set.

Exercise 2

In this exercise, we were asked to perform a linear regression on the same data set
as in Exercise 1. We wanted to find a linear relationship $y = a_0 + a_1 x$, and used the expressions for $a_1$ and $a_0$ derived in the Introduction. We then computed the appropriate sums with our x- and y-values, noting that $n = 12$ since we had 12 data points, and found: [...] Substituting these into the equations for $a_0$ and $a_1$, we found $a_1 = 0.002257$ and $a_0 = 1.51073$. Thus, our line is $y = 1.51073 + 0.002257x$.

We were then asked to estimate the specific heat of this chemical ($y$) when the temperature ($x$) was 75 degrees Celsius, as we did in Exercise 1. Substituting $x = 75$ into our new line equation yielded $y \approx 1.680$. The percent difference between this value and the value found in Exercise 1, taking this value as the actual and the Exercise 1 value as the experimental, was 1.205%. We believe that the value found in Exercise 2 is more accurate, since it comes from a line obtained by linear regression, whereas the Exercise 1 value comes from a line that was hand-drawn to fit the data.

Conclusion

This experiment was a success: all of our results matched the expectations stated in the introduction. We explored linear versus non-linear data sets, and transformations from non-linear to linear. We also compared fits obtained by linear regression and by simple hand-drawn estimation, and found that linear regression, as expected, gave a more accurate fit to our data. Values predicted from the two lines differed by 1.2%, so values based on the estimated line would not be as accurate as those based on the regression line. Linear regression is a valuable technique for checking the relationship between variables in an experiment, and it is only the beginning of a wide variety of statistical techniques for checking correlation between variables.
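The Exercise 2 prediction and the percent-difference comparison can be sketched as follows. The coefficients are the ones reported in Exercise 2. The hand-drawn estimate `y_hand` is a hypothetical placeholder and is not the lab's Exercise 1 value, and the formula $|\text{experimental} - \text{actual}| / |\text{actual}| \times 100$ is the usual percent-difference definition, assumed here to match the one used in the lab.

```python
# Regression coefficients reported in Exercise 2.
A0 = 1.51073
A1 = 0.002257


def predict(x, a0=A0, a1=A1):
    """Evaluate the fitted line y = a0 + a1 * x."""
    return a0 + a1 * x


def percent_difference(experimental, actual):
    """|experimental - actual| / |actual|, expressed as a percentage."""
    return abs(experimental - actual) / abs(actual) * 100.0


# Specific heat predicted by the regression line at 75 degrees Celsius.
y_regression = predict(75.0)

# Hypothetical hand-drawn estimate, for illustration only (not the lab's value).
y_hand = 1.75

print(round(y_regression, 3))
print(round(percent_difference(y_hand, y_regression), 2))
```

Treating the regression value as the "actual" follows the convention stated in Exercise 2; swapping the two arguments would change the denominator and therefore the reported percentage slightly.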