Graphical Analysis and Errors - MBL

I. Graphical Analysis Graphical Analysis and Errors - MBL Graphs are vital tools for analyzing and displaying data throughout the natural sciences and in a wide variety of other fields. It is imperative that you develop an easy familiarity with their use and abuse 1. Graphs allow us to explore the relationship between two quantities -- an independent variable usually plotted on the x-axis and a dependent variable usually plotted on the y-axis. For example you may desire to plot a velocity as a function of time. In this case the time is the independent variable since it is effectively "changed" while the resulting velocity is measured. Thus we have a function which we write as: v = v(t) (1) As another example, consider an experiment measuring the pressure of a gas while its temperature is varied. Clearly the measured pressure is the dependent variable which changes with the independent temperature set by the experimenter. You might also try varying the volume of the gas or the number of moles of gas, to see how the pressure varies with these quantities. However we will usually focus on just two variables at a time. Usually the key to graphical data analysis is finding a way to plot the two variables so that a straight line is obtained. As we discuss below, different types of graphs are used to give straight lines for different mathematical relationships between the variables. In general, you will face two types of situations: 1) You have no theory relating the two variables. 2) You have a theory that gives the functional relationship between the variables but the theory contains parameters that you need to determine. In case 1, the relationship between variables can often be determined by investigating different types of graphs. In case 2, the slope of the plotted points and the position at which they intersect the y-axis can be used to determine the values of parameters that appear in the theory. In this case, graphical analysis is now becoming less common than a more precise computer fit of the theory to the data. However, after a computer fit, the data and fit lines are usually plotted in order to assess how well the theory works. II. How to Make a Good Graph Before discussing the different kinds of plots we will encounter, we consider the practical aspects of making a good graph. Usually you will follow three steps: 1) Determine the minimum and maximum values of the data in both the x and the y directions. 1 For an excellent overview of how graphs and statistics are widely used to deceive us, see the wonderful little book How to Lie with Statistics, by Darrel Huff. 1 Graphical Analysis & Errors

2) Choose a scale for the tick marks on the two axes. Usually you will want to use as much of the graph paper as possible and make the scale on a given axis just sufficient to cover beyond the minimum and maximum values to be plotted. On the other hand, it's also best to use simple values for the tick marks, so that some trade-off between these objectives may be necessary. 3) Plot the data (including error bars if applicable). In fact, it's often a good idea to plot the data as you take it instead of waiting until you finish collecting it. One of the lab partners may be able to record data in a table as well as plotting it while the data is coming in. We now go on to consider the types of graphs which we will use for different relations between the independent and dependent variables. III. Linear Relationships If the relationship between the two variables is linear: y(x) = kx + y 0 (2) then a plot on linear graph paper can be fit with a "best" straight line. The slope of the line on the graph directly yields the value of k and the y-axis intercept yields the value of y 0. To get the slope, it's best to pick two points on the best-fit line that are as far apart as possible. This minimizes the effect of errors in reading positions on the line. You might ask what has been gained by using the plot rather than just using two of the data points with Equation (2) to solve for k and y 0. The answer is threefold. First, the graph tells us whether a linear relationship really holds between the two variables. Second, the graph allows us to easily average over the fluctuations between data points to get a better value for k and y 0. Finally, the plot can be used to determine an approximate experimental uncertainty for these two quantities. This is discussed further in Section VI. IV. Nonlinear Relationships Many times we encounter situations in which the relationship between the variables of interest is not linear. We will now examine two specific important nonlinear situations in detail and also give guidelines for dealing with other cases. IV.A Exponential Relationships Often you will find that a quantity will increase or decrease at a rate in direct proportion to its size. For instance if you invest $1 and receive interest at a continuous rate of 10% per year (i.e. 0.1 yr -1 ), then the rate of change of your total money m as a function of time t is m t = dm dt = 0.1yr 1 m (3) 2 Graphical Analysis & Errors

The result is that we get an exponential growth of m with time: m(t) = $1 e 0.1y r -1 t = $1 e t/10 yr (4) Here "e" is the base of the natural logarithm, 2.718.... Note that at time t=0, Eq. 4 gives m(t=0) = $1, which is what we started out with. As another example, consider the radioactive decay of nuclei. The number of decays per unit time is proportional to the number remaining. If the number of nuclei remaining at a given time t is n(t) and the decay rate for a single nucleus is 0.5 per second, then: n t = dn dt = 0.5sec 1 n (5) Note the important difference between this situation and that of Equation (4). There the invested money grew with time so that the right-hand side of the equation had a positive sign. Here the right-hand side is negative because nuclei are disappearing. The result is an exponential decay: n(t) = n 0 e -0.5 sec -1 t -t/2 sec = n 0 e (6) Of course this represents an average over many samples, so that n(t) does not have to be an integer. The two cases we have discussed are special cases of the general exponential relationship whose form is: y(x) = y 0 e x/x 0 (7) Here y 0 is the value of y at x=0 and therefore has the same dimension as y. Similarly the dimension of x 0 must also be the same as x since the exponential must be raised to a pure number. This x 0 is a characteristic parameter of the relationship. If x increases by x 0, then y(x) increases by exactly a factor of e. Therefore x 0 is often called the e-fold constant (in special situations it might also be called the time constant or the length constant). Often people will instead discuss the half-life (i.e. the 2-fold time) or the 10-fold constant of an exponential relationship instead of the e-fold constant. The three are closely related, however, since we can easily change the base of the exponential in Eq. (7): y(x) = y 0 e x/x 0 = y 0 10 loge x/ x 0 = y 0 10 x/2.303x 0 (8) 3 Graphical Analysis & Errors

= y 0 2 x / 0.693 x 0 Thus the 10-fold parameter is 2.303 times the e-fold parameter and the 2-fold parameter is 0.693 times the e-fold parameter. Often you will encounter situations where you believe that the relationship between two variables may be exponential but are unsure. To verify that the relationship is exponential and to find the parameters y 0 and x 0 of the exponential relationship, it is convenient to use a graph. However, a plot on linear graph paper does not give a straight line for an exponential relationship. Nonetheless, there is a way to plot an exponential relationship to get a straight line. If we take the logarithm of the third line of Equation (8), we get: logy = logy 0 + x 2.303x 0 (9) Thus, while y(x) is not linear in x, logy is linear in x. THIS IS THE KEY! A plot of logy as a function of x yields a straight line on linear graph paper if the relationship between the variables is exponential. Moreover, to make life easier, there is semi-log graph paper in which the logarithm of the y variable is plotted automatically on the vertical axis. On semi-log paper the distance between tick marks on the y-axis is proportional to the logarithm of the ratio of the numbers. The y-axis is divided into repeating cycles of powers of 10. When using semi-log paper, you must first decide which power of 10 a given cycle represents. Note that the small vertical divisions have different values, depending on where they occur within a cycle. Once a semi-log plot has been made, a best-fit straight line can be drawn through the points (assuming of course that the relationship between the variables really is exponential so that the data points lie more or less on a single line). As Equation (9) shows, the slope of the line yields x 0. If the change in x required for a 10-fold increase in y is denoted x, then the change in x required for an e-fold increase in y is only x/2.303 (see the discussion around Equation (8) if you don't understand this). Thus x 0 = x / 2.303. In conclusion, semi-log plots let us verify that the relationship between two variables is exponential and determine the two constants y 0 and x 0. IV.B Power-law Relationships In many situations the relationship between two variables is a power-law: y(x) = a x b (10) For instance, we will see in class that the period T of a planet orbiting the sun varies as the radius of its orbit R to a power b. Thus: T = a R b (11) 4 Graphical Analysis & Errors

Note that, unlike exponential relationships, there is no characteristic scale in power-laws. In order to test whether a relationship is a power-law and determine the parameters a and b in Equation 10, we can use a log-log plot. To see how this works, we take the logarithm of both sides of Equation (10). This yields: log y = b log x + log a (12) Thus, while y is not linear in x, logy is indeed linear in logx. If we computed the logarithms of the points and plotted them, we would get a straight line. To make things easier, we can use loglog graph paper, which automatically plots the logarithms without computation on your part. Each axis of log-log graph paper has the same structure as the logarithmic y-axis on semi-log graph paper. As Equation (12) shows, the slope of a log-log plot is equal to the exponent b. If the log cycle size is the same in x and y then we can obtain this simply by picking two points on the line which are a factor of ten apart in their y-coordinates. Then measure the length of a cycle on the x-axis, Lx (in centimeters or inches), using a ruler and also measure the distance between the two points of the line projected onto the x-axis, x (again in centimeters or inches). Finally divide Lx by x to get the proper slope. That is, b = Lx/ x. Once b is known, the value of the "prefactor" a can be determined by plugging the coordinates of a point on the line into Equation (10) and solving for a. Note that b has no units while a has whatever units are needed to make Equation (10) dimensionally correct. IV.C Other Nonlinear Relationships Sometimes you may encounter nonlinear relationships which are neither power-law nor exponential. If you know the mathematical relationship between the variables, however, you may still be able to devise an approach which will yield a straight-line plot. Suppose for instance, that I drop a ball and it accelerates due to gravity. You will see in class that the velocity of the ball v after falling a distance d is: v 2 = v 0 2 + 2gd (13) Here g is the acceleration due to gravity and v 0 is the initial velocity of the ball. By measuring the velocity of the ball at various distances d, we can determine g. However, if we do not know v 0, then we cannot use any of the graphical methods we have discussed thus far. Equation (13) is not linear, exponential or power-law. How, then can we proceed? The answer is that we must find a pair of variables that are linear. Notice that if we plot the square of the measured velocity, v 2, versus distance d, we have a linear relation with a slope 2g and y-intercept of v 0 2. You should think about this carefully until you understand it fully. To reiterate, the key to analyzing nonlinear relationships which are neither power-law nor exponential is finding two experimental quantities which are linearly related. 5 Graphical Analysis & Errors

V. Summary of Graphical Analysis 1. Graphical analysis is a powerful tool for determining the relationship between two experimental quantities. 2. The key to obtaining quantitative results from graphical analysis is plotting the variables in such a way that a straight line is obtained. 3. If the relationship between the variables is exponential, then a plot using semi-log graph paper should be straight. The slope of the line gives the characteristic constant of the exponential. 4. If the relationship between the variables is a power-law, then a plot using log-log graph paper should be straight. The slope of the line gives the exponent. There is no characteristic parameter in a power-law relationship. 5. If the relationship between the variables is nonlinear but not exponential or power-law, then you must search for another way of plotting the variables which yields a straight line. 6. Finally, you should ALWAYS plot your data as you take it. This allows you to detect errors in your measurements before it is too late to correct them. It also shows where data points are most needed in order to best fix the position and slope of the best-fit line. To find the correct minima and maxima for the plot, it is usually best to perform measurements at the end of the experimental ranges first. This also guarantees that the experimental apparatus is functioning properly over its full range. VI. Error Analysis VI.A Propagation of Errors An important part of any experiment is evaluating the uncertainty in the final results. Conceptually, this process can be divided into two stages: (1) Determining the uncertainty of the initial data; and (2) Propagating this uncertainty into the final result calculated from that data. The first process varies widely depending on the type of measurement being performed. Usually uncertainties in readings can be assigned based on a little experience and common sense. The second part of the process --propagating the initial uncertainty in the data through the various stages of data analysis and into the final result -- is a bit better codified. Suppose for instance that we have a result z which is the sine of an angle θ that has an uncertainty θ, i.e. z=sinθ. What is the resulting uncertainty in z? Differential calculus tells us that the change in sinθ with a small change in θ is cosθ θ, where θ is measured in radians (this is very important!). Thus the uncertainty on z is z = cosθ θ. As another case, suppose z is the sum of two numbers, x and y, which have uncertainties x and y respectively, i.e. z=x+y. What is the resulting uncertainty of z? At first we might guess that z= x+ y. However, if the actual errors of x and y have opposite signs, then they will partly cancel each other out and the error in z may be quite small. Thus a simple summation of the two uncertainties x and y is too pessimistic because it neglects possible cancellation of 6 Graphical Analysis & Errors

the two errors. It turns out that the correct statistical uncertainty is given by z 2 = x 2 + y 2. Attached to this laboratory description is a page containing the equations governing the propagation of errors in other situations. You may wish to tape it into your notebook so that it is handy throughout future lab sessions. If you don't understand any of these equations, you may want to ask your instructor about them. VI.B Errors and Graphs Often we will want to determine the error in parameters derived from a graph. Suppose we call the slope of the "best fit" straight line k 1 and the "best fit" y-axis intercept y 01. We now draw two lines of larger and smaller slope which still approximately go through the data points. Then we find the slopes k 2 and k 3 and the intercepts y 02 and y 03 of these alternate lines. The differences between these values and the optimal values k 1 and y 01 give the uncertainties in these quantities. Thus we can say that the slope is b = k 1 ± (k 2 -k 3 )/2 and the y-intercept is y 0 = y 01 ± (y 02 -y 03 )/2. Similar approaches work to estimate the uncertainties in parameters derived from semi-log and log-log graphs. 7 Graphical Analysis & Errors

Graphical Analysis and Errors: Exercises 1. Graph the data you obtained in the first pre-lab exercise on a linear scale to determine the relationship between position (x) and time (t). Position should be on the y-axis and time on the x-axis. State your result completely, including units. Using your graph, estimate the uncertainties on your slope and y-intercept. Enter the data into the Graphical Analysis program on the computer. Carry out a linear fit. Compare your results above to the results obtained using the computer program. 2. Take the results you obtained in the second pre-lab exercise, and enter those into the Graphical Analysis program. Plotting the force F versus distance, you should see that there is not a linear relationship. Your goal is to find one. Set up two new columns in Graphical Analysis, one for log F and one for log r. To do this, go to the Data menu, select New Column, and then Calculated. Enter a name, hit the log button, and then select F or r from the Columns pull-down menu. Try plotting log F vs. r, F vs. log r, and log F vs. log r. Only one of these should look like a straight line which is it? Do a linear fit to obtain an equation relating F and r. Using your work with Graphical Analysis to guide you, plot the data yourself on either semi-log or log-log paper, whichever is appropriate. Determine the relationship between force and distance from your graph, and compare it to what you got with the software. 3. Your TF will provide the data for this part, based on what everyone entered on WebCT for the experiment with the pennies. Enter the data into Graphical Analysis. The graph of N, the number of pennies remaining, versus t, the number of throws, should not be linear. As you did in step 2, set up columns for log N and log t, and determine which combination produces a linear relationship. Fit the data to determine the relationship between N and t. Plot the data on either semi-log or log-log paper, whichever is appropriate. Determine the relationship between N and t from your graph, and compare it to what you got with the software. 4. Suppose you know that F = m a, where F is the force, m is the mass and a is the acceleration. If you measure m = 2.0 ± 0.1 kg and a = 5.1 ± 0.2 m /s 2, what is the value and uncertainty of F? 5. Suppose we have a relationship y = a + sinθ. You measure a = 2.3 ± 0.2 and θ = 10.0 ± 1.0 degrees. What is the value and uncertainty of y? 8 Graphical Analysis & Errors

Error Propagation Below we give the equations governing the propagation of errors when combining two measurements, x ± x and y ± y, or when taking a trigonometric function of an angle θ ± θ where θ is in RADIANS. 1. Addition: x + y = z z = ( x) 2 + ( y) 2 2. Subtraction: x - y = z z = ( x) 2 + ( y) 2 3. Multiplication: xy = z z = xy ( x x )2 + ( y y )2 4. Division: x/y = z z = x y ( x x )2 + ( y y )2 5. Exponentiation: x n = z z = nx n 1 x 6. Sine function: sin θ = z z = (cos θ) θ θ in RADIANS 7. Cosine function: cos θ = z z = (sin θ) θ θ in RADIANS 8. Tangent function: tan θ = z z = (sec 2 θ) θ θ in RADIANS 9 Graphical Analysis & Errors