LABORATORY NUMBER 9
STATISTICAL ANALYSIS OF DATA

1.0 INTRODUCTION

The purpose of this laboratory is to introduce the student to the use of statistics to analyze data. Using the data acquisition system (DAS), the student will take data for two different experiments, then use a statistical analysis program and a spreadsheet program for analysis.

In the first experiment, a tank of air will be discharged and the internal pressure and temperature measured. We will then correlate the temperature with the pressure. This experiment was chosen because it represents a simple real-world situation: the correlation can be expected to be good, but some data scatter is also expected. In this case, we know the expected outcome from thermodynamic theory, but this kind of experiment could also be used in the other direction, i.e., to deduce the theory from experimental measurements.

In the second experiment, the voltage of a capacitor draining to ground through a resistance will be measured as a function of time. Resistors, capacitors and inductors are close to ideal devices, and since we are measuring only voltages, we expect very little data scatter and very good correlation of the data. In this case, theory tells us that the natural log of the voltage should be a linear function of time.

2.0 BLOWDOWN OF AN AIR TANK

2.1 EXPERIMENTAL EQUIPMENT

The apparatus for the first experiment, sketched in Figure 9.1, consists of a small compressed air tank, a pressure transducer and a battery power supply, a Bourdon-type pressure gage, and a thermocouple to measure the tank air temperature relative to a reference thermocouple placed in an ice bath (or, alternatively, an electronic ice reference simulator).

Figure 9.1 Blowdown Experiment

2.2 BLOWDOWN THEORY

If $T_i$ and $P_i$ are the initial values of temperature and pressure respectively, then thermodynamics predicts that the temperature dependence on pressure is:

$$\frac{T}{T_i} = \left(\frac{P}{P_i}\right)^{(n-1)/n}$$

where n is called the "polytropic" exponent.
If the process is reversible and adiabatic, then n is equal to k, the specific heat ratio, which for air has a value of 1.4. This process is not adiabatic, so n is expected to differ from 1.4. If we take the logarithm of both sides of the above equation, we would expect the following
linear relationship:

$$\ln\left(\frac{T}{T_i}\right) = \left[\frac{n-1}{n}\right] \ln\left(\frac{P}{P_i}\right)$$

Taking the logarithms of our data, we can see how well they agree with this prediction and find the value of n for our experiment.

The tank should be initially charged to a pressure of approximately 100 psi. The instructor will show you how to charge the tank. Prepare an ice bath for the reference thermocouple. (Alternatively, you may be supplied with an electronic ice reference. This is a device which simulates the voltage of a thermocouple at 0 °C.) Turn on the DAS and, when Windows has loaded, start LabVIEW and load the program Lab9_blowdown.vi from the folder entitled ENGR 300 Labs located on the Desktop. When you are ready to collect data, open the valve on the tank about 1/2 turn to start the transient and then press the run button on the LabVIEW toolbar. Open the valve the rest of the way until it is wide open. Data will be saved to the Desktop in a file titled Lab9_blowdown_data.xls, where the first column is the pressure transducer output in kPa (absolute pressure) and the second column is the thermocouple output (in kelvin).

3.0 DISCHARGE OF A CAPACITOR

3.1 EXPERIMENTAL EQUIPMENT

This experiment consists of a capacitor connected to ground through a resistor as shown in Figure 9.2.

3.2 EXPERIMENTAL PROCEDURE

Initially, the battery is used to charge the capacitor. At the start of the measurements, a momentary switch connecting the charging battery is released and the capacitor voltage then decays in an exponential manner. The values of the capacitor and the resistor are shown on the apparatus, and you should be able to predict the resulting voltage transient. The purpose of this experiment, however, is not to demonstrate the validity of the theory but to show that statistical analysis through the correlation coefficient can demonstrate that there is a functional relationship between the voltage of the capacitor and time. To collect data, open the file Lab9_C_discharge.vi in LabVIEW.
Make sure that the output of the metal project box is connected to analog input 0 on the DAS input box. Hold down the momentary switch on the metal project box, press the run button on the LabVIEW toolbar, then release the momentary switch. The LabVIEW VI will save data to the Desktop in the file Lab9_data.xls, with the first column representing the time in seconds and the second column representing the output voltage (in volts).
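
The expectation that ln(voltage) is linear in time can be illustrated with a short Python sketch. It assumes the usual ideal RC discharge model V(t) = V0 exp(-t/(RC)), so that ln V = ln V0 - t/(RC) and the fitted slope is -1/(RC). The component values R, C and V0 below are hypothetical, chosen only for illustration (they are not the values marked on the apparatus), and linear_fit is an illustrative helper name.

```python
import math


def linear_fit(x, y):
    """Least squares slope a and intercept b of the line y = a*x + b."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    denom = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / denom
    b = (sxx * sy - sx * sxy) / denom
    return a, b


# Hypothetical component values, for illustration only.
R = 10_000.0   # resistance in ohms (assumed)
C = 100e-6     # capacitance in farads (assumed)
V0 = 9.0       # initial capacitor voltage in volts (assumed)

# Synthetic, noise-free discharge data: time in seconds, voltage in volts.
times = [0.1 * i for i in range(50)]
volts = [V0 * math.exp(-t / (R * C)) for t in times]

# Fit ln(V) against t; the slope estimates -1/(R*C).
ln_volts = [math.log(v) for v in volts]
slope, intercept = linear_fit(times, ln_volts)

tau_est = -1.0 / slope          # estimated time constant R*C in seconds
v0_est = math.exp(intercept)    # estimated initial voltage in volts

print("time constant estimate (s):", tau_est)
print("initial voltage estimate (V):", v0_est)
```

With measured data, the time and voltage lists would come from the two columns of the saved data file instead of the synthetic model.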
4.0 STATISTICAL EVALUATION OF YOUR RESULTS

Although you have a large number of data points, you will find that you can use a spreadsheet program to perform the evaluations quite easily. There is an attached data reduction procedure which explains how to reduce your data using a spreadsheet program. You may use this procedure, or you may use other software applications such as Matlab, or modify the LabVIEW VI accordingly (if you do the latter, save the modified VI using a different filename!).

The results you are to produce for the first experiment are:

1. The linear correlation coefficient for absolute temperature versus absolute pressure
2. The linear correlation coefficient for ln(T) versus ln(P)
3. The coefficients for the best fit straight line for the ln(T) versus ln(P) data
4. A plot showing the T vs P data (as a series of lines connecting the data points but not showing the data points themselves) and the best fit straight line, and a similar plot for ln(T) vs ln(P)

For the second experiment the required results are:

1. The linear correlation coefficient for voltage versus time
2. The linear correlation coefficient for ln(voltage) versus time
3. The coefficients for a least squares fit line of voltage versus time
4. The coefficients for a least squares fit line of ln(voltage) versus time
5. A plot of the data and fitted line for voltage versus time
6. A plot of the data and fitted line for ln(voltage) versus time
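
As a rough sketch of how the first-experiment fit could be reduced outside a spreadsheet, the Python fragment below fits ln(T) against ln(P) and recovers the polytropic exponent from the slope, using slope = (n - 1)/n from the blowdown theory, so that n = 1/(1 - slope). The data are synthetic: n_true, P_i and T_i are hypothetical values, not measurements, and the helper names linear_fit and polytropic_exponent are illustrative only.

```python
import math


def linear_fit(x, y):
    """Least squares slope a and intercept b of the line y = a*x + b."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    denom = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / denom
    b = (sxx * sy - sx * sxy) / denom
    return a, b


def polytropic_exponent(slope):
    """Invert slope = (n - 1)/n to obtain n = 1/(1 - slope)."""
    return 1.0 / (1.0 - slope)


# Hypothetical values used only to generate synthetic data.
n_true = 1.2   # assumed polytropic exponent
P_i = 790.0    # assumed initial absolute pressure, kPa
T_i = 295.0    # assumed initial absolute temperature, K

# Synthetic blowdown: pressure falls from 790 kPa to 200 kPa.
pressures = [P_i - 10.0 * k for k in range(60)]
temps = [T_i * (p / P_i) ** ((n_true - 1.0) / n_true) for p in pressures]

# Fit ln(T) against ln(P); the slope estimates (n - 1)/n.
ln_p = [math.log(p) for p in pressures]
ln_t = [math.log(t) for t in temps]
slope, intercept = linear_fit(ln_p, ln_t)
n_est = polytropic_exponent(slope)

print("fitted slope:", slope)
print("estimated polytropic exponent:", n_est)
```

Real data will show scatter, so the recovered exponent will only approximate a single value of n over the transient.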
5.0 DATA REDUCTION PROCEDURE

One purpose of Experiment 9 is to demonstrate how a data acquisition system can be used to take and statistically analyze large volumes of data. The following describes a procedure which can be used to do this for each of the two experiments performed for Experiment 9. Note that the LabVIEW VIs could be modified to carry out the statistical data analysis after the data is collected. This has not been done, so that the student can practice carrying out the data analysis procedure.

An outline of the procedure using Microsoft Excel is as follows:

1. Take the data using the LabVIEW VI.
2. Import the data into Excel.
3. Use Excel to calculate the required logarithms.
4. Use Excel to find the correlation coefficients and least squares lines.
5. Create columns of the required least squares fits to the data.
6. Create the required plots.

STATISTICAL ANALYSIS USING EXCEL

The following material explains how functions in Excel can be used to compute the correlation coefficient and the least squares best fit line for two columns of data. Suppose that the values of the independent variable (x) are contained in a column, for example B2:B20, and that the values of the dependent variable (y) are contained in a separate column, for example D2:D20.

To find the correlation coefficient between the y and the x variables, perform the following steps:

1. Find a clear block on the spreadsheet at least 3 rows high and 2 columns wide.
2. In the top leftmost cell of this block type Rxy=
3. In the cell to the right of the Rxy= label cell, type =CORREL(D2:D20,B2:B20) and then press Return.
4. The correlation coefficient for the two data columns should appear.

To find the coefficients of the best fit straight line y = ax + b, perform the following steps:

1. Locate the cursor below the cell labeled Rxy= and type the title a=
2. In the cell to the right of the a= label cell, type =SLOPE(D2:D20,B2:B20) and then press Return. The value of the slope a will appear.
3. In the cell below a= type b=
4. In the cell to the right of the b= cell, type =INTERCEPT(D2:D20,B2:B20) and then press Return. The value of the intercept b will appear.
Note: For the functions SLOPE and INTERCEPT, the block for the y values appears as the first argument and the block for the x values appears second.

Alternatively, you can find the correlation coefficient and the least squares line using the plotting functions of Excel, or even the statistical functions of Excel's Analysis ToolPak. Note, however, that the Analysis ToolPak is an optional installation item and may or may not be installed on your computer.

6.0 THE REPORT

This lab is to be reported in a format specified by the instructor. What differences did you observe in the character of the results for the two experiments? Can you explain any irregularity in the data for the blowdown experiment?
SUPPLEMENTARY INFORMATION FOR LABORATORY 9
STATISTICAL ANALYSIS OF DATA

1.0 INTRODUCTION TO STATISTICAL ANALYSIS OF DATA

Engineering experiments can be performed for a variety of reasons. One is simply to determine whether an engineered system performs according to its design specifications. In this case, the test data is simply compared to the desired or expected performance. In another kind of experiment, data is taken to confirm a preexisting theory and the test data is compared to the theory (or possibly a revised theory). In other, more exploratory tests, the functional behavior of the test data is unknown in advance and it is desired to establish a non-theory-based functional relationship. This is known as data correlation. Correlation may also be used in cases where the exact functional relationship is unknown but there is some knowledge about the underlying theory. Two important techniques in correlating data are the Correlation Coefficient and Regression Analysis, of which the simplest method is the Least Squares Linear Fit.

2.0 THE CORRELATION COEFFICIENT

There are several common statistical techniques which can be used to advantage to correlate data. One of these is to establish the correlation coefficient for a set of experimental variables. The correlation coefficient is a number whose magnitude measures the degree of correlation between two measured variables, and so can be used to determine whether a functional relationship in fact exists between them. For example, one might expect a very weak correlation between exam scores and the height of the student. On the other hand, we might expect a fairly strong correlation between the total electric power delivered by PG&E and the time of the day.
If we have two variables x and y and our experiment yields a set of n data points $(x_i, y_i)$, i = 1, ..., n, then we can compute the linear correlation coefficient from:

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2\right]^{1/2}}$$

where $\bar{x}$ and $\bar{y}$ are the mean values of x and y obtained experimentally and are given by:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

$r_{xy}$ should lie in the range from -1 to +1. A value of +1 would indicate a perfectly linear relationship between the variables with a positive slope, i.e. increasing x results in increasing y. A value of -1 indicates a perfectly linear relationship with negative slope, i.e. increasing x produces decreasing y. A value of zero indicates there is no linear correlation between the variables.
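
The definition above can be written out directly as a short Python sketch. The function name correlation_coefficient and the small data sets are illustrative assumptions, not part of the lab software; the three cases are constructed to give values near +1, -1 and 0.

```python
import math


def correlation_coefficient(x, y):
    """Linear correlation coefficient r_xy for paired data x, y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sums of products of deviations from the means.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]

# Perfectly linear with positive slope.
r_up = correlation_coefficient(x, [2.0 * xi + 1.0 for xi in x])

# Perfectly linear with negative slope.
r_down = correlation_coefficient(x, [-3.0 * xi + 4.0 for xi in x])

# Symmetric pattern with no linear trend.
r_flat = correlation_coefficient(x, [1.0, -1.0, 0.0, -1.0, 1.0])

print(r_up, r_down, r_flat)
```

This is the same quantity that a spreadsheet correlation function returns for two columns of data.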
In fact, there is probably no correlation if the absolute value of $r_{xy}$ is simply a small number. Ordinary random variations in the measurements will usually result in a nonzero value of $r_{xy}$ even if there is no relationship between the variables at all. Based on statistical theory, threshold values $r_t$ have been established to which $r_{xy}$ can be compared in order to determine if there is a significant correlation between the two variables. For two variables, n data pairs, and assuming that the correlation coefficient could be positive or negative, the appropriate values of $r_t$ for significance are given in the attached table. The value $r_t$ is a function of the number of samples and the confidence level. For common engineering purposes, the confidence level is usually taken as 95%, which corresponds to a value of $\alpha$ (the level of significance) of 5%. For a given set of data, we get $r_t$ from the table and compare it to the value of $r_{xy}$ computed from the data. If $|r_{xy}| > r_t$, then we can presume that y does depend on x in a non-random manner and can expect that a linear relationship will offer some approximation of the true functional relationship between x and y. A very low value of $r_{xy}$ does not necessarily mean there is no correlation: the functional relationship might be very non-linear (a circle, for example).

3.0 THE LEAST SQUARES LINEAR FIT

We now go on to a method to fit a function to the data. The simplest function is a straight line of the form y = ax + b. If we only have two pairs of data, the solution is simple since the points completely determine the straight line. If there are more points, then we want to determine a "best fit" to the data. The most common method for finding this best fit is the method of least squares. The test data consist of data pairs $(x_i, y_i)$. For each value of $x_i$, we can predict a value of y according to the linear relationship y = ax + b. These predicted values are called $Y_i$.
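For each value of $x_i$, we then have an error $e_i = (Y_i - y_i)$ and

$$e_i^2 = (Y_i - y_i)^2 = (a x_i + b - y_i)^2$$

The sum of the squared errors for all the data points is then:

$$S = \sum_{i=1}^{n} (a x_i + b - y_i)^2$$

We now choose a and b to minimize S by differentiating S with respect to a and b and setting the results to zero:

$$\frac{\partial S}{\partial a} = \sum_{i=1}^{n} 2 (a x_i + b - y_i)\, x_i = 0$$

$$\frac{\partial S}{\partial b} = \sum_{i=1}^{n} 2 (a x_i + b - y_i) = 0$$

These two equations can be solved simultaneously for a and b.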
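The results are:

$$a = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

$$b = \frac{\left(\sum x_i^2\right)\left(\sum y_i\right) - \left(\sum x_i\right)\left(\sum x_i y_i\right)}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

where all sums run over i = 1, ..., n. The resulting line is called the least squares best fit to the data. This procedure can be applied to functions more complicated than a straight line, a second order parabola for example. Higher order fits are not considered in this experiment.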
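
As a minimal sketch, the closed-form least squares solution for a straight line can be coded in a few lines of Python and checked against the two minimization conditions (the sum of the errors and the sum of the errors weighted by x should both vanish). The function name least_squares_line and the five data points are made up for illustration; they are not lab data.

```python
def least_squares_line(x, y):
    """Return (a, b) minimizing S = sum((a*x_i + b - y_i)**2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")
    a = (n * sxy - sx * sy) / denom
    b = (sxx * sy - sx * sxy) / denom
    return a, b


# Made-up data scattered about a straight line (illustration only).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]

a, b = least_squares_line(x, y)

# Errors e_i = Y_i - y_i between predicted and measured values.
errors = [a * xi + b - yi for xi, yi in zip(x, y)]

# Both minimization conditions should hold to within rounding error.
sum_e = sum(errors)
sum_ex = sum(e * xi for e, xi in zip(errors, x))

print("slope a:", a)
print("intercept b:", b)
print("sum of errors:", sum_e)
print("sum of errors times x:", sum_ex)
```

The same sums can be accumulated in spreadsheet columns, which is a useful cross-check on built-in slope and intercept functions.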