Basic Statistical Analysis
Prof. Dan Burkey
CHEG 4137W, UConn CMBE, Fall 2011

Why do we care?
We can only make a finite number of measurements of a system.
Statistical analysis tells us something about the quality of the data we obtain.
Error analysis and propagation of error let us talk about confidence in our numbers, both measured and calculated.

Terminology
Statistical analysis has its own language; it is a distinct branch of mathematics.
There are entire, extremely detailed courses in probability and statistics.
This lecture is designed to be an introduction to statistical analysis.
Think of it as the wading pool - get your feet wet!

The what now? With the who? Accuracy vs. Precision
Accuracy: How close a series of measured values are to the actual or true value.
Precision (aka reproducibility or repeatability): The degree to which repeated measurements made under identical or unchanged conditions yield the same result.
Measurements you make can be accurate but imprecise, precise but inaccurate, neither, or both.

Accuracy and Precision
[Figure: measured value vs. number of observations, shown relative to the true value, for four cases: precise but not accurate; accurate but not precise; accurate and precise; neither precise nor accurate.]

Let's Start Easy: Average
We all know this one!
$$\bar{n} = \frac{1}{N} \sum_{i=1}^{N} n_i$$
In Excel: =average(n1, n2, n3, ..., nN)

Moving On: Variance
Sounds like what it is!
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2, \qquad \mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$
The more scatter, or variability, in your data, the greater the variance will be.
In Excel: =varp(n1, n2, n3, ..., nN)

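As a quick check on the two definitions above, here is a minimal Python sketch of the average and the population variance. The data values and the function names `mean` and `population_variance` are made up for illustration; the standard library's `statistics.pvariance` is used only as a cross-check.

```python
import statistics

# Hypothetical repeated measurements (illustrative values only).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

def mean(values):
    """Arithmetic mean: sum of the values divided by their count N."""
    return sum(values) / len(values)

def population_variance(values):
    """Mean squared deviation from the mean, with N in the denominator."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

mu = mean(data)
var_p = population_variance(data)
print("mean =", mu)
print("population variance =", var_p)
print("statistics.pvariance =", statistics.pvariance(data))
```

Spreading the same values further from the mean increases the sum of squared deviations, and with it the variance.
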
Population vs. Sample
Population: All possible measurements, N, in an experiment. Typically we can never sample the entire population - too time consuming!
Sample: A subset, n, of the population that we use as representative of the entire population.
Sample size is important! If n is too small, it may not accurately represent the population.

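Sample Variance
Accounts for using a subset of the population.
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
(n - 1) vs. N in the denominator, i.e. the variance estimated from a sample is larger than the same sum divided by the number of points.
Why? The deviations are measured from the sample mean $\bar{x}$ rather than the true mean $\mu$, which uses up one degree of freedom.
In Excel: =var(n1, n2, n3, ..., nN)
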
Standard Deviation
You may be familiar with this one as well.
$$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Standard deviation is useful for predicting the confidence interval.
In Excel: =stdev(n1, n2, n3, ..., nN)

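The sample variance and sample standard deviation can be sketched the same way. This is an illustration only: the data are hypothetical, `sample_variance` is a name chosen here, and the standard library's `statistics.variance` and `statistics.stdev` (which also use n - 1) serve as a cross-check.

```python
import math
import statistics

# Hypothetical sample of n = 8 measurements (illustrative values only).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

s2 = sample_variance(data)   # sample variance
s = math.sqrt(s2)            # sample standard deviation

print("sample variance =", s2, "vs", statistics.variance(data))
print("sample std dev  =", s, "vs", statistics.stdev(data))
print("population variance (N in denominator) =", statistics.pvariance(data))
```

For the same data the n - 1 form is always the larger of the two, and the difference shrinks as n grows.
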
Standard Deviation
The standard deviation gives an idea of the probability that a measured value will be a certain distance from the mean or expected value - for normally distributed data!
[Figure: histogram of number of measurements vs. measured value.]
The more measurements taken, the more closely this histogram approaches a true normal distribution.

Standard Deviation
The normal distribution is often used to describe, at least approximately, any variable that tends to cluster around a mean.
$$\mathrm{PDF} = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \bar{x})^2}{2\sigma^2} \right)$$

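For reference, the normal probability density can be evaluated directly. The sketch below writes the formula out by hand (the name `normal_pdf` is chosen here) and compares it against `statistics.NormalDist` from the Python standard library; the test point and parameters are arbitrary.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mean, sigma):
    """Normal density: exp(-(x - mean)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))

# Peak height of the standard normal curve (mean 0, sigma 1).
peak = normal_pdf(0.0, 0.0, 1.0)
print("peak of standard normal =", peak)

# Cross-check at an arbitrary point against the standard library.
print(normal_pdf(1.3, 0.5, 2.0), NormalDist(0.5, 2.0).pdf(1.3))
```
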
Confidence Intervals
A confidence interval is a range about the mean or average value within which a measured value is expected to fall with a stated probability.
For normally distributed data, we expect ~68% of the measured values to be within 1 standard deviation of the mean.
For normally distributed data, we expect ~95% of the measured values to be within 2 standard deviations of the mean.
3σ = 99.7%, 4σ = 99.99%

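The fraction of normally distributed values that fall within k standard deviations of the mean follows from the cumulative distribution function. A short sketch using `statistics.NormalDist` from the Python standard library (the helper name `coverage` is chosen here):

```python
from statistics import NormalDist

def coverage(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3, 4):
    print(f"within {k} sigma: {coverage(k):.4%}")
```

Because the result depends only on k, the same percentages apply to any normal distribution regardless of its mean or standard deviation.
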
Confidence Intervals
Calculating a Confidence Interval:
In Excel: =confidence(alpha, stdev, n)
alpha = significance level, i.e. 0.05 corresponds to 95% confidence
stdev = standard deviation for the data
n = sample size
Example: Everyone take out your wallet and driver's license. Tell me your height in inches.

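As a rough sketch of what Excel's confidence(alpha, stdev, n) computes: it returns the half-width z·σ/√n of the interval about the sample mean, where z is the normal quantile for the chosen alpha. The Python below assumes normally distributed data; the function name `confidence_half_width` and the height values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_half_width(alpha, sigma, n):
    """Half-width of the normal-based confidence interval about the mean."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # e.g. about 1.96 for alpha = 0.05
    return z * sigma / math.sqrt(n)

# Hypothetical class heights in inches (illustrative values only).
heights_in = [64.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0]

x_bar = mean(heights_in)
s = stdev(heights_in)
half_width = confidence_half_width(0.05, s, len(heights_in))
print(f"mean height = {x_bar:.1f} +/- {half_width:.1f} in (95%, normal-based)")
```

Note that this half-width describes the uncertainty in the mean and shrinks as 1/√n; with only a handful of points the normal quantile understates it.
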
Confidence Intervals: Student's T-test
What happens when your sample size is small compared to the population?
The standard deviation as calculated from 10 data points will be different from that calculated from, say, 1000 data points.
The T-test can be thought of as a correction which accounts for the uncertainty that arises from only sampling a small subset of the population.

Student T-test
In Excel: =tinv(probability, dof)
Probability is the two-tailed probability for the interval of interest, i.e. 95% confidence = 0.05
DOF = Degrees of Freedom = n - 1; that is, one less than the number of measurements.
The number returned is the modifier for the confidence interval.
e.g. 95% with a large number of samples: ~2σ
e.g. 95% with 10 samples: 2.26σ
e.g. 95% with 3 samples: 4.3σ

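The multipliers returned by tinv can be reproduced without a statistics package. The sketch below is one possible approach, not Excel's implementation: it integrates the Student t density with Simpson's rule and finds the two-tailed critical value by bisection. The function names, the step count, and the search bound `upper` are choices made here for illustration, and the bound assumes the answer is below 100.

```python
import math

def t_pdf(x, dof):
    """Probability density of Student's t distribution with dof degrees of freedom."""
    log_coeff = math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
    coeff = math.exp(log_coeff) / math.sqrt(dof * math.pi)
    return coeff * (1.0 + x * x / dof) ** (-(dof + 1) / 2)

def central_area(t, dof, steps=2000):
    """Area under the t density between -t and +t, by Simpson's rule."""
    h = t / steps
    total = t_pdf(0.0, dof) + t_pdf(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return 2.0 * total * h / 3.0  # density is symmetric, so double the 0..t area

def t_critical(alpha, dof, upper=100.0):
    """Two-tailed critical value: the t with central area equal to 1 - alpha."""
    lo, hi = 0.0, upper
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if central_area(mid, dof) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for n in (3, 10, 1001):
    print(f"n = {n}: 95% multiplier = {t_critical(0.05, n - 1):.3f}")
```

The multiplier grows quickly as n drops, which is why a confidence interval built from three measurements is so much wider than one built from ten.
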
Propagation of Error
If we make a number of observations or measurements, and then perform a series of calculations based on those measurements, how do the individual errors/variations/uncertainties in the measured quantities affect (propagate through to) the calculated quantity?
We need a way to mathematically describe this phenomenon.

Example: Reynolds Number
Say you want to calculate the Reynolds number for flow through a pipe:
$$Re = \frac{\rho V D}{\mu}$$
Note: You didn't measure V; you measured a mass of fluid collected over time.

Example: Reynolds Number
$$V = \frac{4m}{\pi t \rho D^2}, \qquad Re = \frac{4m}{\pi t D \mu}$$
Sources of uncertainty or error:
Mass: Say you collected 25 kg. The scale is accurate to 0.5 kg. Did you get exactly 25 kg? Was it more or less?
Time: Collection took 90 seconds. Your stopwatch is accurate to 1 second. Did it take exactly 90 s?
Diameter: The pipe was manufactured to a specified tolerance: 1/2" Schedule 80: ID is 0.546 ± 0.002 in.
Viscosity: What is this a function of?
π and 4 can be considered exact.

Propagation of Error
In general, for a function Z = F(A, B):
$$(\Delta Z)^2 = \left( \frac{\partial F}{\partial A} \right)^2 (\Delta A)^2 + \left( \frac{\partial F}{\partial B} \right)^2 (\Delta B)^2 + \dots$$
Can be extended to an arbitrary number of variables.

Propagation of Error Example
Z = F(A, B) = A + B
Thus:
$$\frac{\partial F}{\partial A} = 1, \qquad \frac{\partial F}{\partial B} = 1$$
$$(\Delta Z)^2 = (\Delta A)^2 + (\Delta B)^2$$

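The general propagation formula can also be applied numerically when the derivatives are tedious to work out by hand. The sketch below estimates each partial derivative with a central finite difference and assumes the input uncertainties are independent; the function name `propagate`, the step size, and the numbers are all illustrative choices.

```python
import math

def propagate(func, values, uncertainties, rel_step=1e-6):
    """Uncertainty in func(*values) from independent input uncertainties.

    Each partial derivative is approximated by a central difference.
    """
    total = 0.0
    for i, (v, dv) in enumerate(zip(values, uncertainties)):
        h = rel_step * (abs(v) if v != 0 else 1.0)
        up = list(values)
        up[i] = v + h
        down = list(values)
        down[i] = v - h
        dfdx = (func(*up) - func(*down)) / (2.0 * h)
        total += (dfdx * dv) ** 2
    return math.sqrt(total)

# Hypothetical inputs: A = 10.0 +/- 0.3, B = 20.0 +/- 0.4.
dz_sum = propagate(lambda a, b: a + b, [10.0, 20.0], [0.3, 0.4])
dz_prod = propagate(lambda a, b: a * b, [10.0, 20.0], [0.3, 0.4])
print("Z = A + B: delta Z =", dz_sum)
print("Z = A * B: delta Z =", dz_prod)
```

For the sum, both derivatives equal 1, so the result reduces to the square root of (ΔA)² + (ΔB)², matching the hand calculation.
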
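Propagation of Error
Some Common Forms:
$$Z = A + B: \quad (\Delta Z)^2 = (\Delta A)^2 + (\Delta B)^2$$
$$Z = A - B: \quad (\Delta Z)^2 = (\Delta A)^2 + (\Delta B)^2$$
$$Z = AB: \quad \left( \frac{\Delta Z}{Z} \right)^2 = \left( \frac{\Delta A}{A} \right)^2 + \left( \frac{\Delta B}{B} \right)^2$$
$$Z = \frac{A}{B}: \quad \left( \frac{\Delta Z}{Z} \right)^2 = \left( \frac{\Delta A}{A} \right)^2 + \left( \frac{\Delta B}{B} \right)^2$$
$$Z = A^n: \quad \frac{\Delta Z}{Z} = |n| \frac{\Delta A}{A}$$
$$Z = \ln(A): \quad \Delta Z = \frac{\Delta A}{A}$$
$$Z = \exp(A): \quad \frac{\Delta Z}{Z} = \Delta A$$
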
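As an illustration of where these forms come from, the product rule can be worked out from the general formula for Z = F(A, B), restated on the first line below. The derivation assumes A and B are independent and nonzero.

```latex
\begin{align*}
(\Delta Z)^2 &= \left(\frac{\partial F}{\partial A}\right)^2 (\Delta A)^2
              + \left(\frac{\partial F}{\partial B}\right)^2 (\Delta B)^2
  && \text{general form} \\
Z = AB \;\Rightarrow\; \frac{\partial F}{\partial A} &= B, \qquad
  \frac{\partial F}{\partial B} = A \\
(\Delta Z)^2 &= B^2 (\Delta A)^2 + A^2 (\Delta B)^2 \\
\left(\frac{\Delta Z}{Z}\right)^2
  &= \frac{B^2 (\Delta A)^2 + A^2 (\Delta B)^2}{A^2 B^2}
   = \left(\frac{\Delta A}{A}\right)^2 + \left(\frac{\Delta B}{B}\right)^2
  && \text{divide by } Z^2 = A^2 B^2
\end{align*}
```

The quotient Z = A/B follows the same steps and gives the same relative form, because the derivatives are squared and the sign of the B term drops out.
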
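Example: Reynolds Number
Thus:
$$Re = \frac{4m}{\pi t D \mu}$$
$$\left( \frac{\Delta Re}{Re} \right)^2 = \left( \frac{\Delta m}{m} \right)^2 + \left( \frac{\Delta t}{t} \right)^2 + \left( \frac{\Delta D}{D} \right)^2 + \left( \frac{\Delta \mu}{\mu} \right)^2$$
Exercise: Prove this is the right form from the derivatives.
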
Example: Reynolds Number
$$\left( \frac{\Delta Re}{Re} \right)^2 = \left( \frac{\Delta m}{m} \right)^2 + \left( \frac{\Delta t}{t} \right)^2 + \left( \frac{\Delta D}{D} \right)^2 + \left( \frac{\Delta \mu}{\mu} \right)^2$$
Sources of uncertainty or error:
Mass: Δm = 0.5 kg; m = 25 kg.
Time: Δt = 1 sec; t = 90 sec.
Diameter: ΔD = 0.002 in.; D = 0.546 in.
Viscosity: µ = F(T, P); maybe assume constant (Δµ = 0) if conditions are unchanged over the course of the measurements? µ = 8.68E-4 kg m^-1 s^-1 at 25 °C.

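Example: Reynolds Number
$$\left( \frac{\Delta Re}{Re} \right)^2 = \left( \frac{0.5}{25} \right)^2 + \left( \frac{1}{90} \right)^2 + \left( \frac{0.002}{0.546} \right)^2 + \left( \frac{0}{8.68 \times 10^{-4}} \right)^2$$
And:
$$Re = \frac{4m}{\pi t D \mu} = \frac{4\,(25\ \mathrm{kg})}{\pi\,(90\ \mathrm{s})\,(0.546\ \mathrm{in})\left( \frac{0.0254\ \mathrm{m}}{1\ \mathrm{in}} \right)(8.68 \times 10^{-4}\ \mathrm{kg\,m^{-1}\,s^{-1}})}$$
$$Re = 29380 = 29000 \text{ (sig figs)}$$
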
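Example: Reynolds Number
$$\Delta Re = Re \sqrt{ \left( \frac{0.5}{25} \right)^2 + \left( \frac{1}{90} \right)^2 + \left( \frac{0.002}{0.546} \right)^2 + \left( \frac{0}{8.68 \times 10^{-4}} \right)^2 }$$
$$\Delta Re \approx 680 \approx 1000 \text{ (sig figs)}$$
Thus, we would report Re = 29,000 ± 1,000.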
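
The whole Reynolds number example can be checked in a few lines of Python. This sketch takes the inputs to be m = 25 ± 0.5 kg, t = 90 ± 1 s, D = 0.546 ± 0.002 in, and µ = 8.68E-4 kg m^-1 s^-1 with no uncertainty; those readings of the example's numbers, and all variable names, are assumptions made here.

```python
import math

INCH_TO_M = 0.0254  # exact conversion factor

# Assumed inputs (value, uncertainty) as read from the example.
m, dm = 25.0, 0.5            # collected mass, kg
t, dt = 90.0, 1.0            # collection time, s
d_in, dd_in = 0.546, 0.002   # pipe inner diameter, in
mu, dmu = 8.68e-4, 0.0       # viscosity, kg m^-1 s^-1 (treated as constant)

# Reynolds number from the measured quantities: Re = 4 m / (pi t D mu).
d = d_in * INCH_TO_M
re = 4.0 * m / (math.pi * t * d * mu)

# Relative uncertainties add in quadrature for a product/quotient.
rel = math.sqrt((dm / m) ** 2 + (dt / t) ** 2 + (dd_in / d_in) ** 2 + (dmu / mu) ** 2)
d_re = rel * re

print(f"Re = {re:.0f}, relative uncertainty = {rel:.4f}, delta Re = {d_re:.0f}")
print(f"Reported: Re = {round(re, -3):.0f} +/- {round(d_re, -3):.0f}")
```

With these inputs the mass term dominates the uncertainty, so a better scale would tighten the result more than a better stopwatch or a tighter pipe tolerance.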