Statistics
Chapter 4

"There are three kinds of lies: lies, damned lies, and statistics."
Benjamin Disraeli, 1895 (British statesman)

Gaussian Distribution, 4-1

If a measurement is repeated many times, a statistical treatment of the data can provide an indication of the reliability of the results. If the errors associated with the measurement are completely random, then the Central Limit Theorem assures that the data will follow a mathematical form called a Gaussian distribution (bell-shaped curve). In the limit of an infinite number of measurements the histogram below becomes the population distribution denoted by the solid line.

The histogram on the right was drawn to have the same mean, standard deviation, and area as the smooth curve. This is not true for a finite number of measurements. Only in the infinite limit will they be the same.

The population distribution (infinite n) is characterized by a population mean (average - center of the symmetric distribution), µ, and a population standard deviation (measure of the width of the distribution), σ. Another useful measure is the square of the standard deviation, σ², known as the variance.

$$\mu = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} x_i \qquad\qquad \sigma^2 = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$

A Gaussian curve, y_Gaussian, can be expressed in terms of these variables.

$$y_{\mathrm{Gaussian}} = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}$$
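The limiting definitions of µ and σ can be illustrated numerically. The sketch below is only an illustration: the values MU = 10 and SIGMA = 2 and the function name `sample_stats` are hypothetical, and the "measurements" are simulated random Gaussian numbers rather than real data. It applies the 1/n formulas above to samples of increasing size.

```python
import math
import random

random.seed(0)          # fixed seed so the run is repeatable
MU, SIGMA = 10.0, 2.0   # hypothetical population mean and standard deviation


def sample_stats(n):
    """Simulate n measurements and apply the 1/n definitions of mean and sigma."""
    data = [random.gauss(MU, SIGMA) for _ in range(n)]
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    return mean, math.sqrt(variance)


for n in (10, 1000, 20000):
    mean, sd = sample_stats(n)
    print(f"n = {n:6d}  mean = {mean:.3f}  std dev = {sd:.3f}")
```

For small n the computed values scatter noticeably around 10 and 2; they settle toward the population values as n grows, which is the sense in which the histogram approaches the smooth curve.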
Obtaining the Probability

As all measurements contain experimental error, no result is completely certain. However, by a judicious use of statistics one can associate a probability with the result. To convert the Gaussian distribution into a probability density a new variable z is introduced, defined as z = (x − µ)/σ. Then the Gaussian curve becomes the normal Gaussian error curve or the z distribution

$$y = f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

$$f(0) = \frac{1}{\sqrt{2\pi}} = 0.3989 \qquad\qquad f(1) = \frac{1}{\sqrt{2\pi}}\, e^{-1/2} = 0.2419$$

(x = µ ± σ are inflection points)

Integration of the population distribution between two limits, as from x = µ − zσ to x = µ + zσ, gives the probability of obtaining a value of x between these limits. The integral limits that give 95% of the population distribution area (a probability of 0.95) are 95% confidence limits. The symmetric Gaussian distribution requires that these limits be equidistant from µ. The integral of a Gaussian between finite limits is not analytic and the area under the curve is given in tables. (The heading printed as 0z0 below is really the absolute value of z, |z|.)

The sample data is not infinite! How can one characterize its reliability? First one needs to obtain the mean and standard deviation for finite data. The finite distribution is characterized by a sample mean (sample average), <x>, and a sample standard deviation, s. Again, another useful measure is the square of the standard deviation, s², known as the sample variance. The quantity n − 1 below is called the degrees of freedom.

$$\langle x \rangle = \frac{\sum_i x_i}{n} \qquad\qquad s = \sqrt{\frac{\sum_i (x_i - \langle x \rangle)^2}{n - 1}}$$
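The z distribution and the tabulated areas can be checked numerically. This is a minimal sketch using the error function from Python's standard library; the function names `gaussian_z` and `area_within` are made up for this illustration.

```python
import math


def gaussian_z(z):
    """Height of the z distribution, f(z) = exp(-z^2/2) / sqrt(2*pi)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def area_within(z):
    """Area under the z distribution between -z and +z."""
    return math.erf(z / math.sqrt(2.0))


print(f"f(0) = {gaussian_z(0):.5f}")   # 0.39894
print(f"f(1) = {gaussian_z(1):.5f}")   # 0.24197
print(f"area within |z| <= 1.96: {area_within(1.96):.3f}")   # 0.950
```

The last line shows that limits of about µ ± 1.96σ enclose 95% of the population distribution, which is why they serve as the 95% confidence limits.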
EX 1. For the data on the bulbs given in the histogram, where <x> = 845.2 hr and s = 94.2 hr:
a) What fraction of bulbs is expected to have a lifetime greater than 1005.3 hr?
b) What fraction of bulbs is expected to have a lifetime between 798.1 and 901.7 hr?

Comparison of Standard Deviations with F Test, 4-2

To examine whether two standard deviations are statistically different, determine

$$F_{\mathrm{calculated}} = \frac{s_1^2}{s_2^2} \qquad \text{where } F \ge 1$$

If F_calculated > F_table the difference is significant and the two measurements are statistically different.

Hypothesis testing is based upon assuming that the null hypothesis is true at a certain level of probability, generally chosen to be 5%. The hypothesis is accepted if the probability for it being true is > 5% and rejected if the probability for it being true is < 5%.

Null hypothesis for the F test: two sets of measurements taken from populations with the same population standard deviation; all differences arise from only random variations in measurement
accept: F_calculated < F_table => standard deviations are not statistically different
reject: F_calculated > F_table => standard deviations are statistically different
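The two fractions asked for in EX 1 can be cross-checked without a table of areas. The sketch below makes two assumptions that should be stated plainly: the mean is taken as 845.2 hr (the value that places the three limits at z = 1.70, −0.50 and +0.60), and the sample values <x> and s are treated as if they were the population µ and σ. The names `phi` and `fraction_between` are illustrative.

```python
import math

MEAN_HR = 845.2   # assumed mean lifetime, hr (see lead-in)
S_HR = 94.2       # standard deviation, hr


def phi(z):
    """Area under the z distribution from -infinity up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fraction_between(low, high, mean=MEAN_HR, s=S_HR):
    """Fraction of the population expected between two x limits."""
    return phi((high - mean) / s) - phi((low - mean) / s)


# a) lifetime greater than 1005.3 hr: everything above z = (1005.3 - mean)/s
frac_a = 1.0 - phi((1005.3 - MEAN_HR) / S_HR)

# b) lifetime between 798.1 and 901.7 hr
frac_b = fraction_between(798.1, 901.7)

print(f"a) {frac_a:.3f}")   # 0.045
print(f"b) {frac_b:.3f}")   # 0.417
```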
Confidence Intervals, 4-3 (inferences based on small samples)

From a limited number of measurements one wants an estimate of the uncertainty of the measurement. One can use a confidence interval

$$\text{confidence interval} = \langle x \rangle \pm \frac{ts}{\sqrt{n}}$$

which implies that the true population mean will be found within a range of ts/√n of the sample mean, with a confidence level (level of certainty) specified by the particular t chosen, if one were to repeat the n measurements many times. Then a 95% confidence interval would include the true population mean in 95% of these sets of n measurements. Note that this implies that the uncertainty in the sample mean is reduced by more measurements by a factor of 1/√n.

Null hypothesis for the t test: two sets of measurements taken from populations with the same mean; all differences arise from only random variations in measurement
accept: t_calculated < t_table => means are not statistically different
reject: t_calculated > t_table => means are statistically different

EX 2. The percentage of an additive in gasoline was measured six times with the following results: 0.13, 0.12, 0.16, 0.17, 0.20 and 0.11%. Find the 90% and 99% confidence intervals for the percentage of the additive.
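EX 2 can be worked through as follows. This is a sketch rather than a prescribed solution: the Student's t values for 5 degrees of freedom (2.015 at 90%, 4.032 at 99%) are standard table values typed in by hand, and the names `T_TABLE_5DOF` and `confidence_interval` are illustrative.

```python
import math
import statistics

additive_pct = [0.13, 0.12, 0.16, 0.17, 0.20, 0.11]

# Student's t for n - 1 = 5 degrees of freedom, copied from a standard table
T_TABLE_5DOF = {90: 2.015, 99: 4.032}


def confidence_interval(values, t):
    """Return (sample mean, half-width t*s/sqrt(n)) for the given t."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)      # sample standard deviation, n - 1 in the denominator
    return mean, t * s / math.sqrt(n)


for level, t in T_TABLE_5DOF.items():
    mean, half_width = confidence_interval(additive_pct, t)
    print(f"{level}%: {mean:.3f} +/- {half_width:.3f} %")
# 90%: 0.148 +/- 0.028 %
# 99%: 0.148 +/- 0.056 %
```

The 99% interval is wider than the 90% interval: demanding more certainty that the interval contains the population mean requires a larger range.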
Comparison of Means with Student's t, 4-4 (always use 95% confidence intervals)

Case 1: Comparison to a Known or Standard Value

EX 3. A Standard Reference Material is certified to contain 94.6 ppm of an organic contaminant in soil. You analyze the reference compound five times, obtaining <x> = 97.00, s = 1.66. Do your results differ from the expected result at the 95% confidence level?

Case 2: Comparison of Replicate Measurements (apply F test first)

For this comparison one first obtains a pooled standard deviation, then uses it to calculate a value of t, which is compared with t in the Student's t table. If the calculated t is greater than the 95% confidence level value of t, the two replicates are considered different. For the two sets of data with n1 and n2 measurements, with means <x1> and <x2> and standard deviations s1 and s2:

If F_calculated < F_table:

$$t_{\mathrm{calculated}} = \frac{|\langle x_1 \rangle - \langle x_2 \rangle|}{s_{\mathrm{pooled}}} \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

$$s_{\mathrm{pooled}} = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}$$

If F_calculated > F_table:

$$t_{\mathrm{calculated}} = \frac{|\langle x_1 \rangle - \langle x_2 \rangle|}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

$$\text{degrees of freedom} = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

EX 4. A trainee in a medical lab will be released to work on her own when her results agree with those of an experienced worker at the 95% confidence level. Considering the results for blood urea nitrogen analysis given below, should the trainee be released to work alone?

trainee: <x> = 14.57 mg/dL, s = 0.53 mg/dL, n = 6 samples
experienced worker: <x> = 13.95 mg/dL, s = 0.42 mg/dL, n = 5 samples
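The Case 2 procedure can be sketched for EX 4 as below. Several things here are assumptions rather than part of the notes: the function names are illustrative, the tabulated F of 6.26 is the one-tailed 95% value for 5 and 4 degrees of freedom (use whatever your own F table lists), and t = 2.262 is the 95% Student's t for 9 degrees of freedom.

```python
import math


def f_ratio(s1, s2):
    """F = larger variance / smaller variance, so that F >= 1."""
    v1, v2 = s1 ** 2, s2 ** 2
    return max(v1, v2) / min(v1, v2)


def t_pooled(mean1, s1, n1, mean2, s2, n2):
    """t for the case F_calculated < F_table; returns (t, degrees of freedom)."""
    s_pooled = math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2))
    t = abs(mean1 - mean2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
    return t, n1 + n2 - 2


def t_unequal(mean1, s1, n1, mean2, s2, n2):
    """t for the case F_calculated > F_table; returns (t, degrees of freedom)."""
    u1, u2 = s1**2 / n1, s2**2 / n2
    t = abs(mean1 - mean2) / math.sqrt(u1 + u2)
    dof = (u1 + u2) ** 2 / (u1**2 / (n1 - 1) + u2**2 / (n2 - 1))
    return t, dof


trainee = (14.57, 0.53, 6)       # mean (mg/dL), s (mg/dL), n
experienced = (13.95, 0.42, 5)

F_TABLE = 6.26    # assumed table value, 5 and 4 degrees of freedom
T_TABLE = 2.262   # 95% confidence, 9 degrees of freedom

F = f_ratio(trainee[1], experienced[1])
if F < F_TABLE:
    t, dof = t_pooled(*trainee, *experienced)
else:
    t, dof = t_unequal(*trainee, *experienced)

print(f"F = {F:.2f}, t = {t:.2f}, degrees of freedom = {dof}")
print("means differ" if t > T_TABLE else "means do not differ at 95%")
```

With these inputs F is about 1.59 and t is about 2.11, below the assumed table values, so under these assumptions the two sets of results are not statistically different.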
Case 3: Comparison of Individual Differences with Paired t Test

Each sample is measured once by each method, and the differences d_i, the average value of the differences <d>, and the standard deviation of the differences s_d are determined in order to calculate t.

$$s_d = \sqrt{\frac{\sum_i (d_i - \langle d \rangle)^2}{n - 1}} \qquad\qquad t_{\mathrm{calculated}} = \frac{|\langle d \rangle|}{s_d} \sqrt{n}$$

EX 5. The Ti content (wt%) of five different ore samples (each with different Ti content) was measured by each of two methods. Do the two techniques give results that are significantly different at the 95% confidence level?

Sample   Method 1   Method 2   d_i       d_i − <d>   (d_i − <d>)²
A        0.0134     0.0135     −0.0001   +0.0006     3.6 × 10⁻⁷
B        0.0144     0.0156     −0.0012   −0.0005     2.5 × 10⁻⁷
C        0.0126     0.0137     −0.0011   −0.0004     1.6 × 10⁻⁷
D        0.0125     0.0137     −0.0012   −0.0005     2.5 × 10⁻⁷
E        0.0137     0.0136     +0.0001   +0.0008     6.4 × 10⁻⁷

Grubbs Test for Outliers, 4-6 (use 95% confidence)

To determine whether a particular data point can be excluded based upon its questionable veracity, form the Grubbs statistic, G

$$G_{\mathrm{calculated}} = \frac{|x_{\mathrm{questionable}} - \langle x \rangle|}{s}$$

If G_calculated > G_table then the point can be excluded with the chosen confidence level (here 95%). The mean and standard deviation will need to be recalculated.

Hint: generally do not exclude a data point unless you are certain that an error occurred in its measurement. Never exclude more than one point. Always use a value of G of at least a 95% confidence level.

NOTE: For the F, t, and G statistics, if the calculated value is less than the table value the null hypothesis is accepted and you do nothing!
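The paired t calculation for EX 5 can be sketched as follows, starting from the two columns of measured Ti contents. The variable names are illustrative, and the table value t = 2.776 (95% confidence, 4 degrees of freedom) is a standard Student's t entry typed in by hand.

```python
import math
import statistics

method_1 = [0.0134, 0.0144, 0.0126, 0.0125, 0.0137]   # wt% Ti, samples A to E
method_2 = [0.0135, 0.0156, 0.0137, 0.0137, 0.0136]

# Difference for each sample, d_i = Method 1 - Method 2
d = [a - b for a, b in zip(method_1, method_2)]

n = len(d)
d_mean = statistics.mean(d)
s_d = statistics.stdev(d)            # uses n - 1 in the denominator
t_calculated = abs(d_mean) / s_d * math.sqrt(n)

T_TABLE = 2.776   # 95% confidence, n - 1 = 4 degrees of freedom

print(f"<d> = {d_mean:.4f}, s_d = {s_d:.2e}, t = {t_calculated:.2f}")
if t_calculated > T_TABLE:
    print("methods differ at the 95% confidence level")
else:
    print("methods do not differ at the 95% confidence level")
```

Here t_calculated comes out near 2.43, below 2.776, so the two methods are not shown to be significantly different at the 95% confidence level.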
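Method of Linear Least Squares, 4-7

For a set of n data points (x_i, y_i) one wants to find the "best" straight line through the data:

$$y = mx + b$$

[Figure: data points and the fitted line y = mx + b, plotted as y versus x]

Each y_i deviates from the line by

$$d_i = y_i - y = y_i - (m x_i + b)$$

where y is the value when x = x_i. To minimize the deviations from linearity irrespective of their sign, one considers the square of the deviations

$$d_i^2 = (y_i - m x_i - b)^2 = y_i^2 - 2 m x_i y_i + m^2 x_i^2 + 2 m b x_i - 2 b y_i + b^2$$

In the method of least squares one minimizes the sum of the squares of all the deviations:

$$\mathrm{SSE} = \sum d_i^2 = \sum y_i^2 - 2m \sum x_i y_i + m^2 \sum x_i^2 + 2mb \sum x_i - 2b \sum y_i + \sum b^2$$

The values of m and b are found which minimize SSE:

$$\left(\frac{\partial\, \mathrm{SSE}}{\partial m}\right)_b = 0 \qquad\qquad \left(\frac{\partial\, \mathrm{SSE}}{\partial b}\right)_m = 0$$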
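Carrying out the minimization explicitly may help. The following is a sketch of the intermediate steps, using Σ b² = nb² and dividing each derivative by 2; it ends with the usual closed-form expressions for the slope and intercept.

```latex
\begin{align*}
\frac{\partial\,\mathrm{SSE}}{\partial m}
  &= -2\sum x_i y_i + 2m\sum x_i^2 + 2b\sum x_i = 0
  &&\Rightarrow\quad m\sum x_i^2 + b\sum x_i = \sum x_i y_i \\
\frac{\partial\,\mathrm{SSE}}{\partial b}
  &= 2m\sum x_i - 2\sum y_i + 2nb = 0
  &&\Rightarrow\quad m\sum x_i + nb = \sum y_i
\end{align*}
Solving the two simultaneous equations for $m$ and $b$:
\begin{align*}
m &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, &
b &= \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}
\end{align*}
```

The second equation also gives b = (Σ y_i − m Σ x_i)/n directly, which says that the best line passes through the point (<x>, <y>).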
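Linear Regression Equations

Each quantity is listed with the equation in Harris, followed by the variant (in sample spreadsheet).

slope, m (4-16)
equation in Harris:
$$m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2}$$
variant:
$$m = \frac{\sum (x_i - \langle x \rangle)(y_i - \langle y \rangle)}{\sum (x_i - \langle x \rangle)^2}$$

y-intercept, b (4-17)
equation in Harris:
$$b = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n \sum x_i^2 - (\sum x_i)^2}$$
variant:
$$b = \frac{\sum y_i - m \sum x_i}{n}$$

variance of the regression, s_y² (standard error) (4-20)
$$s_y^2 = \frac{\sum (y_i - m x_i - b)^2}{n - 2}$$

variance of the slope, s_m² (standard error) (4-21)
equation in Harris:
$$s_m^2 = \frac{n}{n - 2} \cdot \frac{\sum (y_i - m x_i - b)^2}{n \sum x_i^2 - (\sum x_i)^2}$$
variant:
$$s_m^2 = \frac{\sum (y_i - m x_i - b)^2}{(n - 2) \sum (x_i - \langle x \rangle)^2}$$

variance of the intercept, s_b² (standard error) (4-22)
equation in Harris:
$$s_b^2 = \frac{\sum (y_i - m x_i - b)^2 \sum x_i^2}{(n - 2)\left[n \sum x_i^2 - (\sum x_i)^2\right]}$$
variant:
$$s_b^2 = \frac{\sum x_i^2 \sum (y_i - m x_i - b)^2}{n (n - 2) \sum (x_i - \langle x \rangle)^2}$$

correlation coefficient, R (5-2)
equation in Harris:
$$R = \frac{\sum (x_i - \langle x \rangle)(y_i - \langle y \rangle)}{\sqrt{\sum (x_i - \langle x \rangle)^2 \sum (y_i - \langle y \rangle)^2}}$$
variant:
$$R = \frac{\sum x_i y_i - \sum x_i \sum y_i / n}{\sqrt{\sum (x_i - \langle x \rangle)^2 \sum (y_i - \langle y \rangle)^2}}$$

Abbreviations used in spreadsheet in addition to SSE (similar expressions in y)
Sx = Σ x_i
SSx = Σ x_i²
SSDx = Σ (x_i − <x>)²
Sxy = Σ x_i y_i
SDxSDy = Σ (x_i − <x>)(y_i − <y>)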
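The regression formulas above can be tried out in a few lines. This is a sketch only: the four (x, y) points are made-up illustration data, not values from these notes, and the variable names loosely follow the spreadsheet abbreviations (Sx, SSx, SSDx, Sxy, SDxSDy). It computes the slope both ways to show that the two forms agree.

```python
import math

# Hypothetical data, chosen only to illustrate the formulas
x = [1.0, 3.0, 4.0, 6.0]
y = [2.0, 3.0, 4.0, 5.0]
n = len(x)

# Sums, named after the spreadsheet abbreviations
Sx, Sy = sum(x), sum(y)
SSx = sum(xi * xi for xi in x)
Sxy = sum(xi * yi for xi, yi in zip(x, y))
x_mean, y_mean = Sx / n, Sy / n
SSDx = sum((xi - x_mean) ** 2 for xi in x)
SSDy = sum((yi - y_mean) ** 2 for yi in y)
SDxSDy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

D = n * SSx - Sx ** 2                      # common denominator

m = (n * Sxy - Sx * Sy) / D                # slope, first form
m_variant = SDxSDy / SSDx                  # slope, variant form
b = (SSx * Sy - Sx * Sxy) / D              # intercept, first form
b_variant = (Sy - m * Sx) / n              # intercept, variant form

SSE = sum((yi - m * xi - b) ** 2 for xi, yi in zip(x, y))
s_y2 = SSE / (n - 2)                       # variance of the regression
s_m2 = s_y2 * n / D                        # variance of the slope
s_b2 = s_y2 * SSx / D                      # variance of the intercept
R = SDxSDy / math.sqrt(SSDx * SSDy)        # correlation coefficient

print(f"m = {m:.4f} +/- {math.sqrt(s_m2):.4f}")
print(f"b = {b:.4f} +/- {math.sqrt(s_b2):.4f}")
print(f"s_y = {math.sqrt(s_y2):.4f}, R = {R:.4f}")
```

Because n Σ x_i² − (Σ x_i)² equals n Σ (x_i − <x>)², each variant is the same quantity written in terms of deviations from the mean.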
Calibration Curves, 4-8

[Calibration data: standard solutions and blank solutions; questionable value: (0.392)]

NOTE: Propagation of uncertainty for a calibration curve follows Eq. 4-27