Module: Fundamentals in statistics

Normal Distribution

Repeated observations that differ because of experimental error often vary about some central value in a roughly symmetrical distribution in which small deviations occur much more frequently than large ones. A continuous probability distribution that is valuable for representing this situation is the Gaussian or Normal Distribution. It is given by

$$p(y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right], \qquad -\infty < y < \infty$$

where $\mu$ is the mean and $\sigma^2$ is the variance. It is denoted by $N(\mu, \sigma^2)$.

Figure: Normal distribution with different means and variances

Characterizing a Normal Distribution

Once the mean and the variance of a normal distribution are given, the entire distribution is characterized. Area under a normal distribution = 1.

Figure: Tail areas of the normal distribution

Standard Normal Distribution

If y is a normally distributed variable, then it can be expressed in terms of a standardized normal deviate or a unit normal deviate,

$$z = \frac{y-\mu}{\sigma}$$

The distribution of z is $N(0, 1)$. Tables for the unit normal deviate are given in books (refer to the table in the handout).
Using Tables of Normal Distribution

Example

Suppose the daily level of an impurity in a reactor feed is known to be approximately normally distributed with a mean of $\mu = 4.0$ and a standard deviation of $\sigma = 0.3$. What is the probability that the impurity level on a randomly chosen day will exceed 4.4?

$$\Pr(y > 4.4) = \Pr\left(\frac{y-4.0}{0.3} > \frac{4.4-4.0}{0.3}\right) = \Pr(z > 1.33)$$

From the table of the unit normal distribution we find that $\Pr(z > 1.33) = 0.0918$, i.e. there is about a 9% probability that the impurity level on a randomly chosen day will exceed 4.4. Put another way, out of 100 randomly chosen days, there will be about 9 days when the impurity level exceeds 4.4.

Student's t distribution

If the mean and the standard deviation are known, then one can compute the unit normal deviate and use the unit normal distribution tables. If the mean and the standard deviation are unknown, what can we say about the occurrence of an event?

t distribution or Student's distribution (contd.)

Example

Suppose that $s = 0.3$ and $\bar{y} = 4.0$. What can one say about the random occurrence of an impurity level of 4.4? Since $\sigma$ is unknown we cannot use the unit normal tables. Instead define the term

$$t = \frac{y-\bar{y}}{s}$$

which has a distribution known as the t distribution. This distribution is similar to the normal distribution for large sample sizes (i.e. $n > 30$).

Then $\Pr(t > 1.33) = 0.12$ where there are 6 dof associated with s (i.e. there are only 7 observations from which s is calculated). (See table in handout.)

Using disttool in Matlab

Figure: t distribution displayed with disttool
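The table lookup in the impurity example can be checked numerically. The following is a minimal Python sketch (not the Matlab used in these slides), assuming the example values are a mean of 4.0, a standard deviation of 0.3 and a threshold of 4.4. It uses only `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

# Values assumed from the impurity example
mu, sigma = 4.0, 0.3
threshold = 4.4

# Standardized (unit) normal deviate z = (y - mu) / sigma
z = (threshold - mu) / sigma

# Upper tail area of the unit normal distribution beyond z
tail = 1.0 - NormalDist().cdf(z)

# Same probability computed directly on N(mu, sigma^2), without standardizing
tail_direct = 1.0 - NormalDist(mu, sigma).cdf(threshold)

print(f"z = {z:.2f}")
print(f"Pr(y > {threshold}) = {tail:.4f}")
```

The small difference from the tabulated 0.0918 arises because the table is entered at z rounded to 1.33, while the code uses the unrounded ratio.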
Figure: t distribution for various degrees of freedom

William Gosset lived from 1876 to 1937. Gosset invented the t-test to handle small samples for quality control in brewing. He wrote under the name "Student". Find out more at: http://www-history.mcs.st-andrews.ac.uk/history/Mathematicians/Gosset.html

For the distribution of averages it can be shown that the quantity

$$t = \frac{\bar{x}-\mu}{s/\sqrt{n}}$$

has a t-distribution with $n-1$ DOF. Note that the shape of the distribution depends on the DOF $n-1$. Specification of the DOF is therefore important.

The PDF of the t-distribution is symmetric and bell-shaped like the normal distribution. However, its spread is larger than that of the standard normal distribution.

Area under the t-density function from $-\infty$ to $+1$; DOF: 10

>> cdf('T',1,10)
ans = 0.8296

Learn to get this from t-tables (interpolation may be needed).

The PDF of the t-distribution tends to that of the $N(0,1)$ distribution as the sample size increases.

>> cdf('t',1:2,30)
ans = 0.8373 0.9727
>> cdf('normal',1:2,0,1)
ans = 0.8413 0.9772
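The tendency of the t-distribution toward the unit normal can also be seen without tables. The sketch below is a Python illustration rather than the Matlab `cdf` call; the helper names `t_pdf` and `t_cdf` are hypothetical, and the area is obtained by simple numerical integration of the t density, which is adequate for moderate DOF.

```python
import math
from statistics import NormalDist

def t_pdf(t, dof):
    """Density of Student's t distribution with `dof` degrees of freedom."""
    coef = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coef * (1 + t * t / dof) ** (-(dof + 1) / 2)

def t_cdf(t, dof, steps=2000):
    """Area under the t density from -infinity to t.

    Uses symmetry (area left of 0 is 0.5) plus composite Simpson's rule
    on [0, t]; `steps` must be even.
    """
    h = t / steps
    total = t_pdf(0.0, dof) + t_pdf(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return 0.5 + total * h / 3

for dof in (1, 10, 30):
    print(f"DOF = {dof:2d}: area up to t = 1 is {t_cdf(1.0, dof):.4f}")
print(f"N(0,1):   area up to z = 1 is {NormalDist().cdf(1.0):.4f}")
```

As the DOF grows, the area to the left of 1 rises toward the unit normal value, which reflects the heavier tails of the t-distribution at small sample sizes.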
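Example 3

Based on the data collected from families residing in an area of a city in the last year, test the hypothesis that the average income of a family per year is 75,000 $.

$H_0: \mu = 75{,}000$
$H_1: \mu \neq 75{,}000$

(i) Simulate data samples by
>> X = 50000 + 50000*rand(n,1);   % n = number of families sampled
>> Xm = mean(X);
>> S = std(X);

(ii) With 95% level of confidence, if $\dfrac{|X_m - 75{,}000|}{S/\sqrt{n}} \le t_{0.975,\,n-1}$ (the 97.5th percentile of the t-distribution with $n-1$ DOF), then we accept $H_0$. Otherwise, we reject $H_0$.

$\chi^2$ Distribution

Q: What is the distribution of $s^2$ when observations are drawn randomly from a normal distribution?

The $\chi^2$ distribution provides a distribution of the sample variance, i.e. it is the distribution of the scaled quantity

$$\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{\sigma^2} = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$

or equivalently

$$s^2 \sim \frac{\sigma^2}{n-1}\,\chi^2_{n-1}$$

Read the last term as: $s^2$ is distributed as $\chi^2$ with $(n-1)$ dof and a scale factor of $\sigma^2/(n-1)$.

Chi-squared Distribution

(i) Let the $x_i$'s be n independent samples drawn from $N(0, 1)$. Then the sum $\sum_{i=1}^{n} x_i^2$ is distributed as $\chi^2$ with n degrees of freedom.
Mean (chi-sq) = n
Variance (chi-sq) = 2n

(ii) Let the $x_i$'s be n independent samples drawn from $N(\mu, \sigma^2)$. Then $\sum_{i=1}^{n}\left(\dfrac{x_i-\mu}{\sigma}\right)^2 \sim \chi^2_n$.

Useful for variance related statistical analysis.

Figure: $\chi^2$ density f for several degrees of freedom

Unlike the normal and t-distributions, the $\chi^2$-distribution is not symmetric. As n increases, the distribution tends to be more symmetric. For $n \ge 30$, the $\chi^2$-distribution tends to a normal distribution with mean n and variance 2n.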
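The statement that a sum of n squared unit normal samples has mean n and variance 2n can be checked by simulation. This is a minimal Python sketch under assumed settings (n = 5 terms, 20000 repetitions, a fixed seed); none of these values come from the slides.

```python
import random
from statistics import mean, variance

random.seed(0)   # fixed seed so the run is repeatable (assumed value)
n = 5            # degrees of freedom: number of squared N(0,1) terms (assumed)
reps = 20000     # number of simulated sums (assumed)

# Each entry is one realisation of x_1^2 + ... + x_n^2 with x_i ~ N(0, 1)
sums = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(reps)]

sim_mean = mean(sums)
sim_var = variance(sums)

print(f"simulated mean     = {sim_mean:.3f}  (theory: {n})")
print(f"simulated variance = {sim_var:.3f}  (theory: {2 * n})")
```

The simulated values only approximate n and 2n; the agreement improves as the number of repetitions grows.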
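Area under the density curve between 0 and 10.645 with 6 DOF = 0.9

For what value of x is the area under the density curve between 0 and x equal to 0.9? Let DOF = 6.

>> cdf('chi2',10.645,6)
ans = 0.9
>> chi2inv(0.9,6)
ans = 10.6446

Learn to obtain these values from the $\chi^2$ distribution tables. Also understand the upper and lower limits of $\chi^2$ that bound the central area of the distribution.

Confidence interval for the variance

Recall that $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$, where $s^2$ is the sample variance from a random sample of n observations from a normal distribution with unknown variance $\sigma^2$. Let $\chi^2_{p}(n-1)$ denote the value below which a fraction p of the $\chi^2$ distribution with $n-1$ DOF lies, as returned by chi2inv(p, n-1). Within a $100(1-\alpha)\%$ confidence interval,

$$\chi^2_{\alpha/2}(n-1) \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{1-\alpha/2}(n-1)$$

which gives the confidence interval for $\sigma^2$ as follows:

$$\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}(n-1)} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{\alpha/2}(n-1)}$$

Example 4

Assume that we have estimated the variance of a certain product characteristic to be 13 (i.e. $s^2 = 13$) from $n = 36$ observations. The samples are assumed to come from a normal population with unknown $\sigma^2$. The task is to compute a 95% confidence interval for $\sigma^2$.

$$P\left(A \le \frac{(n-1)s^2}{\sigma^2} \le B\right) = 0.95 \quad\Rightarrow\quad P\left(\frac{(n-1)s^2}{B} \le \sigma^2 \le \frac{(n-1)s^2}{A}\right) = 0.95$$

With $A = 20.57$ and $B = 53.20$ (35 DOF), the required confidence limits are $35 \times 13/53.20$ and $35 \times 13/20.57$, i.e. $8.55 \le \sigma^2 \le 22.12$.

Example 5

The data in Example 4 come from a normal distribution with variance $\sigma^2$. Test the hypothesis that

$H_0: \sigma^2 = 15$
$H_1: \sigma^2 \neq 15$

Two tailed test: look for 2.5% area on both tails of the $\chi^2$ distribution. The 2.5% area on the left ends at 20.57, and the 2.5% area on the right begins at 53.20. Therefore, A = 20.57 and B = 53.20.

>> chi2inv(0.025,35)
ans = 20.57
>> chi2inv(0.975,35)
ans = 53.20

Since the test statistic $(n-1)s^2/\sigma_0^2 = 35 \times 13/15 = 30.33$ is less than chi2inv(0.975,35) = 53.20 but larger than chi2inv(0.025,35) = 20.57, we accept the null hypothesis.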
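The arithmetic of Examples 4 and 5 can be laid out in a few lines. The sketch below is Python rather than Matlab and takes the inputs as assumptions: a sample variance of 13 from 36 observations, a hypothesised variance of 15, and the tabulated limits 20.57 and 53.20 for 35 DOF.

```python
# Inputs assumed from Examples 4 and 5
n = 36            # number of observations
s2 = 13.0         # sample variance s^2
A = 20.57         # chi-squared value with 2.5% area to its left, 35 DOF
B = 53.20         # chi-squared value with 2.5% area to its right, 35 DOF
sigma0_sq = 15.0  # hypothesised variance under H0

dof = n - 1

# 95% confidence interval for the variance: (n-1)s^2/B <= sigma^2 <= (n-1)s^2/A
lower = dof * s2 / B
upper = dof * s2 / A

# Two-tailed test of H0: sigma^2 = sigma0_sq using (n-1)s^2 / sigma0_sq
stat = dof * s2 / sigma0_sq
accept = A < stat < B

print(f"95% CI for variance: {lower:.2f} to {upper:.2f}")
print(f"test statistic = {stat:.2f}, accept H0: {accept}")
```

The test and the interval agree, as they must: the null hypothesis is accepted exactly when the hypothesised variance falls inside the confidence interval.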
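F Distribution

Suppose that a sample of $n_1$ observations is randomly drawn from a normal distribution having variance $\sigma_1^2$, and a second sample of $n_2$ observations is drawn from a second normal distribution having variance $\sigma_2^2$. Then what can we say about $s_1^2/s_2^2$?

$$\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F_{\nu_1,\nu_2}$$

where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ are the degrees of freedom.

F Distribution

Let $\chi^2(m)$ and $\chi^2(n)$ be independent variables distributed as chi-squared with m DOF and n DOF. The ratio

$$\frac{\chi^2(n)/n}{\chi^2(m)/m}$$

is distributed as an F-distribution over the domain $[0, \infty)$ with [n, m] DOF.

>> finv(0.95,4,12)
ans = 3.259
>> finv(0.95,10,100)
ans = 1.9267
>> cdf('f',3.26,4,12)
ans = 0.95
>> cdf('f',1.93,10,100)
ans = 0.95

Useful for comparing variances of two sets of RVs.

If n samples are drawn from $N(\mu_1, \sigma_1^2)$ and m samples from $N(\mu_2, \sigma_2^2)$, then

$$\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F(n-1,\,m-1)$$

where $s_1^2$ and $s_2^2$ are the two sample variances. Let $F_p(n-1, m-1)$ denote the value below which a fraction p of this distribution lies, as returned by finv(p, n-1, m-1). As a result,

$$P\left(F_{\alpha/2}(n-1,m-1) \le \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \le F_{1-\alpha/2}(n-1,m-1)\right) = 1-\alpha$$

The $100(1-\alpha)\%$ confidence interval for $\sigma_1^2/\sigma_2^2$ is

$$\frac{s_1^2/s_2^2}{F_{1-\alpha/2}(n-1,m-1)} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2/s_2^2}{F_{\alpha/2}(n-1,m-1)}$$

Example 6

A chemical engineer is investigating the inherent variability of two types of equipment. He suspects that the old equipment, Type 1, has a different variance from the new one. Hence, he wishes to test the hypothesis:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$

Two random samples of n = 12 and m = 10 observations are taken, and the sample variances are $s_1^2 = 14.5$ and $s_2^2 = 10.8$.

Under $H_0$, the test statistic is

$$F_0 = \frac{s_1^2}{s_2^2} = \frac{14.5}{10.8} = 1.34$$

Since $F_0$ is less than the upper critical value and larger than the lower critical value of the F-distribution with [11, 9] DOF, the null hypothesis can be accepted.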
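The F-test of Example 6 can be reproduced without tables by integrating the F density numerically. This is a Python sketch, not the Matlab `cdf('f', ...)` call. The helper names `f_pdf` and `f_cdf` are hypothetical, the inputs (sample variances 14.5 and 10.8 from 12 and 10 observations) are assumed from the example, and the simple integration scheme assumes a numerator DOF greater than 2.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) DOF; assumes d1 > 2."""
    if x <= 0:
        return 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = (0.5 * d1 * math.log(d1 / d2)
               + (0.5 * d1 - 1) * math.log(x)
               - 0.5 * (d1 + d2) * math.log(1 + d1 * x / d2)
               - log_beta)
    return math.exp(log_pdf)

def f_cdf(x, d1, d2, steps=4000):
    """Area under the F density from 0 to x (composite Simpson's rule, even steps)."""
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

# Inputs assumed from Example 6
n, m = 12, 10
s1_sq, s2_sq = 14.5, 10.8

F0 = s1_sq / s2_sq                      # test statistic under H0
area_below = f_cdf(F0, n - 1, m - 1)    # area to the left of F0
p_value = 2 * min(area_below, 1 - area_below)   # two-tailed p-value

print(f"F0 = {F0:.2f}, two-tailed p-value = {p_value:.2f}")
```

A p-value well above 0.05 corresponds to $F_0$ lying between the two critical values, so the null hypothesis of equal variances is not rejected at the 5% level.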
Central Limit Theorem

Overall error is usually an aggregate of a number of component errors. The overall error

$$\varepsilon = a_1\varepsilon_1 + a_2\varepsilon_2 + \dots + a_n\varepsilon_n$$

follows a normal distribution as the number of components becomes large, irrespective of the individual distributions of the components.

An important provision here is that several of the sources of error must make important contributions to the overall error and, in particular, that no single source of error dominates all the rest.

Central Limit Tendency for Averages

Take random samples. Find the average. Repeat this procedure many times and make a distribution of these averages.

Figure: Parent Distribution and Sampled Distribution

The sampled distribution is more nearly normal than the parent distribution. The variance of the randomly sampled distribution is smaller than that of the parent distribution.

Central Limit Theorem (cont.)

If the sampling is random (so that the errors are independent and uncorrelated), then we have the simple rule that $\bar{y}$ varies about the population mean $\mu$ with variance $\sigma^2/n$. Thus

$$E(\bar{y}) = \mu, \qquad V(\bar{y}) = \frac{\sigma^2}{n}$$

where n is the number of random samples collected at each time.

Central Limit Tendency for Averages: Dice Example
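The rule that the average keeps the population mean while its variance shrinks by a factor of n can be verified exactly for fair dice. The Python sketch below (helper names are hypothetical) builds the exact distribution of the total of several dice by repeated convolution and uses exact fractions, so no sampling error is involved.

```python
from fractions import Fraction

def sum_distribution(n_dice):
    """Exact probabilities of the total shown by n_dice fair six-sided dice."""
    dist = {0: Fraction(1)}
    for _ in range(n_dice):
        new = {}
        for total, p in dist.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, Fraction(0)) + p / 6
        dist = new
    return dist

def mean_and_variance_of_average(n_dice):
    """Exact mean and variance of the average of n_dice fair dice."""
    dist = sum_distribution(n_dice)
    m = sum(Fraction(t, n_dice) * p for t, p in dist.items())
    v = sum((Fraction(t, n_dice) - m) ** 2 * p for t, p in dist.items())
    return m, v

# One die has mean 7/2 and variance 35/12
for n_dice in (1, 2, 3, 10):
    m, v = mean_and_variance_of_average(n_dice)
    print(f"{n_dice:2d} dice: mean = {m}, variance = {v}")
```

For every number of dice the mean stays at 7/2, while the variance equals 35/12 divided by the number of dice averaged.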
Example: Central Limit Tendency for Averages

One die is thrown. Plot histogram.
>> nThrows = 1000;   % number of throws (assumed value)
>> b=ceil(rand(nThrows,1)*6);
>> hist(b,1:1:6)

Two dice are thrown. Average is computed. Plot histogram.
>> b=ceil(rand(nThrows,2)*6);
>> c=mean(b');
>> n=unique(c);
>> hist(c,n)

Three dice are thrown. Average is computed. Plot histogram.
>> b=ceil(rand(nThrows,3)*6);
>> c=mean(b');
>> n=unique(c);
>> hist(c,n)

Dice Average
>> nDice = 10;   % number of dice averaged (assumed value)
>> b=ceil(rand(nThrows,nDice)*6);
>> c=mean(b');
>> n=unique(c);
>> hist(c,n)
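>> nThrows = 1000; nDice = 10;   % assumed values
>> b=ceil(rand(nThrows,nDice)*6);
>> c=mean(b');
>> n=unique(c);
>> N=histc(c,n);
>> N=N/nThrows;   % relative frequency
>> bar(n,N);

Figure: Relative frequency of the dice average

Even though the individual observations do not come from a normal distribution, the distribution of the average value tends to be normally distributed (even if averages of five individual observations are considered).

The Central Limit Theorem

The central limit theorem explains that no matter what distribution a RV follows, the sum or mean of this RV tends to be close to the normal distribution.

Let $X_1, X_2, \dots, X_n$ be independently, identically distributed (i.i.d.) RVs having mean $\mu$ and finite variance $\sigma^2$. Then, for large n,

$$\sum_{i=1}^{n} X_i \;\text{is approximately}\; N(n\mu,\, n\sigma^2)$$

Or

$$\bar{X} \;\text{is approximately}\; N\!\left(\mu,\, \frac{\sigma^2}{n}\right)$$

where $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$.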
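The claim that the result holds no matter what distribution the RV follows can be illustrated with a strongly skewed parent distribution. The Python sketch below is a simulation under assumed settings (exponential parent with mean 1 and variance 1, averages of 30 samples, 20000 repetitions, fixed seed). It standardizes each average and counts how often it falls within ±1.96, which would happen about 95% of the time for an exactly normal variable.

```python
import math
import random

random.seed(1)          # fixed seed so the run is repeatable (assumed value)
n = 30                  # samples per average (assumed)
reps = 20000            # number of averages simulated (assumed)
mu, sigma = 1.0, 1.0    # mean and standard deviation of Exponential(rate = 1)

inside = 0
for _ in range(reps):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    z = (xbar - mu) / (sigma / math.sqrt(n))   # standardized average
    if abs(z) < 1.96:
        inside += 1

fraction_inside = inside / reps
print(f"fraction of standardized averages within +/-1.96: {fraction_inside:.3f}")
```

The fraction is close to, but not exactly, 0.95 because the normal approximation is only approximate at a finite n, and the skewness of the parent distribution has not fully averaged out.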