Chapter 5 Elementary Statistics, Empirical Probability Distributions, and More on Simulation
Contents
Connecting Probability with Observations of Data
Sample Mean and Sample Variance
Regression Techniques
Empirical Distribution Functions
More on Monte Carlo Simulation
Statistical Process Control
Convergence of the Sample Mean to the Mean
Connecting Probability with Observations of Data
Scope of the chapter
In the chapters up to chapter 4: axioms of probability and various probability relationships, i.e. the world of mathematical models.
We now deal with problems of the real world: sample mean, variance, standard deviation, empirical distribution of random data, etc., i.e. the realm of statistics.
Connecting Probability with Observations of Data (Cont.)
Remark:
1. We will also consider estimation theory, and decision making based on probabilistic concepts, in later chapters.
2. Computer simulation of random phenomena will be revisited as well.
Sample Mean and Sample Variance
Suppose we take a sample of manufactured items $\{x_1, x_2, \ldots, x_n\}$, e.g. resistors.
Population: totality of items $\{x_1, x_2, \ldots, x_n\}$
Sample: individual $x_i$'s
Sample Mean and Sample Variance (Cont.)
Definition 5.1 Sample mean
The sample mean of a population is simply the arithmetic average of all the sample values, i.e.
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Remarks
Notice that the sample mean is a random variable, depending on the sample values selected from the population.
We hope that it is close to the actual mean: to be discussed later...
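Sample Mean and Sample Variance (Cont.)
Definition 5.2 Sample variance
The sample variance of a population is defined in a similar manner to the variance of a random variable discussed in an earlier chapter, i.e.
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
where $\bar{x}$ is the sample mean defined above.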
Sample Mean and Sample Variance (Cont.)
Remarks
Notice that $s^2$ is a random variable, as the sample mean is a random variable.
It represents the measure of the spread of the population around the sample mean.
The square root of the sample variance is called the sample standard deviation:
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
Sample Mean and Sample Variance (Cont.)
The reason for division by $n-1$ rather than $n$ will become apparent in a subsequent chapter when we consider estimation.
Note, for the moment, that as $n$ becomes large, the difference is small.
Sample Mean and Sample Variance (Cont.)
Example 5.1 Find the sample mean of the following resistance values in Ω: 900, 1013, 939, 1062, 1017, 996, 970, 1079, 1065, and 1049.
Solution: Applying the definition of the sample mean above, we easily get: $\bar{x} = 1009\ \Omega$
Sample Mean and Sample Variance (Cont.)
Example 5.2 Obtain the sample standard deviation of the resistance samples given in the above example.
Solution: Using the definition of the sample standard deviation, we obtain $s = 58.7\ \Omega$
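The two definitions can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names are illustrative, and the ten resistance values are those of Example 5-1 as read here.

```python
import math

# Resistance samples (ohms) of Example 5-1, as read from the slides.
resistances = [900, 1013, 939, 1062, 1017, 996, 970, 1079, 1065, 1049]

def sample_mean(values):
    """Arithmetic average of the samples."""
    return sum(values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    mean = sample_mean(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

x_bar = sample_mean(resistances)
s = math.sqrt(sample_variance(resistances))
print(f"sample mean = {x_bar:.0f} ohms, sample std = {s:.1f} ohms")
# prints: sample mean = 1009 ohms, sample std = 58.7 ohms
```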
Sample Mean and Sample Variance (Cont.)
Note
As the number of samples gets large, the computations involved in the sample variance become tedious, due to the subtraction procedure...
First, you calculate the sample mean, and then subtract it from each sample before squaring and summing again.
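Sample Mean and Sample Variance (Cont.)
Computationally efficient way of obtaining $s^2$
Expanding the RHS of the sample variance, we get
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right) = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2\right)$$
Applying the definition of the sample mean above:
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$$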
Sample Mean and Sample Variance (Cont.)
Remark: We only need the sum of the samples and the sum of the squares of the samples, without the tedious step of subtracting the sample mean from each sample...
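Sample Mean and Sample Variance (Cont.)
Example 5.3 Redo Example 5-2 following the procedure mentioned above.
Solution: We get $\sum x_i^2 = 10{,}211{,}806$ and $\sum x_i = 10{,}090$, so that
$$s^2 = \frac{1}{9}\left(10{,}211{,}806 - \frac{10{,}090^2}{10}\right) = 3444, \qquad s = 58.7$$
and thus obtain the same result obtained previously.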
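The shortcut needs only two running sums, which a single pass over the data can accumulate. A minimal Python sketch follows (the function name is illustrative, and the data are the resistance values of Example 5-1 as read here). One caveat worth keeping in mind: in floating-point arithmetic this form can lose precision when the mean is very large compared with the spread, so the two-step definition may still be preferable for such data.

```python
def variance_from_sums(values):
    """Sample variance computed from the sum and the sum of squares only."""
    n = len(values)
    total = sum(values)                     # sum of x_i
    total_sq = sum(v * v for v in values)   # sum of x_i^2
    return (total_sq - total ** 2 / n) / (n - 1)

resistances = [900, 1013, 939, 1062, 1017, 996, 970, 1079, 1065, 1049]
s_squared = variance_from_sums(resistances)
print(sum(resistances), sum(v * v for v in resistances), s_squared)
# prints: 10090 10211806 3444.0
```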
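Sample Mean and Sample Variance (Cont.)
Example 5.4 The residuals are defined as: $d_i = x_i - \bar{x}$. Show that the sum of all residuals is equal to zero.
Solution: From the definition of the sample mean, we have:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
which means
$$\sum_{i=1}^{n} x_i - n\bar{x} = 0 \quad\text{or}\quad \sum_{i=1}^{n}(x_i - \bar{x}) = 0 \quad\text{or}\quad \sum_{i=1}^{n} d_i = 0$$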
Regression Techniques
Suppose we are given measurements of a pair of data with some relationship b/w the data (footnote 3); we match the data relation to a simple straight line.
Objective: Given the measurements of $n$ data pairs, $\{(x_i, y_i),\ i = 1, 2, \ldots, n\}$, find a straight line which is the best fit to the data:
$$y = \alpha x + \beta$$
Footnote 3: Precise determination of the relationship is not possible due to measurement errors.
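Regression Techniques (Cont.)
Criterion: Choose the constants $\alpha$ and $\beta$: the straight-line fit to the data is taken in the sense of minimum squared error:
$$(\alpha_o, \beta_o) = \arg\min_{(\alpha, \beta)} \varepsilon$$
where the subscript $o$ stands for optimum, and $\varepsilon$ is the average squared error defined as follows:
$$\varepsilon = \frac{1}{n}\sum_{i=1}^{n} (y_i - \alpha x_i - \beta)^2$$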
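Regression Techniques (Cont.)
Procedure: (regression technique) To find $(\alpha_o, \beta_o)$ making $\varepsilon$ as small as possible, we differentiate $\varepsilon$ with respect to $\alpha$ and $\beta$, and set the derivatives to zero:
$$\frac{\partial\varepsilon}{\partial\alpha} = -\frac{2}{n}\sum_{i=1}^{n} x_i\,(y_i - \alpha_o x_i - \beta_o) = 0$$
$$\frac{\partial\varepsilon}{\partial\beta} = -\frac{2}{n}\sum_{i=1}^{n} (y_i - \alpha_o x_i - \beta_o) = 0$$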
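Regression Techniques (Cont.)
We can re-arrange the above equations as follows:
$$\alpha_o \sum_{i=1}^{n} x_i^2 + \beta_o \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i$$
$$\alpha_o \sum_{i=1}^{n} x_i + n\beta_o = \sum_{i=1}^{n} y_i$$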
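Regression Techniques (Cont.)
Solving for $(\alpha_o, \beta_o)$ yields:
$$\alpha_o = \frac{\frac{1}{n-1}\left(\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}\right)}{s_x^2}, \qquad \beta_o = \bar{y} - \alpha_o \bar{x}$$
where $s_x^2$ represents the sample variance of the $x$ values.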
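Regression Techniques (Cont.)
For further simplification, define the sample covariance $c_{xy}$ and the sample correlation coefficient $r_{xy}$ respectively as:
$$c_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \quad\text{and}\quad r_{xy} = \frac{c_{xy}}{s_x s_y}$$
where $s_x$ and $s_y$ represent the sample standard deviations of the data sets $\{x_i\}$ and $\{y_i\}$, respectively.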
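Regression Techniques (Cont.)
By expanding the product in $c_{xy}$, we have:
$$c_{xy} = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}\right)$$
And the constant $\alpha_o$ can simply be expressed as:
$$\alpha_o = \frac{c_{xy}}{s_x^2}$$
The regression line which is the best fit to the data becomes:
$$y = \alpha_o x + \beta_o = \frac{c_{xy}}{s_x^2}\,x + \bar{y} - \frac{c_{xy}}{s_x^2}\,\bar{x}$$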
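Regression Techniques (Cont.)
The regression line can be arranged to:
$$y - \bar{y} = \frac{c_{xy}}{s_x^2}\,(x - \bar{x})$$
Or, the regression line can be expressed as follows (footnote 4):
$$\frac{y - \bar{y}}{s_y} = r_{xy}\,\frac{x - \bar{x}}{s_x}$$
Footnote 4: Note that this is a somewhat easier expression to remember...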
Regression Techniques (Cont.)
Remarks:
1. If $r_{xy} = 0$, the slope of the regression line vanishes (the line reduces to $y = \bar{y}$), and the data sets are called uncorrelated.
2. If $r_{xy} = \pm 1$, the data are linearly related as follows: $y = mx + b$
3. The goodness of the fit is determined by the squared error $\varepsilon$: even though it has been minimized, the regression line might still not fit the data well, especially when the correlation b/w the data is small.
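The regression formulas translate directly into a few lines of code. The following Python sketch assumes the straight-line model $y = \alpha x + \beta$ with $\alpha_o = c_{xy}/s_x^2$ and $\beta_o = \bar{y} - \alpha_o\bar{x}$; the function name and the small data set are hypothetical, chosen to be exactly linear so that the result is easy to verify by hand.

```python
import math

def regression_line(xs, ys):
    """Return (alpha_o, beta_o, r_xy) for the least-squares line y = alpha*x + beta."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)     # sample variance of x
    s_y2 = sum((y - y_bar) ** 2 for y in ys) / (n - 1)     # sample variance of y
    c_xy = sum((x - x_bar) * (y - y_bar)
               for x, y in zip(xs, ys)) / (n - 1)          # sample covariance
    alpha = c_xy / s_x2
    beta = y_bar - alpha * x_bar
    r_xy = c_xy / math.sqrt(s_x2 * s_y2)
    return alpha, beta, r_xy

# Hypothetical, exactly linear data: y = 2x + 1, so r_xy should be 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
alpha, beta, r_xy = regression_line(xs, ys)
print(alpha, beta, r_xy)
```

With noisy data the same function returns $|r_{xy}| < 1$, which is the situation described in Remark 3.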
Regression Techniques (Cont.)
Example 5.5 Find the correlation coefficient and the regression line for the following data set:
x: 0.68 0.7.7.0.63 3.06 3.5 4.00 4.03 4.50
y: .45 9.93 6.64 0.4 8.93 3.34.56 6.7 9.6 5.03
Solution: From the data set, we can get: $r_{xy}$ = 0.7, $\bar{x}$ = .6, $\bar{y}$ = .44, and $s_x$ = .39, $s_y$ = 3.88. Thus, the regression line becomes: (y − .44)/3.88 = 0.7 (x − .6)/.39, which is shown in Figure 5-1.
Regression Techniques (Cont.)
Figure 5.1: Data pairs and regression line for the random data set of Example 5-5
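Empirical Distribution Functions
Consider a r.v. $X$ with distribution function $F_X(x)$ which is unknown (footnote 5). We define the empirical cumulative distribution function (footnote 6) as follows:
Definition 5.3 Empirical Distribution Function: We have a number of independent samples of a r.v. $X$, denoted $\{x_i,\ i = 1, 2, \ldots, n\}$. Then the empirical distribution function of $X$ is defined as:
$$\tilde{F}_X(x \mid x_1, x_2, \ldots, x_n) = \frac{\text{number of samples } x_1, x_2, \ldots, x_n \text{ no greater than } x}{n}$$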
Empirical Distribution Functions (Cont.)
Footnote 5: In chapter 3, when we theoretically discussed r.v.'s, we assumed a certain type of distribution function such as Gaussian, exponential, etc., even though we are not absolutely sure of it...
Footnote 6: Or simply empirical distribution.
Empirical Distribution Functions (Cont.)
Example 5.6 Obtain the empirical distribution of the resistance samples given in Example 5-1.
Solution: Arranging the samples in ascending order, which are 900, 939, 970, 996, 1013, 1017, 1049, 1062, 1065, and 1079, the empirical distribution can easily be obtained via the above definition. The result is plotted in Figure 5-2.
Empirical Distribution Functions (Cont.)
Figure 5.2: The empirical distribution function for the resistance samples in Example 5-1.
Empirical Distribution Functions (Cont.)
Note:
1. The empirical distribution function has the same properties as a cdf. (check!)
2. The empirical probability mass function, which corresponds to the pdf, can easily be obtained from the distribution function. (how?)
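As a sketch of the definition, the empirical distribution function is just a count divided by $n$, and the empirical probability mass function is the size of its jumps (each sample contributes $1/n$). The Python below is illustrative only; the function names are not from the slides, and the data are the resistance values of Example 5-1 as read here.

```python
resistances = [900, 1013, 939, 1062, 1017, 996, 970, 1079, 1065, 1049]

def empirical_cdf(samples, x):
    """Fraction of the samples that are no greater than x."""
    return sum(1 for s in samples if s <= x) / len(samples)

def empirical_pmf(samples):
    """Probability mass at each distinct sample value (the jumps of the ECDF)."""
    n = len(samples)
    masses = {}
    for s in sorted(samples):
        masses[s] = masses.get(s, 0.0) + 1.0 / n
    return masses

for x in (899, 1000, 1079):
    print(x, empirical_cdf(resistances, x))
# prints:
# 899 0.0
# 1000 0.4
# 1079 1.0
```

The printed values illustrate the cdf properties asked for in Note 1: the function starts at 0, ends at 1, and never decreases.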
Empirical Distribution Functions (Cont.)
Simplified way of computing the empirical distribution: When the # of data is large, we follow these steps:
(1) We divide the data range into a convenient # of intervals of equal length.
(2) We plot a histogram by counting the number of data within each cell (or interval).
(3) The # of data within each cell is cumulatively summed to get the empirical distribution.
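The three steps can be sketched as follows in Python. The function names, the bin range, and the eight data values are hypothetical, and the sketch assumes every data value lies within the chosen range.

```python
from itertools import accumulate

def histogram_counts(data, num_bins, low, high):
    """Steps (1)-(2): split [low, high] into equal cells and count the data in each."""
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in data:
        # Assumption: low <= v <= high; a value equal to high goes in the last cell.
        idx = min(int((v - low) / width), num_bins - 1)
        counts[idx] += 1
    return counts

def binned_ecdf(counts):
    """Step (3): cumulative sum of the counts, normalized by the total # of data."""
    total = sum(counts)
    return [c / total for c in accumulate(counts)]

data = [0.1, 0.4, 0.45, 0.9, 1.3, 1.7, 0.2, 0.05]   # hypothetical data
counts = histogram_counts(data, num_bins=4, low=0.0, high=2.0)
print(counts)                # prints: [5, 1, 1, 1]
print(binned_ecdf(counts))   # prints: [0.625, 0.75, 0.875, 1.0]
```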
Empirical Distribution Functions (Cont.)
Example 5.7 The intervals b/w telephone calls arriving at a certain switching office are recorded in minutes, which are: 0.06, 0.977, 0.05, 0.83, 0.597, 0.46,.37, 0.07, 0.9, 0.938, 0.065, 0.098, 0.7, 0.87, 0.863, 0.0, 0.37, 0.93, 0.343, 0.56, 0.45, 0.637, 0.8, 0.9, 0.4, 0.63, 0.37,.048, 0.5, 0.09,.675, 0.33, 0.06, 0.46,.8, 0.06, 0.04, 0.99, 0.53, 0.376, 0.49, 0.083, 0.575, 0.393, 0.65, 0.009, 0.606, 0.5, 0.83 and 0.85. Find the histogram and the empirical cdf of these time intervals.
Solution: Following the steps described above, the results are plotted in Figure 5-3. Note that the histogram appears to be roughly exponentially decreasing.
Empirical Distribution Functions (Cont.)
Figure 5.3: Empirical distributions for Example 5-7: (a) histogram, (b) empirical cdf.
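Empirical Distribution Functions (Cont.)
Remarks:
1. Recall the definition of the pdf in chapter 3:
$$f_X(x)\,\Delta x \approx P(x < X \le x + \Delta x) \approx \frac{\text{number of data values in } (x,\ x + \Delta x)}{\text{total number of data values}}$$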
Empirical Distribution Functions (Cont.)
Based on this, we can figure out that the histogram obtained in the above example does not match the pdf for two reasons:
(a) The histogram is not normalized by the total # of data.
(b) As is obvious in the above equation, to get an approximation to the pdf, we must divide through by $\Delta x$.
With these two corrections, the replotted histogram is in Figure 5-4, along with the plot of the following function:
$$\tilde{f}_X(x) = 2e^{-2x}\,u(x)$$
Empirical Distribution Functions (Cont.)
(cf) The agreement is surprisingly good (footnote 7)...
Footnote 7: The data values in Example 5-7 have been generated via a random number generator producing exponentially distributed random variables.
Empirical Distribution Functions (Cont.)
Figure 5.4: Normalized histogram to approximate the theoretical pdf.
Empirical Distribution Functions (Cont.)
2. Notice that there is a trade-off b/w the # of bins and the appearance of the histogram:
(a) Too many bins, due to too few samples per bin, cause statistical irregularity, and thus make the histogram ragged-looking.
(b) Too few bins cause a low resolution on the abscissa of the histogram.
(cf) We usually rely on a trial-and-error process to select a suitable # of bins (footnote 8).
Footnote 8: Especially in the case of relatively few data samples, as in the case of these examples...
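The two corrections to the histogram (divide by the total # of data and by the bin width $\Delta x$) can be sketched in Python as below. This is an illustration under stated assumptions, not the program behind the figures: the data are freshly generated exponential samples with an assumed rate of 2 (mean 0.5), and the seed, sample size, bin count, and function name are arbitrary choices.

```python
import math
import random

def density_histogram(data, num_bins, low, high):
    """Counts per bin divided by (total # of data * bin width)."""
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in data:
        if low <= v < high:                       # values outside the range are skipped
            idx = min(int((v - low) / width), num_bins - 1)
            counts[idx] += 1
    n = len(data)
    return [c / (n * width) for c in counts], width

rng = random.Random(0)                            # fixed seed, arbitrary
intervals = [rng.expovariate(2.0) for _ in range(5000)]   # assumed rate 2
density, width = density_histogram(intervals, num_bins=8, low=0.0, high=2.0)

for i, d in enumerate(density):
    center = (i + 0.5) * width
    pdf = 2.0 * math.exp(-2.0 * center)
    print(f"x = {center:.3f}: histogram {d:.3f}, pdf {pdf:.3f}")
```

Re-running with many more bins (say 200) or far fewer samples makes the ragged appearance described in (a) visible, while 2 or 3 bins shows the loss of resolution described in (b).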
More on Monte Carlo Simulation
Monte Carlo Simulation: Complex computer simulations of systems undergoing random perturbations, using a random number generator...
Example 5.8 Consider an RC circuit with random resistance and capacitance (footnote 9), of which the 3-dB cutoff frequency of an RC filter is defined as:
$$f_3 = \frac{1}{2\pi RC}$$
Analyze the possible values of $f_3$.
More on Monte Carlo Simulation (Cont.)
Solution: Suppose the nominal value of $R$ is 1000 Ω w/ ±10% tolerance, i.e. $R \sim U[900, 1100]$ Ω, whereas the capacitance has a nominal value of 1 μF w/ ±5% tolerance, i.e. $C \sim U[0.95, 1.05]$ μF.
Footnote 9: This is due to the uncertain manufacturing process...
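More on Monte Carlo Simulation (Cont.)
(1) Conventional method: Without any computer simulations, all we can deduce from the given data are the nominal value, the maximum value, and the minimum value of $f_3$, i.e.:
(i) nominal $f_3$:
$$f_{3,\text{nom}} = \frac{1}{2\pi \times 10^{3} \times 10^{-6}} = \frac{500}{\pi} = 159\ \text{Hz}$$
(ii) maximum $f_3$:
$$f_{3,\max} = \frac{1}{2\pi \times 900 \times 0.95 \times 10^{-6}} = 186\ \text{Hz}$$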
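More on Monte Carlo Simulation (Cont.)
(iii) minimum $f_3$:
$$f_{3,\min} = \frac{1}{2\pi \times 1100 \times 1.05 \times 10^{-6}} = 137.8\ \text{Hz} \approx 138\ \text{Hz}$$
Therefore, all we can say about $f_3$ is that it would have values somewhere between about 138 Hz and 186 Hz, with its nominal value of 159 Hz.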
More on Monte Carlo Simulation (Cont.)
(2) Monte Carlo simulation: We generate, say, 5000 values of $R$ and $C$ which are uniformly distributed within their allowed range about the nominal values, and compute $f_3$ for each pair of $R$ and $C$ (usually using high-level programming languages such as C++ or FORTRAN, or utilizing a mathematics package such as MATLAB), and then plot the histogram of the cutoff frequency... (Figure 5-5)
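The simulation step can be sketched in a few lines. The Python below is an illustrative stand-in, not the MATLAB program referred to in the slides; it assumes $R \sim U[900, 1100]$ Ω and $C \sim U[0.95, 1.05]$ μF, and the function names and the seed are arbitrary.

```python
import math
import random

def simulate_cutoff(num_trials=5000, seed=1):
    """Draw (R, C) pairs uniformly within tolerance and return the f_3 values in Hz."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(num_trials):
        r = rng.uniform(900.0, 1100.0)         # ohms, assumed +/-10 % about 1000
        c = rng.uniform(0.95e-6, 1.05e-6)      # farads, assumed +/-5 % about 1 uF
        freqs.append(1.0 / (2.0 * math.pi * r * c))
    return freqs

def fraction_between(values, low, high):
    """Relative frequency of values in (low, high]."""
    return sum(1 for v in values if low < v <= high) / len(values)

f3 = simulate_cutoff()
print(f"min {min(f3):.1f} Hz, max {max(f3):.1f} Hz")
print("P(140 < f3 <= 150) ~", fraction_between(f3, 140.0, 150.0))
print("P(150 < f3 <= 160) ~", fraction_between(f3, 150.0, 160.0))
```

The list `f3` can be passed to any histogram routine to obtain a plot of the kind described for the cutoff frequency; the exact relative frequencies depend on the random seed.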
More on Monte Carlo Simulation (Cont.)
Figure 5.5: Histograms for (a) resistance, (b) capacitance, and (c) cutoff frequency.
More on Monte Carlo Simulation (Cont.)
Table 5.1: MATLAB program for generating the histogram
More on Monte Carlo Simulation (Cont.)
With this method, we can generate a histogram for $f_3$ to determine the influence of component variation on the cutoff frequency.
This gives more information than the above conventional method (footnote 10), e.g. an estimate of the most likely value of $f_3$, at which the histogram has its peak.
Footnote 10: We call this conventional method an extreme value analysis.
More on Monte Carlo Simulation (Cont.)
Additional use of data from MC simulations:
(a) If the RC circuit is part of a larger system, we can find the maximum or minimum values of the cutoff frequency, and carry out a worst-case design (footnote 11).
(b) Find the approximate probability that $f_3$ lies in a specified region (footnote 12), for example, from Figure 5-5(c):
$$P(140 < f_3 \le 150) \approx \frac{900}{5000} = 0.18 \quad\text{whereas}\quad P(150 < f_3 \le 160) \approx \frac{1600}{5000} = 0.32$$
Footnote 11: This is the case with more complex functional relationships or probabilistic models for the component values... Note that, in a simple model such as the above example, we can easily figure out the extreme values w/o MC simulation.
Footnote 12: Note that, for this purpose, the empirical cdf would be more accurate.
Statistical Process Control
In a manufacturing process, there are several steps that take place.
Monitor and determine when things have gone wrong...
Close down the process and look for the offending steps: Control Chart
Illustration: Suppose we are manufacturing transistors, the gain of which is of our concern.
Measure the gain, and divide the measurements into lots.
Statistical Process Control (Cont.)
Compute the sample mean of each lot: $m_i$, $i = 1, 2, \ldots, N$
Compute the sample mean of the sample means (footnote 13): $\bar{m}$
Set the upper and lower limits as ±3 sample std of the $m_i$'s from $\bar{m}$
Check if any $m_i$ falls outside the control limits...
Decide whether the manufacturing process has a problem or not.
Footnote 13: i.e. the sample mean of the entire samples!
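The control-chart procedure above reduces to computing a center line and two limits, then testing each lot mean against them. A minimal Python sketch follows; the function names and the lot means are hypothetical and are not the data of the example in these slides.

```python
import math

def control_limits(lot_means):
    """Return (LCL, center, UCL): the mean of the lot means -/+ 3 sample std."""
    n = len(lot_means)
    center = sum(lot_means) / n
    spread = math.sqrt(sum((m - center) ** 2 for m in lot_means) / (n - 1))
    return center - 3.0 * spread, center, center + 3.0 * spread

def lots_out_of_control(lot_means, lcl, ucl):
    """Indices of the lots whose sample mean falls outside the control limits."""
    return [i for i, m in enumerate(lot_means) if m < lcl or m > ucl]

# Hypothetical sample means of eight lots.
lot_means = [101.2, 97.5, 104.8, 99.1, 95.6, 102.3, 98.4, 100.9]
lcl, center, ucl = control_limits(lot_means)
print(f"LCL = {lcl:.2f}, center = {center:.2f}, UCL = {ucl:.2f}")

# A later lot with a (hypothetical) mean of 130.0 is checked against the same limits.
print(lots_out_of_control(lot_means + [130.0], lcl, ucl))   # prints: [8]
```

In this sketch the limits are fixed from an initial set of lots and later lots are checked against them; whether to recompute the limits as new lots arrive is a design choice the slides do not address.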
Statistical Process Control (Cont.)
Example 5.9 In a transistor manufacturing process, for 5 lots of size 5, the sample means of the current gains are measured and plotted in Figure 5-6 by the x's. The sample mean of the entire 5 × 5 = 25 transistor gains is 98.8, and the sample standard deviation of the lots' sample means is 6.95. We, then, set the upper and lower control limits as:
UCL = 98.8 + 3 × 6.95 ≈ 119.65
LCL = 98.8 − 3 × 6.95 ≈ 77.96
Statistical Process Control (Cont.)
Solution: The corresponding control chart, i.e. the sample means of the lots along w/ the UCL and LCL, is in Figure 5-6. Note that for this particular process, we are well within the control limits (footnote 14). The MATLAB program producing the control chart is in Table 5-2.
Footnote 14: If the sample mean of any lot exceeds the UCL or goes below the LCL, the process is said to be out of control, and we seek the cause of the excursion, and correct it!
Statistical Process Control (Cont.)
Figure 5.6: Control chart for transistor gains: 5 lots of 5 transistors each
Statistical Process Control (Cont.)
Table 5.2: MATLAB program for generating a control chart.
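Convergence of the Sample Mean to the Mean
Recall: The sample mean of a population, given below, is an estimate of the true mean $\mu_X$:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Question: How good is this estimate, given that it is itself a r.v.?
Answer: We get a bound by applying Chebyshev's inequality (footnote 15)...
Footnote 15: We replace $X$ with $\bar{x}$.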
Convergence of the Sample Mean to the Mean (Cont.)
To apply Chebyshev's inequality, we need the (1) mean and (2) variance of the sample mean $\bar{x}$:
(1) mean (footnote 16): $E(\bar{x}) = \mu_X$
Footnote 16: This is shown in problem 4-4a.
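Convergence of the Sample Mean to the Mean (Cont.)
(2) variance: treating each sample as a r.v. $X_i$ and using the independence of the samples, so that the cross terms with $i \ne j$ vanish,
$$\mathrm{Var}(\bar{x}) = E\{[\bar{x} - E(\bar{x})]^2\} = E\left[\left(\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_X)\right)^2\right] = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} E[(X_i - \mu_X)(X_j - \mu_X)] = \frac{1}{n^2}\sum_{i=1}^{n} E[(X_i - \mu_X)^2] = \frac{\sigma_X^2}{n}$$
where $\sigma_X^2 = \mathrm{Var}(X_i)$ is the variance of each sample from the population.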
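Convergence of the Sample Mean to the Mean (Cont.)
Thus, Chebyshev's inequality (footnote 17) becomes:
$$P\left(|\bar{x} - \mu_X| \ge k\,\frac{\sigma_X}{\sqrt{n}}\right) \le \frac{1}{k^2}$$
Footnote 17: Unfortunately, it is usually true that we also do not know the variance $\sigma_X^2$ of the samples, so we cannot apply this Chebyshev's inequality in real estimation problems. We will discuss this subject again in Chapter 6.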
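Both results, $\mathrm{Var}(\bar{x}) = \sigma_X^2/n$ and the Chebyshev bound, can be checked by simulation. The Python sketch below is only an illustration under assumed conditions: the samples are drawn from $U(0, 1)$ (so $\mu_X = 0.5$, $\sigma_X^2 = 1/12$), and the values $n = 10$, $k = 2$, the number of trials, and the seed are arbitrary choices.

```python
import math
import random
import statistics

def sample_means(num_trials, n, rng):
    """Sample mean of n independent U(0, 1) draws, repeated num_trials times."""
    return [sum(rng.random() for _ in range(n)) / n for _ in range(num_trials)]

rng = random.Random(42)                  # fixed seed, arbitrary
n = 10
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)   # mean and std of U(0, 1)
means = sample_means(20000, n, rng)

var_of_mean = statistics.variance(means)
print("variance of the sample mean:", var_of_mean, "theory:", sigma ** 2 / n)

k = 2.0
threshold = k * sigma / math.sqrt(n)
tail = sum(1 for m in means if abs(m - mu) >= threshold) / len(means)
print("P(|mean - mu| >= k sigma / sqrt(n)) ~", tail, "Chebyshev bound:", 1.0 / k ** 2)
```

The observed tail frequency typically falls far below $1/k^2$, which is why the Chebyshev bound leads to conservative sample sizes.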
Convergence of the Sample Mean to the Mean (Cont.)
Example 5.10 Samples drawn from a population are known to have a standard deviation of 2. We want the probability that the absolute value of the difference b/w their sample mean and the true mean is greater than 0.5 to be less than 1%. How many samples should be drawn (footnote 18)?
Footnote 18: So that the sample mean, as an estimate of the true mean, maintains the precision given in the problem.
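Convergence of the Sample Mean to the Mean (Cont.)
Solution: From Chebyshev's inequality, we want:
$$\frac{1}{k^2} \le 0.01 \quad\text{and}\quad \frac{k\sigma_X}{\sqrt{n}} \le 0.5$$
or $k^2 \ge 100$ or $k \ge 10$, so that
$$\frac{10 \times 2}{\sqrt{n}} \le 0.5 \quad\text{or}\quad \sqrt{n} \ge \frac{20}{0.5} = 40 \quad\text{or}\quad n \ge 1600$$
Note that this is a fairly large # of samples.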
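The same arithmetic can be written as a small helper. This Python sketch assumes the example's values as read here ($\sigma_X = 2$, tolerance 0.5, probability 1%); the function name is illustrative. Combining $1/k^2 \le p$ with $k\sigma_X/\sqrt{n} \le \epsilon$ gives $n \ge \sigma_X^2/(p\,\epsilon^2)$.

```python
import math

def chebyshev_sample_size(sigma, tolerance, prob):
    """Smallest n with sigma^2 / (n * tolerance^2) <= prob, per Chebyshev's inequality."""
    n_exact = sigma ** 2 / (prob * tolerance ** 2)
    # The tiny offset guards against floating-point round-off just above an integer.
    return math.ceil(n_exact - 1e-9)

n_required = chebyshev_sample_size(sigma=2.0, tolerance=0.5, prob=0.01)
print(n_required)   # prints: 1600
```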
Convergence of the Sample Mean to the Mean (Cont.)
Remark: If we know something about the distribution of the sample mean, the required number of samples can be reduced...
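Convergence of the Sample Mean to the Mean (Cont.)
Recall (footnote 19) the central limit theorem, which says that the sample mean is approximately Gaussian for a large number of samples; then the LHS of Chebyshev's inequality becomes:
$$P\left(|\bar{x} - \mu_X| \ge \frac{k\sigma_X}{\sqrt{n}}\right) = 2\int_{k\sigma_X/\sqrt{n}}^{\infty} \frac{e^{-v^2/(2\sigma_X^2/n)}}{\sqrt{2\pi\sigma_X^2/n}}\,dv = 2Q(k) \le 0.01$$
where the substitution $u = \sqrt{n}\,v/\sigma_X$ has been used in the integrand to get the Q-function expression.
Footnote 19: In Chapter 4.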
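Convergence of the Sample Mean to the Mean (Cont.)
Using the table of the Q-function, we find that $k = 2.57$; $k$ need not be an integer, but we round it up to 3 for a conservative estimate. Therefore, we have:
$$\frac{k\sigma_X}{\sqrt{n}} = \frac{3 \times 2}{\sqrt{n}} \le 0.5 \quad\text{or}\quad \sqrt{n} \ge \frac{6}{0.5} = 12 \quad\text{or}\quad n \ge 144$$
Notice that the # of samples is considerably less than in the previous case...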
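The Q-function lookup can also be done numerically. The Python sketch below uses `statistics.NormalDist` from the standard library (Python 3.8 or later) to solve $2Q(k) = p$, again assuming $\sigma_X = 2$, tolerance 0.5, and $p = 0.01$ as read from the example; the function name is illustrative.

```python
import math
from statistics import NormalDist

def gaussian_sample_size(sigma, tolerance, prob):
    """Return (k, n): k solves 2*Q(k) = prob, n is the smallest n with k*sigma/sqrt(n) <= tolerance."""
    k = NormalDist().inv_cdf(1.0 - prob / 2.0)   # Q(k) = prob / 2
    n = math.ceil((k * sigma / tolerance) ** 2)
    return k, n

k, n = gaussian_sample_size(sigma=2.0, tolerance=0.5, prob=0.01)
print(f"k = {k:.3f}, n >= {n}")            # unrounded k
print("with k rounded up to 3: n >=", math.ceil((3 * 2.0 / 0.5) ** 2))
```

Using the unrounded $k \approx 2.576$ gives $n \ge 107$, somewhat fewer samples than the 144 obtained with $k = 3$; both are far below the Chebyshev figure.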