Basic Probability/Statistical Theory I


Expectation
The expectation or expected value of a discrete random variable X is the arithmetic mean of the random variable's distribution:
$$E[X] = \sum_{\text{all } x} x\,p(X = x)$$
Expectation by conditioning:
$$E[X] = \sum_{\text{all } y} E[X \mid Y = y]\,p(Y = y)$$
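The two formulas can be checked on a small example. This is a minimal Python sketch, assuming a hypothetical joint pmf `joint` over (x, y) pairs; the function names are illustrative only. Exact fractions are used so both routes give the identical result.

```python
from fractions import Fraction as F

# Hypothetical joint pmf p(X = x, Y = y), for illustration only.
joint = {
    (1, 0): F(1, 8), (2, 0): F(1, 8),
    (1, 1): F(1, 4), (3, 1): F(1, 2),
}

def expectation(joint):
    """E[X] = sum over all x of x * p(X = x)."""
    return sum(x * p for (x, _), p in joint.items())

def expectation_by_conditioning(joint):
    """E[X] = sum over all y of E[X | Y = y] * p(Y = y)."""
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0) + p          # marginal p(Y = y)
    total = F(0)
    for y, py in p_y.items():
        # Conditional mean E[X | Y = y]
        cond_mean = sum(x * p for (x, yy), p in joint.items() if yy == y) / py
        total += cond_mean * py
    return total

print(expectation(joint), expectation_by_conditioning(joint))
```

Both calls should print 17/8 for this pmf.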

Expectation of the Sum of Variables
Let S, X, and Y be random variables, and let a, b and c be constants. If
$$S = aX + bY + c$$
then
$$E[S] = aE[X] + bE[Y] + c$$
Note: This theorem holds whether or not X and Y are statistically independent. It also extends to sums of three or more random variables.

Expectation of the Sum of Variables (continued)
Where X and Y are statistically independent, the following is also true. If
$$S = (aX)(bY)$$
then
$$E[S] = aE[X] \cdot bE[Y] = ab\,E[X]\,E[Y]$$
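A minimal Python sketch of both results, assuming two hypothetical marginal pmfs `p_x` and `p_y`. The joint pmf is built as their product, which is exactly what independence means; the linear rule would hold for any joint pmf, but the product rule relies on this construction.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical marginal pmfs; independence means p(x, y) = p(x) * p(y).
p_x = {0: F(1, 4), 1: F(1, 2), 4: F(1, 4)}
p_y = {-1: F(1, 3), 2: F(2, 3)}
a, b, c = 3, 2, 5

def expect(pmf):
    """Expectation of a single discrete variable."""
    return sum(v * p for v, p in pmf.items())

def expect_joint(g):
    """E[g(X, Y)] under the independent joint pmf p(x) * p(y)."""
    return sum(g(x, y) * px * py
               for (x, px), (y, py) in product(p_x.items(), p_y.items()))

linear = expect_joint(lambda x, y: a * x + b * y + c)   # E[aX + bY + c]
prod = expect_joint(lambda x, y: (a * x) * (b * y))     # E[(aX)(bY)]

print(linear, a * expect(p_x) + b * expect(p_y) + c)
print(prod, a * expect(p_x) * b * expect(p_y))
```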

Variance
The variance of a random variable is a measure of dispersion around the arithmetic mean of the distribution:
$$\mathrm{Var}[X] = \sum_{\text{all } x} (x - E[X])^2\,p(X = x)$$
Variance by conditioning:
$$\mathrm{Var}[X] = E[\mathrm{Var}[X \mid Y]] + \mathrm{Var}[E[X \mid Y]]$$

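Variance (continued)
Computing Var[X] by conditioning is the same as expressing the variance as the sum of the within-variance and the between-variance:
$$\mathrm{Var}[X] = E[\mathrm{Var}[X \mid Y]] + \mathrm{Var}[E[X \mid Y]]$$
$$\mathrm{Var}[X] = \underbrace{\sum_{\text{all } y} \mathrm{Var}[X \mid Y = y]\,p(Y = y)}_{\text{within-variance}} + \underbrace{\sum_{\text{all } y} \left(E[X \mid Y = y] - E[X]\right)^2 p(Y = y)}_{\text{between-variance}}$$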

Within- and Between-variance Components
Meanings of the within-variance and between-variance components: The within-variance component represents the average variance of X within groupings or categories based on another variable Y. The between-variance component reflects the differences between the average values of X in the groupings or categories based on Y.
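The decomposition can be verified numerically. This minimal Python sketch reuses a hypothetical joint pmf in which Y acts as the grouping variable; all helper names are illustrative. It computes Var[X] directly and as within-variance plus between-variance.

```python
from fractions import Fraction as F

# Hypothetical joint pmf p(X = x, Y = y); Y defines the groupings.
joint = {
    (1, 0): F(1, 8), (2, 0): F(1, 8),
    (1, 1): F(1, 4), (3, 1): F(1, 2),
}

def marginal_y(joint):
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0) + p
    return p_y

def cond_moments(joint, y, py):
    """Return E[X | Y = y] and Var[X | Y = y]."""
    mean = sum(x * p for (x, yy), p in joint.items() if yy == y) / py
    var = sum((x - mean) ** 2 * p for (x, yy), p in joint.items() if yy == y) / py
    return mean, var

def variance_decomposition(joint):
    mean_x = sum(x * p for (x, _), p in joint.items())
    total = sum((x - mean_x) ** 2 * p for (x, _), p in joint.items())
    within = between = F(0)
    for y, py in marginal_y(joint).items():
        m, v = cond_moments(joint, y, py)
        within += v * py                   # E[Var[X | Y]]
        between += (m - mean_x) ** 2 * py  # Var[E[X | Y]]
    return total, within, between

total, within, between = variance_decomposition(joint)
print(total, within, between)
```

For this pmf the total variance 55/64 splits into a within part of 35/48 and a between part of 25/192.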

Variance of the Sum of Variables
Let S, X and Y be random variables, and let a, b and c be constants. If
$$S = aX + bY + c$$
then
$$\mathrm{Var}[S] = a^2\,\mathrm{Var}[X] + b^2\,\mathrm{Var}[Y] + 2ab\,\mathrm{Cov}[X, Y]$$

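Example
If $S = aX + bY$ with a = 1 and b = -1, then $S = X - Y$ and the variance of S is:
$$\mathrm{Var}[S] = \mathrm{Var}[X] + \mathrm{Var}[Y] - 2\,\mathrm{Cov}[X, Y]$$
If X and Y are statistically independent, then Cov[X, Y] = 0 and
$$\mathrm{Var}[S] = \mathrm{Var}[X] + \mathrm{Var}[Y]$$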
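Because sample variances and covariances are bilinear in the same way as their population counterparts, the formula holds exactly for paired data. A minimal Python sketch, assuming hypothetical paired observations `xs` and `ys` and using the n - 1 divisor throughout:

```python
from fractions import Fraction as F

def mean(v):
    return F(sum(v), len(v))

def cov(u, v):
    """Sample covariance with the n - 1 divisor."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)

def var(u):
    return cov(u, u)

# Hypothetical paired observations of X and Y (illustration only).
xs = [2, 4, 4, 5, 7, 9]
ys = [1, 3, 2, 6, 5, 8]
a, b, c = 1, -1, 0          # the example above: S = X - Y

s = [a * x + b * y + c for x, y in zip(xs, ys)]
lhs = var(s)
rhs = a ** 2 * var(xs) + b ** 2 * var(ys) + 2 * a * b * cov(xs, ys)
print(lhs, rhs)             # identical fractions
```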

Variance of the Sum of Three or More Random Variables
This case is more complicated and involves the concept of a covariance matrix. Let S be the sum of k random variables denoted by $X_1, X_2, \ldots, X_k$, and let $c_1, c_2, \ldots, c_k$ be constants. If
$$S = c_1X_1 + c_2X_2 + \cdots + c_kX_k$$
then
$$\mathrm{Var}[S] = \sum_{i=1}^{k}\sum_{j=1}^{k} \mathrm{Cov}[c_iX_i,\, c_jX_j]$$

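Variance of the Sum of Three or More Random Variables (continued)
Covariance matrix:
$$
\begin{array}{c|ccccc}
 & X_1 & X_2 & X_3 & \cdots & X_k \\ \hline
X_1 & \mathrm{Cov}[X_1,X_1] & \mathrm{Cov}[X_1,X_2] & \mathrm{Cov}[X_1,X_3] & \cdots & \mathrm{Cov}[X_1,X_k] \\
X_2 & \mathrm{Cov}[X_2,X_1] & \mathrm{Cov}[X_2,X_2] & \mathrm{Cov}[X_2,X_3] & \cdots & \mathrm{Cov}[X_2,X_k] \\
X_3 & \mathrm{Cov}[X_3,X_1] & \mathrm{Cov}[X_3,X_2] & \mathrm{Cov}[X_3,X_3] & \cdots & \mathrm{Cov}[X_3,X_k] \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
X_k & \mathrm{Cov}[X_k,X_1] & \mathrm{Cov}[X_k,X_2] & \mathrm{Cov}[X_k,X_3] & \cdots & \mathrm{Cov}[X_k,X_k]
\end{array}
$$
The diagonal entries are the variances, $\mathrm{Cov}[X_i, X_i] = \mathrm{Var}[X_i]$, and the matrix is symmetric.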

Variance of the Sum of Three or More Random Variables (continued)
Rewrite Var[S]:
$$\mathrm{Var}[S] = \sum_{i=1}^{k}\sum_{j=1}^{k} \mathrm{Cov}[c_iX_i,\, c_jX_j] = \sum_{i=1}^{k} c_i^2\,\mathrm{Var}[X_i] + \sum_{i=1}^{k}\sum_{j \ne i} c_i c_j\,\mathrm{Cov}[X_i, X_j]$$
To calculate Var[S], we need to know the value of each variance term and the value of the covariance for each pair of random variables.

Variance of the Sum of Three or More Random Variables (continued)
When all the random variables are pairwise independent, Var[S] simplifies to:
$$\mathrm{Var}[S] = \sum_{i=1}^{k} \mathrm{Var}[c_iX_i] = \sum_{i=1}^{k} c_i^2\,\mathrm{Var}[X_i]$$
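The double sum over the covariance matrix, its split into variance and covariance terms, and the pairwise-independent case can be sketched in plain Python. The 3 x 3 covariance matrix and the coefficients below are hypothetical values chosen for illustration; the function names are also illustrative.

```python
def var_of_linear_combination(c, cov):
    """Var[S] = sum_i sum_j c_i c_j Cov[X_i, X_j] (double sum over the matrix)."""
    k = len(c)
    return sum(c[i] * c[j] * cov[i][j] for i in range(k) for j in range(k))

def var_split(c, cov):
    """Same quantity: variance (diagonal) terms plus off-diagonal covariance terms."""
    k = len(c)
    diag = sum(c[i] ** 2 * cov[i][i] for i in range(k))
    off = sum(c[i] * c[j] * cov[i][j] for i in range(k) for j in range(k) if i != j)
    return diag + off

def var_independent(c, cov):
    """Pairwise-independent case: only the diagonal terms remain."""
    return sum(c[i] ** 2 * cov[i][i] for i in range(len(c)))

# Hypothetical symmetric covariance matrix for X1, X2, X3.
cov = [
    [4.0, 1.0, 0.5],
    [1.0, 9.0, -2.0],
    [0.5, -2.0, 1.0],
]
c = [1.0, 2.0, -1.0]

# Zeroing the off-diagonal entries mimics pairwise independence.
independent_cov = [[cov[i][j] if i == j else 0.0 for j in range(3)] for i in range(3)]

print(var_of_linear_combination(c, cov), var_split(c, cov))   # 52.0 52.0
print(var_of_linear_combination(c, independent_cov))          # 41.0
```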

Basic Probability/Statistical Theory II

Sample Mean Distribution (1)
Assume there is a large population of N elements, and that we draw a simple random sample of n elements from this population such that n is much smaller than N. Based on the n elements in the sample, we calculate a sample mean, $\hat{\mu}$:
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

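Sample Mean Distribution (2)
The sample mean ($\hat{\mu}$) is an estimate of the population mean E[X] (or $\mu$):
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\left(x_1 + x_2 + \cdots + x_n\right)$$
Due to random selection, the value of $\hat{\mu}$ can vary from sample to sample.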

Sample Mean Distribution (3)
If there are M distinct samples of n elements that we can draw from the population, there are M possible sample means. Some of these sample means may have the same value, or all could be different. This set of M sample means is termed the sampling distribution of the sample mean, which is also called the sample mean distribution for short.

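Sample Mean Distribution (4)
Expectation of the sample mean distribution ($E[\hat{\mu}]$):
$$E[\hat{\mu}] = E\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right] = E\left[\frac{1}{n}x_1\right] + \cdots + E\left[\frac{1}{n}x_n\right] = \frac{1}{n}\left(E[x_1] + E[x_2] + \cdots + E[x_n]\right)$$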

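Sample Mean Distribution (5)
Under simple random sampling, the expected value of any element drawn into the sample is equal to the population mean, $\mu$:
$$E[\hat{\mu}] = \frac{1}{n}\sum_{i=1}^{n} E[x_i] = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu$$
Since $E[\hat{\mu}] = \mu$, the estimator $\hat{\mu}$ is said to be an unbiased estimate of the population mean.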
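For a tiny population the whole sample mean distribution can be enumerated. This minimal Python sketch assumes a hypothetical population of N = 5 values and samples of size n = 2 drawn without replacement, so M = C(5, 2) = 10. Here n is not much smaller than N, but unbiasedness does not depend on that condition.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical small population of N = 5 values (illustration only).
population = [3, 5, 6, 8, 13]
n = 2

# Every distinct simple random sample of size n; each is equally likely.
sample_means = [F(sum(s), n) for s in combinations(population, n)]
M = len(sample_means)

mu = F(sum(population), len(population))          # population mean
expected_sample_mean = sum(sample_means) / M      # mean of the M sample means

print(M, mu, expected_sample_mean)                # 10 7 7
```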

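Sample Mean Distribution (6)
Variance of the sample mean distribution ($\mathrm{Var}[\hat{\mu}]$):
$$\mathrm{Var}[\hat{\mu}] = \mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right] = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}[x_i] + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j \ne i}\mathrm{Cov}[x_i, x_j]$$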

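Sample Mean Distribution (7)
Under simple random sampling, two important facts related to the variance are: (1) the variance associated with any element drawn into the sample is equal to the population variance, $\sigma^2$; and (2) when n is much smaller than N, the elements in the sample can be treated as independent, so all elements are pairwise independent and $\mathrm{Cov}[x_i, x_j] = 0$ for all $i \ne j$. Therefore,
$$\mathrm{Var}[\hat{\mu}] = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}[x_i] = \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 = \frac{\sigma^2}{n}$$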

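Sample Mean Distribution (8)
If n is much smaller than N, $\mathrm{Var}[\hat{\mu}]$ is the population variance, $\sigma^2$, divided by the sample size n. $\mathrm{Var}[\hat{\mu}]$ is a measure of the precision of the estimate of $\mu$. An unbiased estimate of $\mathrm{Var}[\hat{\mu}]$ is
$$S_{\bar{X}}^2 = \frac{S^2}{n}\left(1 - \frac{n}{N}\right)$$
where $S^2$ is the sample variance. The term standard error is normally used to represent the standard deviation of the sample mean distribution, estimated by $S_{\bar{X}} = \sqrt{S_{\bar{X}}^2}$.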
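A minimal Python sketch of the standard error with the finite population correction, assuming the same kind of hypothetical small population as above; the helper names are illustrative. Enumerating every possible sample shows that the average of the estimates $S^2(1 - n/N)/n$ equals the actual variance of the sample mean distribution.

```python
from fractions import Fraction as F
from itertools import combinations
import math

population = [3, 5, 6, 8, 13]      # hypothetical population
N, n = len(population), 2

def mean(v):
    return F(sum(v), len(v))

def sample_var(v):
    """S^2 with the n - 1 divisor."""
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

def est_var_of_mean(sample, N):
    """Estimated variance of the sample mean: (S^2 / n) * (1 - n / N)."""
    k = len(sample)
    return sample_var(sample) / k * (1 - F(k, N))

def standard_error(sample, N):
    return math.sqrt(est_var_of_mean(sample, N))

samples = list(combinations(population, n))
means = [mean(s) for s in samples]
M = len(samples)
grand = sum(means) / M
true_var_of_mean = sum((m - grand) ** 2 for m in means) / M   # actual Var of sample means
avg_estimate = sum(est_var_of_mean(s, N) for s in samples) / M

print(true_var_of_mean, avg_estimate)          # 87/20 87/20
print(standard_error((3, 5), N))               # SE estimated from one sample
```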

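Bias, Precision and Accuracy (1)
In statistical sampling, the term accuracy can be thought of as combining the concepts of bias and precision. However, statisticians tend not to use the term accuracy but instead use the term mean square error (MSE), which is defined as:
$$\underbrace{\mathrm{MSE}(\hat{\mu})}_{\text{mean square error}} = \underbrace{\mathrm{Var}[\hat{\mu}]}_{\text{variance (precision)}} + \left[\mathrm{Bias}(\hat{\mu})\right]^2, \qquad \mathrm{Bias}(\hat{\mu}) = E[\hat{\mu}] - \mu$$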

Bias, Precision and Accuracy (2)
The estimator with the lowest MSE is considered the best or most accurate estimator. It is possible that a random sampling scheme may involve a biased estimator such that $E[\hat{\mu}] \ne \mu$. However, if $\mathrm{Var}[\hat{\mu}]$ for that scheme is sufficiently low, the overall MSE may be lower than that of a different sampling scheme for which $E[\hat{\mu}] = \mu$.
It is important to understand that zero bias does not signify that every sample mean equals the true population mean. Rather, zero bias signifies that the average of all possible sample mean values equals the population mean.
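The trade-off between bias and variance can be made concrete. In this minimal Python sketch, scheme A is the ordinary unbiased sample mean, and scheme B is a hypothetical estimator equal to 0.9 times the sample mean, chosen only because its bias and variance are easy to write down. The population values are also hypothetical.

```python
def mse(variance, bias):
    """Mean square error = variance (precision) + bias squared."""
    return variance + bias ** 2

mu, sigma2, n = 10.0, 100.0, 4     # hypothetical population mean, variance, sample size

# Scheme A: ordinary sample mean, unbiased, Var = sigma^2 / n.
var_a, bias_a = sigma2 / n, 0.0

# Scheme B (hypothetical): k * sample mean, biased but less variable.
k = 0.9
var_b, bias_b = k ** 2 * sigma2 / n, (k - 1) * mu

print(mse(var_a, bias_a), mse(var_b, bias_b))   # about 25.0 versus about 21.25
```

Here the biased scheme has the lower MSE, illustrating the situation described above.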

Bias, Precision and Accuracy (3)
[Figure: the distributions S1 and S2 plotted together with the true distribution]
Which one of the S1 and S2 distributions has better precision and accuracy for estimating the true distribution?

Bias, Precision and Accuracy (4)
In industrial hygiene, when collecting and analyzing a sample, we usually experience an error in that our result does not equal the true environmental level that we monitored. There are two sources of error:
(1) bias (= systematic error): the difference between the mean of our repeated measurements and the true value.
(2) random error (= random variability): the variability in repeated measurements of a constant environmental level. This variability may arise from fluctuations in the flow rate of the sampling pump, fluctuations in electrical current flow for the laboratory instruments, etc.

Bias, Precision and Accuracy (5)
In industrial hygiene, the random error in a sampling and analytical method is often expressed by the coefficient of variation (CV). If we denote the coefficient of variation of the sampling device by $CV_S$ and the coefficient of variation of the analytical procedure by $CV_A$, then the total coefficient of variation ($CV_T$) is computed as follows:
$$CV_T = \sqrt{CV_S^2 + CV_A^2}$$
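The combination rule is a one-liner. This minimal Python sketch assumes, as the formula does, that sampling and analytical errors are independent; the input CVs are hypothetical.

```python
import math

def total_cv(cv_sampling, cv_analytical):
    """CV_T = sqrt(CV_S^2 + CV_A^2) for independent sampling and analytical errors."""
    return math.sqrt(cv_sampling ** 2 + cv_analytical ** 2)

print(total_cv(0.03, 0.04))   # approximately 0.05
```

Note that the total CV is dominated by the larger component: halving the smaller CV changes CV_T much less than halving the larger one.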

Bias, Precision and Accuracy (6)
In industrial hygiene, accuracy is a somewhat confusing statistic that incorporates: (1) measurement error due to both bias and random error; and (2) a confidence level. We often say that we want our sampling and analytical method to have ±25% accuracy at a 95% confidence level for measuring an environmental concentration at the permissible exposure limit (PEL). This means that if we generate a constant test atmosphere at the PEL, at least 95% of our measurements must fall within the range (0.75 PEL) to (1.25 PEL).

Bias, Precision and Accuracy (7)
If we are given the bias (as a proportion) and the coefficient of variation ($CV_m$) of the method, and the true test concentration ($c$) we are trying to measure (here the test atmosphere is generated at the PEL, so $c$ = PEL), then we are asked to determine whether the method meets the required accuracy at a 95% confidence level. The way to determine it is:
(1) Compute the mean ($\mu_m$) of the measurements:
$$\mu_m = c + \mathrm{Bias} \times c = c\,(1 + \mathrm{Bias})$$
(2) Compute the standard deviation ($\sigma_m$) of the measurements:
$$\sigma_m = CV_m \times \mu_m$$

Bias, Precision and Accuracy (8)
(3) Find the percentiles of the measurements corresponding to (0.75 PEL) and (1.25 PEL):
$$Z_{upper} = \frac{1.25\,\mathrm{PEL} - \mu_m}{\sigma_m}, \qquad Z_{lower} = \frac{0.75\,\mathrm{PEL} - \mu_m}{\sigma_m}$$
and use the Z table to find the corresponding percentiles $X_{upper}\%$ and $X_{lower}\%$.
(4) If $X_{upper}\% - X_{lower}\% \ge 95\%$, the method meets the accuracy criterion.

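Example
A method is required to have ±25% accuracy at a 95% confidence level for measuring an atmosphere at the PEL. Given bias = 0.04, CV = 0.11 and PEL = 200 ppm for this method, check whether the method meets the stated criterion.
(1) $\mu_m = 200 + 0.04 \times 200 = 208$ ppm
(2) $\sigma_m = 0.11 \times 208 = 22.9$ ppm
(3) $Z_{upper} = \dfrac{1.25 \times 200 - 208}{22.9} = 1.83 \Rightarrow X_{upper}\% = 96.7\%$
$Z_{lower} = \dfrac{0.75 \times 200 - 208}{22.9} = -2.53 \Rightarrow X_{lower}\% = 0.6\%$
(4) 96.7% - 0.6% = 96.1% of the measurements fall within ±25% of the PEL.
Conclusion: the accuracy criterion was met.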
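Steps (1) to (4) can be sketched in Python, with the standard library's `statistics.NormalDist` standing in for the Z table. The function name `meets_accuracy` and its keyword defaults are hypothetical; the code assumes normally distributed measurements, as the procedure does, and does not round intermediate values.

```python
from statistics import NormalDist

def meets_accuracy(bias, cv, true_conc, tolerance=0.25, confidence=0.95):
    """Check the +/- tolerance accuracy criterion at the given confidence level."""
    mean_m = true_conc * (1 + bias)                               # step (1)
    sd_m = cv * mean_m                                            # step (2)
    z_upper = ((1 + tolerance) * true_conc - mean_m) / sd_m       # step (3)
    z_lower = ((1 - tolerance) * true_conc - mean_m) / sd_m
    std_normal = NormalDist()                                     # replaces the Z table
    fraction_within = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)
    return fraction_within, fraction_within >= confidence        # step (4)

frac, ok = meets_accuracy(bias=0.04, cv=0.11, true_conc=200.0)
print(round(frac, 3), ok)   # 0.961 True
```

With the example's inputs this reproduces the hand calculation: about 96.1% of measurements fall within ±25% of the PEL.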

Notes on Lognormal Distribution

Transformation of Parameters Between Normal and Lognormal Distributions (1)
Definitions:
$\mu$: arithmetic mean
$\sigma$: arithmetic standard deviation
$\mu_g$: geometric mean
$\sigma_g$: geometric standard deviation
$\mu_{ln}$: mean of log-transformed values
$\sigma_{ln}$: standard deviation of log-transformed values

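Transformation of Parameters Between Normal and Lognormal Distributions (2)
$$\mu_g = e^{\mu_{ln}}, \qquad \sigma_g = e^{\sigma_{ln}}$$
$$\mu = e^{\mu_{ln} + 0.5\,\sigma_{ln}^2} = \mu_g\,e^{0.5\,\sigma_{ln}^2}$$
$$\sigma = \sqrt{e^{2\mu_{ln} + \sigma_{ln}^2}\left(e^{\sigma_{ln}^2} - 1\right)} = \mu_g\sqrt{e^{\sigma_{ln}^2}\left(e^{\sigma_{ln}^2} - 1\right)} = \mu\sqrt{e^{\sigma_{ln}^2} - 1}$$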

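Transformation of Parameters Between Normal and Lognormal Distributions (3)
$$\mu_{ln} = \ln(\mu_g) = \ln\mu - 0.5\,\sigma_{ln}^2$$
$$\sigma_{ln} = \ln(\sigma_g) = \sqrt{\ln\left[1 + \left(\frac{\sigma}{\mu}\right)^2\right]}$$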
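The conversions in both directions can be sketched as two small Python functions; the names are hypothetical. A round trip from log-scale parameters to arithmetic parameters and back should recover the inputs.

```python
import math

def lognormal_from_log_params(mu_ln, sigma_ln):
    """Return (GM, GSD, AM, ASD) from the mean and SD of the log-transformed values."""
    gm = math.exp(mu_ln)
    gsd = math.exp(sigma_ln)
    am = math.exp(mu_ln + 0.5 * sigma_ln ** 2)
    asd = am * math.sqrt(math.exp(sigma_ln ** 2) - 1)
    return gm, gsd, am, asd

def log_params_from_arithmetic(am, asd):
    """Return (mu_ln, sigma_ln) from the arithmetic mean and SD."""
    sigma_ln = math.sqrt(math.log(1 + (asd / am) ** 2))
    mu_ln = math.log(am) - 0.5 * sigma_ln ** 2
    return mu_ln, sigma_ln

gm, gsd, am, asd = lognormal_from_log_params(1.0, 0.5)
print(log_params_from_arithmetic(am, asd))   # approximately (1.0, 0.5)
```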

Theorems Regarding the Product of Lognormal Variables (1)
Let X and Y be lognormally distributed variables, and let c be a constant. If
$$P = cXY$$
then P is a lognormally distributed variable.

Proof: Theorems Regarding the Product of Lognormal Variables (2)
By log-transforming the expression for P we obtain:
$$\ln P = \ln c + \ln X + \ln Y$$
Because c is a constant, ln c is a constant. Because X is lognormally distributed, ln X is normally distributed. Because Y is lognormally distributed, ln Y is normally distributed. Because ln P is the sum of two normally distributed variables and a constant, ln P is normally distributed (this step requires ln X and ln Y to be jointly normal, which holds, for example, when they are independent). Because ln P is normally distributed, P is lognormally distributed.

Theorems Regarding the Product of Lognormal Variables (3)
Let X and Y be lognormally distributed variables, and let c be a constant. If
$$P = cXY$$
then
$$GM[P] = c \cdot GM[X] \cdot GM[Y]$$

Proof: Theorems Regarding the Product of Lognormal Variables (4)
By log-transforming the expression for P we obtain:
$$\ln P = \ln c + \ln X + \ln Y$$
For the normally distributed variables ln X and ln Y, the expectation of the sum ln P is:
$$E[\ln P] = E[\ln c] + E[\ln X] + E[\ln Y] = \ln c + E[\ln X] + E[\ln Y]$$
By definition:
$$E[\ln P] = \ln GM[P], \qquad E[\ln X] = \ln GM[X], \qquad E[\ln Y] = \ln GM[Y]$$
Recall: $GM[P] = e^{E[\ln P]}$.

Theorems Regarding the Product of Lognormal Variables (5)
Proof (continued): Rewrite the expression:
$$\ln GM[P] = \ln c + \ln GM[X] + \ln GM[Y]$$
Exponentiate both sides of the equation:
$$e^{\ln GM[P]} = e^{\ln c + \ln GM[X] + \ln GM[Y]} = e^{\ln c}\,e^{\ln GM[X]}\,e^{\ln GM[Y]}$$
Since $e^{\ln a} = a$, the above equation can be written as:
$$GM[P] = c \cdot GM[X] \cdot GM[Y]$$
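The same argument works for sample geometric means, because taking logs turns the product into a sum and the mean is linear. This minimal Python sketch uses hypothetical paired positive observations `xs` and `ys` and a hypothetical constant `c`; it illustrates the GM identity only and does not test lognormality.

```python
import math

def gm(values):
    """Geometric mean: exp of the mean of the log-transformed values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical paired positive observations (illustration only).
xs = [1.2, 3.5, 0.8, 2.2, 5.1]
ys = [4.0, 2.5, 6.3, 1.1, 3.3]
c = 2.0
ps = [c * x * y for x, y in zip(xs, ys)]   # P = cXY for each pair

print(gm(ps), c * gm(xs) * gm(ys))         # equal up to rounding
```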

Theorems Regarding the Product of Lognormal Variables (6)
Let X and Y be lognormally distributed variables, and let c be a constant. If
$$P = cXY$$
then
$$GSD[P] = e^{\sqrt{(\ln GSD[X])^2 + (\ln GSD[Y])^2 + 2r\,(\ln GSD[X])(\ln GSD[Y])}}$$
where r is the correlation coefficient for ln X and ln Y.

Proof: Theorems Regarding the Product of Lognormal Variables (7)
By log-transforming the expression for P we have:
$$\ln P = \ln c + \ln X + \ln Y$$
For the normally distributed variables ln X and ln Y, the variance of the sum ln P is:
$$\mathrm{Var}[\ln P] = \mathrm{Var}[\ln X] + \mathrm{Var}[\ln Y] + 2\,\mathrm{Cov}[\ln X, \ln Y]$$
The term Var[ln c] does not appear since ln c is a constant, and the variance of a constant is zero. By definition:
$$\mathrm{Var}[\ln P] = (\ln GSD[P])^2, \qquad SD[\ln P] = \ln GSD[P]$$
$$\mathrm{Var}[\ln X] = (\ln GSD[X])^2, \qquad SD[\ln X] = \ln GSD[X]$$
$$\mathrm{Var}[\ln Y] = (\ln GSD[Y])^2, \qquad SD[\ln Y] = \ln GSD[Y]$$
Recall: $GSD[P] = e^{SD[\ln P]}$.

Theorems Regarding the Product of Lognormal Variables (8)
Proof (continued): The correlation coefficient r for two variables X and Y is defined as:
$$r = \frac{\mathrm{Cov}(X, Y)}{SD[X]\,SD[Y]}$$
Therefore, for ln X and ln Y,
$$r = \frac{\mathrm{Cov}(\ln X, \ln Y)}{SD[\ln X]\,SD[\ln Y]}$$
By rearrangement and substitution:
$$\mathrm{Cov}[\ln X, \ln Y] = r\,SD[\ln X]\,SD[\ln Y] = r\,(\ln GSD[X])(\ln GSD[Y])$$
Rewrite Var[ln P] as:
$$(\ln GSD[P])^2 = (\ln GSD[X])^2 + (\ln GSD[Y])^2 + 2r\,(\ln GSD[X])(\ln GSD[Y])$$

Theorems Regarding the Product of Lognormal Variables (9)
Proof (continued): Take the square root of both sides of the equation:
$$\ln GSD[P] = \sqrt{(\ln GSD[X])^2 + (\ln GSD[Y])^2 + 2r\,(\ln GSD[X])(\ln GSD[Y])}$$
By exponentiating both sides of the equation, we obtain:
$$GSD[P] = e^{\sqrt{(\ln GSD[X])^2 + (\ln GSD[Y])^2 + 2r\,(\ln GSD[X])(\ln GSD[Y])}}$$
Note that if X and Y are independent, which means that ln X and ln Y are independent, then r = 0, and the above expression for GSD[P] simplifies to:
$$GSD[P] = e^{\sqrt{(\ln GSD[X])^2 + (\ln GSD[Y])^2}}$$
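The GSD formula can be checked on sample data, since the sample variance of ln P decomposes exactly into the sample variances and covariance of ln X and ln Y. This minimal Python sketch uses hypothetical paired observations and n - 1 divisors throughout; `gsd_of_product` is an illustrative name, not a library function.

```python
import math

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Sample standard deviation (n - 1 divisor)."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def corr(u, v):
    """Pearson correlation coefficient."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)
    return cov / (sd(u) * sd(v))

def gsd(values):
    """Geometric standard deviation: exp of the SD of the log-transformed values."""
    return math.exp(sd([math.log(v) for v in values]))

def gsd_of_product(gsd_x, gsd_y, r):
    """GSD[P] for P = cXY, with r the correlation of ln X and ln Y."""
    lx, ly = math.log(gsd_x), math.log(gsd_y)
    return math.exp(math.sqrt(lx ** 2 + ly ** 2 + 2 * r * lx * ly))

# Hypothetical paired positive observations (illustration only).
xs = [1.2, 3.5, 0.8, 2.2, 5.1]
ys = [4.0, 2.5, 6.3, 1.1, 3.3]
c = 2.0
ps = [c * x * y for x, y in zip(xs, ys)]
r = corr([math.log(x) for x in xs], [math.log(y) for y in ys])

print(gsd(ps), gsd_of_product(gsd(xs), gsd(ys), r))   # equal up to rounding
```

Note that the constant c drops out, just as Var[ln c] does in the proof.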