Confidence Level We want to estimate the true mean of a random variable X economically and with confidence.

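Confidence Interval

[Figure: a confidence interval centered on the sample mean obtained from the samples, annotated with the number of samples, the sample mean, the confidence level, and the margin of error $e$. The true mean $\mu$ of the entire population lies inside the interval with the stated confidence.]

$$P(\text{sample mean} - e < \mu < \text{sample mean} + e) = \text{Conf}$$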

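Approach 0: Estimate the mean of a Gaussian r.v. with known variance

Assumption: $X \sim N(\mu, \sigma^2)$, and $\sigma^2$ is known.

Goodness of our assumption: not practical at all. But Approach 0 is a nice stepping stone toward a practical approach.

Procedure: the solution in Approach 0 consists of the following four steps.

Step 1) Find the sample mean. Get $n$ i.i.d. random samples of $X$ to form the sample mean

$$\hat{X}_n = \frac{1}{n}\sum_{j=1}^{n} X_j .$$

Since the sum of independent Gaussians is Gaussian, $\hat{X}_n$ is Gaussian with $E[\hat{X}_n] = \mu$ and $\mathrm{VAR}(\hat{X}_n) = \sigma^2/n$.

Step 2) Normalize the Gaussian sample mean:

$$\frac{\hat{X}_n - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$

Step 3) Find the area of the normalized Gaussian tail. For an arbitrary $\alpha$ with $0 < \alpha < 1$, find the value of $z$ such that

$$P\left(\left|\frac{\hat{X}_n - \mu}{\sigma/\sqrt{n}}\right| \le z\right) = 1 - \alpha .$$

Alternatively, find $z$ such that

$$Q(z) = \int_z^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du = \frac{\alpha}{2} .$$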

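Step 4) Formulate the confidence interval. With $n$ samples, known variance $\sigma^2$, sample mean $\hat{X}_n$, and $z$ chosen in Step 3 for the probability $1 - \alpha$,

$$-z \le \frac{\hat{X}_n - \mu}{\sigma/\sqrt{n}} \le z \quad\Longleftrightarrow\quad \hat{X}_n - z\frac{\sigma}{\sqrt{n}} \le \mu \le \hat{X}_n + z\frac{\sigma}{\sqrt{n}} .$$

Finally we have

$$P\left(\hat{X}_n - z\frac{\sigma}{\sqrt{n}} \le \mu \le \hat{X}_n + z\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha .$$

$1 - \alpha$ is referred to as the confidence level.
$z\,\sigma/\sqrt{n}$ is referred to as the margin of error.
$\left[\hat{X}_n - z\,\sigma/\sqrt{n},\ \hat{X}_n + z\,\sigma/\sqrt{n}\right]$ is referred to as the confidence interval of $\mu$.

For a large $n$, we will have a small margin of error, but at a higher cost.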

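Example

Assume $X \sim N(\mu, 4)$. We want to estimate $\mu$. We will get the sample mean from 16 samples of $X$. Find the 80% confidence interval.

Solution: $1 - \alpha = 0.8$ gives $z = 1.28$. The margin of error is

$$z\frac{\sigma}{\sqrt{n}} = 1.28 \times \frac{2}{\sqrt{16}} = 0.64 .$$

The confidence interval is expressed as

$$P\left(\hat{X}_{16} - 0.64 \le \mu \le \hat{X}_{16} + 0.64\right) = 0.8 .$$

In Excel, NORMSINV returns $z$ (here NORMSINV(0.9) is approximately 1.28). Refer to the Excel file "normal-dist known variance Confidence Interval Example".

Note: We cannot expect to know the variance when we don't know the mean. What can we do to alleviate the impracticality?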
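The known-variance procedure can also be checked outside Excel. The following is a minimal Python sketch using only the standard library; the function name `known_variance_ci` and the sample mean value 10.0 are illustrative assumptions, not part of the slides. `NormalDist().inv_cdf` plays the role of NORMSINV.

```python
from math import sqrt
from statistics import NormalDist


def known_variance_ci(sample_mean, sigma, n, confidence):
    """Return (margin_of_error, lower, upper) for the mean of a Gaussian
    random variable whose standard deviation sigma is known."""
    alpha = 1.0 - confidence
    # z satisfies Q(z) = alpha / 2, i.e. Phi(z) = 1 - alpha / 2.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sigma / sqrt(n)
    return margin, sample_mean - margin, sample_mean + margin


# Numbers from the example: variance 4 (sigma = 2), 16 samples, 80% confidence.
# The sample mean 10.0 is a made-up placeholder value.
margin, lower, upper = known_variance_ci(
    sample_mean=10.0, sigma=2.0, n=16, confidence=0.80
)
print(round(margin, 2))  # 0.64
```

The margin of error depends only on $z$, $\sigma$ and $n$, so it can be computed before any samples are taken.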

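Sample Variance

We still assume $X \sim N(\mu, \sigma^2)$. We want to estimate $\mu$, and we do not know $\sigma^2$.

Define the sample variance as

$$s_n^2 = \frac{1}{n-1}\sum_{j=1}^{n}\left(X_j - \hat{X}_n\right)^2 ,$$

where $\hat{X}_n$ is the sample mean defined as before. The sample variance $s_n^2$ is an estimate of the true variance $\sigma^2$. $s_n^2$ is a function of random variables.

Theorem: $E[s_n^2] = \sigma^2$. The sample variance is an unbiased estimate of $\sigma^2$.

Proof for $n = 2$. For $n = 2$,

$$s_2^2 = \sum_{j=1}^{2}\left(X_j - \hat{X}_2\right)^2 .$$

Substituting $\hat{X}_2 = (X_1 + X_2)/2$,

$$s_2^2 = \left(X_1 - \frac{X_1 + X_2}{2}\right)^2 + \left(X_2 - \frac{X_1 + X_2}{2}\right)^2 = \frac{(X_1 - X_2)^2}{2} = \frac{X_1^2 - 2X_1X_2 + X_2^2}{2} .$$

Taking the expectation and noting that $X_1$ and $X_2$ are i.i.d. to $X$,

$$E[s_2^2] = \frac{E[X_1^2] - 2E[X_1]E[X_2] + E[X_2^2]}{2} = E[X^2] - \mu^2 = \sigma^2 .$$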

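Proof for any $n$

$$(n-1)\,s_n^2 = \sum_{j=1}^{n}\left(X_j - \hat{X}_n\right)^2 = \sum_{j=1}^{n}\left\{(X_j - \mu) + (\mu - \hat{X}_n)\right\}^2$$

$$= \sum_{j=1}^{n}(X_j - \mu)^2 - 2(\hat{X}_n - \mu)\sum_{j=1}^{n}(X_j - \mu) + n(\hat{X}_n - \mu)^2 .$$

Noting $\sum_{j=1}^{n}(X_j - \mu) = n(\hat{X}_n - \mu)$,

$$(n-1)\,s_n^2 = \sum_{j=1}^{n}(X_j - \mu)^2 - n(\hat{X}_n - \mu)^2 .$$

Recalling that $\{X_j\}$ are i.i.d. samples of $X$, and recalling $\mathrm{VAR}(X) = \sigma^2$ and $\mathrm{VAR}(\hat{X}_n) = \sigma^2/n$,

$$(n-1)\,E[s_n^2] = n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\,\sigma^2 , \quad\text{so}\quad E[s_n^2] = \sigma^2 .$$

Review Note:

$$s_n^2 = \frac{1}{n-1}\sum_{j=1}^{n}\left(X_j - \hat{X}_n\right)^2 , \quad\text{NOT}\quad s_n^2 = \frac{1}{n}\sum_{j=1}^{n}\left(X_j - \hat{X}_n\right)^2 .$$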
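The unbiasedness result can be illustrated with a small simulation. This is a sketch only: the values of `mu`, `sigma`, `n`, the seed and the number of trials are arbitrary assumptions chosen for illustration. It averages both the $1/(n-1)$ and the $1/n$ estimators over many repetitions.

```python
import random
from statistics import mean


def sample_variances(xs):
    """Return (unbiased, biased) variance estimates of the list xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    ss = sum((x - x_bar) ** 2 for x in xs)
    return ss / (n - 1), ss / n


rng = random.Random(0)       # fixed seed so the run is repeatable
mu, sigma, n = 5.0, 2.0, 4   # illustrative values; true variance is 4
trials = 50_000

unbiased, biased = [], []
for _ in range(trials):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    u, b = sample_variances(xs)
    unbiased.append(u)
    biased.append(b)

avg_unbiased = mean(unbiased)   # expected to be close to sigma**2 = 4
avg_biased = mean(biased)       # expected to be close to (n-1)/n * sigma**2 = 3
print(avg_unbiased, avg_biased)
```

With $n = 4$ the $1/n$ estimator underestimates the variance by the factor $(n-1)/n = 3/4$ on average, which is why the review note insists on the $1/(n-1)$ form.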

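Approach 1: Estimating $\mu$ with unknown variance

Use the sample variance $s_n^2$ in place of the variance $\sigma^2$. Derive a confidence interval from

$$P\left(\left|\frac{\hat{X}_n - \mu}{s_n/\sqrt{n}}\right| \le z\right) = 1 - \alpha .$$

However, $\dfrac{\hat{X}_n - \mu}{s_n/\sqrt{n}}$ is not Gaussian. We must know its distribution to find the value of $z$. Gosset showed that it has the student-t distribution with $n-1$ degrees of freedom. (W. S. Gosset published under the pseudonym "A Student".)

Properties of the Student-t distribution with $k$ degrees of freedom

Notation: $Y \sim t_k$

$$f_Y(y) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k}\;\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{k}{2}\right)}\left(1 + \frac{y^2}{k}\right)^{-\frac{k+1}{2}}, \quad -\infty < y < \infty$$

One can show:

$$\mathrm{VAR}(Y) = \frac{k}{k-2} \ (k > 2), \qquad \lim_{k\to\infty}\mathrm{VAR}(Y) = 1, \qquad t_k \to N(0,1) \text{ for a large } k .$$

[Figure: pdf of $t_k$ for several values of $k$ (including 5 and 30), approaching $N(0,1)$ as $k$ grows. Horizontal axis: $y$ from $-4$ to $4$; vertical axis: $f_Y(y)$.]

Ref: Distributions various pdf and cdf.xls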
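The convergence of the Student-t pdf to the standard normal pdf can be checked numerically. The sketch below is an illustration, not the referenced spreadsheet; it evaluates the density with `math.gamma`, using $\Gamma(1/2) = \sqrt{\pi}$, and the helper names `t_pdf` and `normal_pdf` are assumptions. It reports the largest gap to $N(0,1)$ on a grid over $[-4, 4]$.

```python
from math import gamma, sqrt, pi, exp


def t_pdf(y, k):
    """Density of the Student-t distribution with k degrees of freedom."""
    c = gamma((k + 1) / 2) / (sqrt(k * pi) * gamma(k / 2))
    return c * (1 + y * y / k) ** (-(k + 1) / 2)


def normal_pdf(y):
    """Density of the standard normal distribution N(0, 1)."""
    return exp(-y * y / 2) / sqrt(2 * pi)


# Largest absolute difference between the two densities on a grid of
# y values from -4.0 to 4.0 in steps of 0.1, for increasing k.
gaps = {}
for k in (5, 30, 100):
    gaps[k] = max(
        abs(t_pdf(i / 10, k) - normal_pdf(i / 10)) for i in range(-40, 41)
    )
    print(k, gaps[k])
```

The printed gap shrinks as $k$ grows, consistent with $t_k$ approaching $N(0,1)$.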

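Student-t Distribution

To show that $\dfrac{\hat{X}_n - \mu}{s_n/\sqrt{n}}$ has the student-t distribution with $n-1$ degrees of freedom, we need the following lemmas. Write

$$\frac{\hat{X}_n - \mu}{s_n/\sqrt{n}} = \frac{\dfrac{\hat{X}_n - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)\,s_n^2}{\sigma^2}\Big/(n-1)}} .$$

Lemma 1: $\dfrac{\hat{X}_n - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$.

Lemma 2: $\dfrac{(n-1)\,s_n^2}{\sigma^2} \sim \chi^2_{n-1}$, chi-square with $n-1$ degrees of freedom.

Lemma 3: $\dfrac{\hat{X}_n - \mu}{\sigma/\sqrt{n}}$ and $\dfrac{(n-1)\,s_n^2}{\sigma^2}$ are independent.

Lemma 4: $\dfrac{V}{\sqrt{W/(n-1)}} \sim t_{n-1}$, the student-t distribution with $n-1$ degrees of freedom, provided that $V \sim N(0,1)$, $W \sim \chi^2_{n-1}$, and $V$ and $W$ are independent.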

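The 1st step for lemma 2

We can show the following relation:

$$(n-1)\,s_n^2 = \sum_{j=1}^{n}\left(X_j - \hat{X}_n\right)^2 = \frac{n-1}{n}\sum_{i=1}^{n}X_i^2 - \frac{2}{n}\sum_{i=1}^{n}\sum_{j>i}X_iX_j \qquad \text{eq. (1)}$$

Proof of lemma 2, Chi-Square, for $n = 3$

For $n = 3$, from eq. (1),

$$(3-1)\,s_3^2 = \frac{2}{3}\left(X_1^2 + X_2^2 + X_3^2\right) - \frac{2}{3}\left(X_1X_2 + X_1X_3 + X_2X_3\right) .$$

Let

$$U_1 = \sqrt{\frac{2}{3}}\left(X_1 - \frac{X_2 + X_3}{2}\right), \qquad U_2 = \sqrt{\frac{1}{2}}\left(X_2 - X_3\right) .$$

Then we can show that $U_1 \sim N(0, \sigma^2)$, $U_2 \sim N(0, \sigma^2)$, and $U_1$ and $U_2$ are independent. Therefore

$$(3-1)\,s_3^2 = U_1^2 + U_2^2 ,$$

and thus

$$\frac{(3-1)\,s_3^2}{\sigma^2} = \left(\frac{U_1}{\sigma}\right)^2 + \left(\frac{U_2}{\sigma}\right)^2 \sim \chi^2_2 .$$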

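Proof of lemma 2, Chi-Square, for $n = 3, 4, 5, \ldots$

For $n = 3$, $(3-1)\,s_3^2 = U_1^2 + U_2^2$ with

$$U_1 = \sqrt{\frac{2}{3}}\left(X_1 - \frac{X_2 + X_3}{2}\right), \qquad U_2 = \sqrt{\frac{1}{2}}\left(X_2 - X_3\right) .$$

For $n = 4$, $(4-1)\,s_4^2 = U_1^2 + U_2^2 + U_3^2$ with

$$U_1 = \sqrt{\frac{3}{4}}\left(X_1 - \frac{X_2 + X_3 + X_4}{3}\right), \qquad U_2 = \sqrt{\frac{2}{3}}\left(X_2 - \frac{X_3 + X_4}{2}\right), \qquad U_3 = \sqrt{\frac{1}{2}}\left(X_3 - X_4\right) .$$

For $n = 5$, $(5-1)\,s_5^2 = U_1^2 + U_2^2 + U_3^2 + U_4^2$ with

$$U_1 = \sqrt{\frac{4}{5}}\left(X_1 - \frac{X_2 + X_3 + X_4 + X_5}{4}\right), \qquad U_2 = \sqrt{\frac{3}{4}}\left(X_2 - \frac{X_3 + X_4 + X_5}{3}\right),$$

$$U_3 = \sqrt{\frac{2}{3}}\left(X_3 - \frac{X_4 + X_5}{2}\right), \qquad U_4 = \sqrt{\frac{1}{2}}\left(X_4 - X_5\right) .$$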
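The decomposition of $(n-1)\,s_n^2$ into a sum of squares can be verified numerically. The sketch below assumes the general pattern suggested by the cases $n = 3, 4, 5$, namely $U_i = \sqrt{\tfrac{n-i}{n-i+1}}\left(X_i - \tfrac{X_{i+1} + \cdots + X_n}{n-i}\right)$ for $i = 1, \ldots, n-1$; the function names are illustrative.

```python
import random
from math import sqrt


def helmert_components(xs):
    """Return U_1..U_{n-1} with
    U_i = sqrt((n-i)/(n-i+1)) * (X_i - mean(X_{i+1}..X_n))."""
    n = len(xs)
    us = []
    for i in range(1, n):        # i = 1, ..., n-1 (1-based index)
        rest = xs[i:]            # X_{i+1}, ..., X_n
        r = n - i                # number of remaining samples
        us.append(sqrt(r / (r + 1)) * (xs[i - 1] - sum(rest) / r))
    return us


def sum_sq_dev(xs):
    """(n-1) * s_n^2: the sum of squared deviations from the sample mean."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)


rng = random.Random(1)           # fixed seed, arbitrary choice
for n in (3, 4, 5):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    lhs = sum_sq_dev(xs)
    rhs = sum(u * u for u in helmert_components(xs))
    print(n, lhs, rhs)           # the two values agree up to rounding
```

This checks only the algebraic identity; the distributional claims (each $U_i \sim N(0, \sigma^2)$, mutually independent) still rest on the argument in the slides.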

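Approach 1: Evaluating the confidence interval with Sample Variance

Step 1. Assume the target random variable $X$ is Gaussian, $N(\mu, \sigma^2)$. We want to estimate $\mu$, and we do not know $\sigma^2$.

Step 2. Get $n$ samples, and calculate the sample mean $\hat{X}_n$ and the sample variance $s_n^2$:

$$\hat{X}_n = \frac{1}{n}\sum_{j=1}^{n}X_j , \qquad s_n^2 = \frac{1}{n-1}\sum_{j=1}^{n}\left(X_j - \hat{X}_n\right)^2 .$$

$\dfrac{\hat{X}_n - \mu}{s_n/\sqrt{n}}$ has the student-t distribution with $n-1$ degrees of freedom.

Step 3. For a given confidence level $1 - \alpha$, determine $z$ such that

$$\int_{-z}^{z} f_{t_{n-1}}(u)\,du = 1 - \alpha .$$

Excel TINV($\alpha$, deg. of freedom) returns $z$.

Step 4. We know

$$P\left(\left|\frac{\hat{X}_n - \mu}{s_n/\sqrt{n}}\right| \le z\right) = 1 - \alpha ,$$

that is,

$$P\left(\hat{X}_n - z\frac{s_n}{\sqrt{n}} \le \mu \le \hat{X}_n + z\frac{s_n}{\sqrt{n}}\right) = 1 - \alpha .$$

$\left[\hat{X}_n - z\,s_n/\sqrt{n},\ \hat{X}_n + z\,s_n/\sqrt{n}\right]$ is referred to as the confidence interval.

Example: see the t-dist example Excel file.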
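The four steps can be sketched in Python without Excel. The standard library has no Student-t quantile function, so this sketch finds $z$ by integrating the pdf with Simpson's rule and bisecting; that numerical route is an assumption of this example, standing in for TINV. All function names and the sample values are hypothetical.

```python
from math import gamma, sqrt, pi


def t_pdf(u, k):
    """Student-t density with k degrees of freedom."""
    c = gamma((k + 1) / 2) / (sqrt(k * pi) * gamma(k / 2))
    return c * (1 + u * u / k) ** (-(k + 1) / 2)


def central_area(z, k, steps=2000):
    """Integral of the t_k density over [-z, z] by Simpson's rule."""
    h = z / steps                      # integrate over [0, z], then double
    total = t_pdf(0.0, k) + t_pdf(z, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 2 * total * h / 3


def t_critical(confidence, k):
    """Find z with central_area(z, k) = confidence by bisection."""
    lo, hi = 0.0, 1.0
    while central_area(hi, k) < confidence:   # expand until z is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, k) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_confidence_interval(samples, confidence):
    """Steps 2 to 4: sample mean, sample variance, z, and the interval."""
    n = len(samples)
    x_hat = sum(samples) / n
    s = sqrt(sum((x - x_hat) ** 2 for x in samples) / (n - 1))
    z = t_critical(confidence, n - 1)
    margin = z * s / sqrt(n)
    return x_hat - margin, x_hat + margin


samples = [9.8, 10.4, 10.1, 9.5, 10.7]        # made-up data for illustration
low, high = t_confidence_interval(samples, 0.95)
print(low, high)
```

Where SciPy is available, a library quantile function would normally replace `t_critical`; the hand-rolled version is used here only to keep the example self-contained.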

Non-Gaussian Random Variables

What can we do if the target random variable is not Gaussian?

Solution: Batch Samples

Suppose we want to estimate the true mean of $Y$, but $Y$ is not Gaussian. Then take $n$ batches of samples. Each batch consists of $m$ samples.

$$\begin{array}{cccc|c}
Y_{11} & Y_{12} & \cdots & Y_{1m} & X_1 \\
Y_{21} & Y_{22} & \cdots & Y_{2m} & X_2 \\
\vdots & \vdots & & \vdots & \vdots \\
Y_{n1} & Y_{n2} & \cdots & Y_{nm} & X_n
\end{array}$$

Compute the average in each batch:

$$X_j = \frac{Y_{j1} + Y_{j2} + \cdots + Y_{jm}}{m}, \qquad j = 1, 2, \ldots, n .$$

For a large value of $m$, we can assume $X_j$ is a sample of a Gaussian random variable $X$. Note that

$$E[X] = E\left[\frac{Y_{j1} + Y_{j2} + \cdots + Y_{jm}}{m}\right] = E[Y] ,$$

so we can use an estimate of the mean of $X$ as an estimate of the mean of $Y$.

The batch sample method consists of the following steps:

Step 1) Arrange the $nm$ samples in $n$ batches. Each batch consists of $m$ samples.

Step 2) Compute the average in each of the $n$ batches. Call the averages $X_1, X_2, \ldots, X_n$. Treat $\{X_j\}$ as Gaussian samples.

Step 3) Formulate the sample mean and the sample variance out of the $n$ Gaussian samples. The rest proceeds the same as before.

See the t-dist Bernoulli CI Example Excel file.
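The batch sample steps can be sketched as follows. This is an illustration under stated assumptions, not the referenced Excel example: the Bernoulli parameter `p`, the seed, the sample count 200 and the batch size 10 are all made up, and the critical value 2.093 is the tabulated two-sided 95% Student-t value for 19 degrees of freedom.

```python
import random
from math import sqrt


def batch_means(ys, m):
    """Split ys into consecutive batches of m samples and average each batch.
    Leftover samples that do not fill a whole batch are dropped."""
    n = len(ys) // m
    return [sum(ys[j * m:(j + 1) * m]) / m for j in range(n)]


def mean_and_sample_variance(xs):
    """Sample mean and unbiased (1/(n-1)) sample variance of xs."""
    n = len(xs)
    x_hat = sum(xs) / n
    s2 = sum((x - x_hat) ** 2 for x in xs) / (n - 1)
    return x_hat, s2


rng = random.Random(2)                    # fixed seed, arbitrary choice
p = 0.3                                   # hypothetical Bernoulli parameter
ys = [1 if rng.random() < p else 0 for _ in range(200)]

# Steps 1 and 2: 200 samples arranged as n = 20 batches of m = 10.
xs = batch_means(ys, m=10)

# Step 3: treat the batch averages as Gaussian samples.
x_hat, s2 = mean_and_sample_variance(xs)

z = 2.093   # tabulated two-sided 95% t value for n - 1 = 19 degrees of freedom
margin = z * sqrt(s2) / sqrt(len(xs))
print(x_hat - margin, x_hat + margin)
```

Because every sample is used exactly once here, the mean of the batch averages equals the plain average of all the $Y$ samples; batching changes the variance estimate and the justification for the t-distribution, not the point estimate.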

Guideline to use the t-dist confidence interval against Bernoulli samples

Margin of Error $= z\,\dfrac{s_n}{\sqrt{n}}$

Increasing the number of samples $n$ makes the confidence interval tighter. Increasing $n$ also increases the degrees of freedom. However, once the degrees of freedom are moderately large, increasing them further reduces $z$ by an insignificant amount.

A larger size of batches (a larger value of $m$) reduces the sample variance $s_n^2$. In our example of binomial samples, the sample variance was 0.06 for 4-sample batches and 0.03 for the larger batches.

Also, a larger batch size provides a safer ground for assuming the batch sample means are Gaussian.

Also, one must note that $z$ increases sharply for values of the confidence level beyond 0.95.