UNIVERZA NA PRIMORSKEM FAKULTETA ZA MATEMATIKO, NARAVOSLOVJE IN INFORMACIJSKE TEHNOLOGIJE


Final project paper (Zaključna naloga)

The effects of nonnormal distribution on confidence interval around population mean
(Interval zaupanja za populacijsko povprečje, ko porazdelitev preučevane spremenljivke ni normalna)

Name and surname: Slađana Babić
Study programme: Mathematics
Mentor: doc. dr. Rok Blagus

Koper, June 2015

Ključna dokumentacijska informacija (Key documentation information)

Name and SURNAME: Slađana BABIĆ
Title of the final project paper: Interval zaupanja za populacijsko povprečje, ko porazdelitev preučevane spremenljivke ni normalna (Confidence interval for the population mean when the distribution of the studied variable is not normal)
Place: Koper
Year: 2015
Number of pages: 39
Number of figures: 6
Number of tables: 20
Number of references: 11
Mentor: doc. dr. Rok Blagus
Keywords: confidence interval, coverage of the confidence interval, width of the confidence interval, bootstrapping, simulations.
Math. Subj. Class. (2010): 62F25

Abstract: The aim of the paper is to study the coverage and the width of the confidence interval for the population mean. We derive estimators of the confidence interval for the case when the studied variable is normally distributed and for the case when it is not. In addition to the classical estimators of the population mean, we also use more recent resampling methods (bootstrap), which rely on the sample alone. We study the influence of the sample size and of the shape of the distribution. To study the properties of the estimators for small samples we use simulations, where we are interested in which estimator has the best coverage for a given sample size and shape of the population distribution. When several estimators have the same coverage, we are also interested in the width of the resulting confidence intervals.

Key words documentation

Name and SURNAME: Slađana BABIĆ
Title of final project paper: The effects of nonnormal distribution on confidence interval around population mean
Place: Koper
Year: 2015
Number of pages: 39
Number of figures: 6
Number of tables: 20
Number of references: 11
Mentor: Assist. Prof. Rok Blagus, PhD
Keywords: confidence interval, coverage probability, width of the confidence interval, bootstrapping, simulations.
Math. Subj. Class. (2010): 62F25

Abstract: The aim of the final project paper is first to derive the confidence interval when the variable being studied follows the normal distribution, and when it does not. To construct the confidence interval we rely on the Central Limit Theorem and on the Slutsky Theorem. After that, we examine the coverage probability and the width of the confidence intervals for different sample sizes and different probability distributions, using simulations. We explain what simulations are, when we used them, and why they are useful. We also briefly present some basic properties of the distributions used in the simulations. We explain what bootstrapping is and how it works. At the end we give the results obtained with different methods for constructing the confidence interval.

Acknowledgement

I would like to express my deep gratitude to my mentor, Assist. Prof. Rok Blagus, for his guidance, useful help and advice on my final project paper. I would also like to thank the Faculty of Mathematics, Natural Sciences and Information Technologies for their support through a scholarship. Finally, I wish to thank my family, especially my mother, for support and encouragement throughout my studies.

Contents

1 Introduction
2 Confidence interval
  2.1 Confidence interval around population mean
3 Central Limit Theorem
4 Slutsky Theorem
5 Bootstrapping
6 Probability distributions
  6.1 The Normal Distribution
  6.2 The Gamma Distribution
  6.3 The Exponential Distribution
  6.4 The Uniform Distribution
  6.5 The Pareto type I Distribution
7 Simulations
  7.1 Results of simulations
8 Conclusion
9 Povzetek naloge v slovenskem jeziku (Summary in Slovenian)
10 Bibliography and Sources


List of Figures

1 The percentages of area under the normal curve (figure is from [7])
2 Probability density function of normal distribution
3 Probability density function of gamma distribution
4 Probability density function of exponential distribution
5 Probability density function of uniform distribution
6 Probability density function of Pareto distribution

List of Abbreviations

i.e.    that is
e.g.    for example
i.i.d.  independent and identically distributed
CI      confidence interval
BCa     bias corrected and accelerated
etc.    and so on

1 Introduction

In statistics, estimation refers to the process by which one makes inferences about a population, based on information obtained from a sample. A sample statistic is calculated from the sample data, and the population parameter is estimated from this statistic. There are two types of estimates: point estimates and interval estimates. A point estimate usually differs from the population parameter because of sampling error. For that reason it is better to give an interval estimate, which is a range of values used to estimate the parameter. The confidence interval is the most commonly used interval estimate for making inferences about population parameters from sample data.

Imagine you are trying to find out how many days of vacation Slovenians have taken in the past year. You could ask every Slovenian about his or her vacation schedule to get the answer, but this would be expensive and time consuming. To save time and money, you would probably survey a smaller group of Slovenians. However, your finding may differ from the value you would have obtained by surveying the whole population. That is, it would be an estimate. Each time you repeat the survey, you would likely get slightly different results. Commonly, when researchers present this type of estimate, they put a confidence interval around it. The confidence interval is a range of values in which the actual value is likely to fall. It represents the accuracy or precision of an estimate.

Here, we are interested in the confidence interval around the population mean. First of all, we will explain why we need confidence intervals and how to interpret them. Then we will present two derivations of the confidence interval: the first for the case when the population variance is known and the variable being studied follows the normal distribution, and the second for the case when the population variance is unknown, where the standardized sample mean follows the t-distribution.
After that, in the third and fourth chapters, we present two important theorems that help to explain the derivation of the confidence interval: the Central Limit Theorem and the Slutsky Theorem. In chapter five we present a powerful statistical technique, bootstrapping. With it we can deal with samples in cases where we cannot assume a normal distribution or the t-distribution. Moreover, our aim is to examine the coverage probability of the confidence interval and its width for different distributions by using simulations. The final part therefore presents the results obtained from the simulations.

2 Confidence interval

Interval estimators and the derivation of the confidence interval are the main topic of this chapter. The fundamental idea of statistics is to analyze a sample of data and to make quantitative inferences about the population from which the data were sampled. Confidence intervals are the most straightforward way to do this.

Example 2.1. Say we are interested in the mean weight of 18-year-old girls living in Europe. Since it would be impractical to weigh all 18-year-old girls in Europe, we take a sample of, for example, 10 girls with weights of 51, 55, 49, 57, 62, 47, 51, 53, 59, and 56 kg, and find that the mean weight is 54 kg. The sample mean of 54 kg is a point estimate of the population mean.

Example 2.2. Let us look once again at the example from the beginning. Say we have asked 10 Slovenians about their vacation and obtained the following results: 10, 12, 6, 15, 5, 7, 14, 8, 14 and 9 days. The sample mean in this case is 10 days.

If we give the estimate alone, it does not reflect the sampling error of the obtained value; we do not have a good sense of how far this sample mean may be from the population mean. For that reason we need confidence intervals, since they provide more information than point estimates.

Definition 2.3. An interval estimate of a real-valued parameter $\theta$ is any pair of functions $L(x_1, \dots, x_n)$ and $U(x_1, \dots, x_n)$ of a sample that satisfies $L(x) \le U(x)$ for all $x = (x_1, \dots, x_n)$ from the sample space. If $X = x$ is observed, the inference $L(x) \le \theta \le U(x)$ is made. The random interval $[L(X), U(X)]$ is called an interval estimator.

We will denote by $[L(X), U(X)]$ an interval estimator of $\theta$ and by $[L(x), U(x)]$ the realized value of the interval, based on the random sample $X = (X_1, \dots, X_n)$. Typically, confidence intervals are expressed as a two-sided range. We call such an interval two-sided because it is bounded by both a lower and an upper confidence limit. In some circumstances it makes more sense to bound the interval in only one direction, by either the lower or the upper confidence limit. For example, if $L(x) = -\infty$, then we have the one-sided interval $(-\infty, U(x)]$. In other situations it makes sense to give a lower limit only, so we take $U(x) = \infty$ and we have $[L(x), \infty)$.

Definition 2.4. For the interval estimator $[L(X), U(X)]$ of a parameter $\theta$, the coverage probability of $[L(X), U(X)]$ is the probability that the random interval $[L(X), U(X)]$ covers the parameter $\theta$. It is denoted by $P_\theta(\theta \in [L(X), U(X)])$.

Remark 2.5. The interval is the random quantity, not the parameter.

Remark 2.6. The probability statements refer to $X$, not to $\theta$.

Definition 2.7. For an interval estimator $[L(X), U(X)]$ of a parameter $\theta$, the confidence coefficient of $[L(X), U(X)]$ is the infimum of the coverage probabilities.

Interval estimators, together with a confidence coefficient, are known as confidence intervals. In general, when we are not sure of the exact form of our set, we speak of a confidence set with confidence coefficient $1 - \alpha$. Usually we are looking for 95% and 99% confidence intervals.

The meaning of a confidence interval is frequently misinterpreted. For the given data on the weight of the girls, the 95% confidence interval is $[51, 57]$. What does that mean? If repeated samples were taken and the 95% confidence interval computed for each sample, 95% of the intervals would contain the population mean. So confidence intervals provide more information than point estimates.

2.1 Confidence interval around population mean

This section is principally concerned with the confidence interval around the population mean. We assume that the population is of size $N$ and that each member of the population is described by a numerical value, such as age or height. These numerical values are denoted by $x_1, x_2, \dots, x_N$. The population mean, or average, is defined as

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i .$$

We will also need the population variance,

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 .$$

A useful identity is obtained by expanding the square:

$$\begin{aligned}
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N} \left(x_i^2 - 2x_i\mu + \mu^2\right)
= \frac{1}{N}\left(\sum_{i=1}^{N} x_i^2 - 2\mu\sum_{i=1}^{N} x_i + N\mu^2\right) \\
&= \frac{1}{N}\left(\sum_{i=1}^{N} x_i^2 - 2N\mu^2 + N\mu^2\right)
= \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2 .
\end{aligned}$$

In order to calculate the confidence interval, we first have to select a sample from our population.
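The identity for the population variance can be checked numerically. The following is only a minimal sketch: it treats the ten weights of Example 2.1 as if they were a complete (hypothetical) population, which is an assumption made purely for illustration.

```python
# Hypothetical finite population: the ten weights of Example 2.1,
# treated here as the whole population only for illustration.
population = [51, 55, 49, 57, 62, 47, 51, 53, 59, 56]
N = len(population)

# Population mean: mu = (1/N) * sum(x_i)
mu = sum(population) / N

# Population variance by definition: (1/N) * sum((x_i - mu)^2)
var_definition = sum((x - mu) ** 2 for x in population) / N

# Population variance via the identity: (1/N) * sum(x_i^2) - mu^2
var_identity = sum(x * x for x in population) / N - mu ** 2

print(mu, var_definition, var_identity)
```

Both expressions should agree up to floating-point rounding.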

Simple random sampling is a basic type of sampling in which every object has the same probability of being chosen. We will consider two cases: sampling with replacement and sampling without replacement. Since the sample is taken at random, the sample mean is also random. The next step is to calculate the sample mean and the standard deviation. As an estimator of the population mean we consider the sample mean

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i .$$

$\bar{X}$ is a random variable whose distribution is called the sampling distribution. The sampling distribution depends on the $X_i$.

Lemma 2.8. If we denote the values assumed by the population members by $x_1, x_2, \dots, x_N$, and we assume that all members of the population have distinct values, then $X_i$ is a discrete random variable with probability mass function

$$P(X_i = x_j) = \frac{1}{N} .$$

Moreover, $E(X_i) = \mu$ and $\operatorname{Var}(X_i) = \sigma^2$.

Proof. Every member of the population is equally likely to be selected, so the probability mass function of the discrete random variable $X_i$ is exactly $P(X_i = x_j) = \frac{1}{N}$. The expected value of $X_i$ is

$$E(X_i) = \sum_{j=1}^{N} x_j P(X_i = x_j) = \frac{1}{N}\sum_{j=1}^{N} x_j = \mu .$$

To show the last equality we use the definition of the variance:

$$\operatorname{Var}(X_i) = E(X_i^2) - [E(X_i)]^2 = \frac{1}{N}\sum_{j=1}^{N} x_j^2 - \mu^2 = \sigma^2 .$$

Theorem 2.9. With simple random sampling, the expected value of the sample mean, $E(\bar{X})$, is $\mu$.

Proof.

$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\, n\mu = \mu .$$

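In statistics, the bias of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. An estimator with zero bias is called unbiased; otherwise it is biased. By Theorem 2.9, the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is an unbiased estimator of the population mean.

To find the standard deviation we have to find the variance, since the standard deviation is the square root of the variance. The variance of a random variable is a measure of its variability, and the covariance of two random variables is a measure of their joint variability, or their degree of association.

Definition. If $X$ and $Y$ are jointly distributed random variables with expectations $E(X)$ and $E(Y)$ respectively, the covariance of $X$ and $Y$ is

$$\operatorname{Cov}(X, Y) = E\left[(X - E(X))(Y - E(Y))\right],$$

provided that the expectation exists.

This expression can be simplified by expanding the product and using the linearity of the expectation:

$$\begin{aligned}
\operatorname{Cov}(X, Y) &= E\left[XY - XE(Y) - YE(X) + E(X)E(Y)\right] \\
&= E(XY) - E(X)E(Y) - E(Y)E(X) + E(X)E(Y) \\
&= E(XY) - E(X)E(Y) .
\end{aligned}$$

When $X$ and $Y$ are independent, $E(XY) = E(X)E(Y)$, so $\operatorname{Cov}(X, Y) = 0$.

What can we say about $\operatorname{Cov}(X_i, X_j)$ when $i \ne j$? First we look at the case when the sampling is done with replacement, which means that a population element can be selected more than once. Then the $X_i$ are independent, and for $i \ne j$ we have $\operatorname{Cov}(X_i, X_j) = 0$, while $\operatorname{Cov}(X_i, X_i) = E(X_i^2) - E(X_i)^2 = \operatorname{Var}(X_i) = \sigma^2$. Using the property of the variance of a linear combination of random variables,

$$\operatorname{Var}\left(\sum_{i=1}^{n} b_i X_i\right) = \sum_{i=1}^{n}\sum_{j=1}^{n} b_i b_j \operatorname{Cov}(X_i, X_j),$$

we have

$$\operatorname{Var}(\bar{X}) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} \operatorname{Cov}(X_i, X_j).$$

Now we can find $\operatorname{Var}(\bar{X})$:

$$\operatorname{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 = \frac{\sigma^2}{n} .$$

The standard deviation of $\bar{X}$ is $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.

The other case is sampling without replacement, when a population element can be selected only once. This causes dependence among the $X_i$. To find the variance, we first need $\operatorname{Cov}(X_i, X_j)$ for $i \ne j$. First of all, $\operatorname{Cov}(X_i, X_j)$ is the same for all $i \ne j$, since all pairs have the same joint distribution. Moreover, $\operatorname{Cov}\left(X_i, \sum_{j=1}^{N} X_j\right) = 0$, because $\sum_{j=1}^{N} X_j$ is a constant (if all $N$ members are drawn, the sum is always $N\mu$). From this we have

$$\operatorname{Cov}(X_i, X_i) + \sum_{j=1,\, j \ne i}^{N} \operatorname{Cov}(X_i, X_j) = 0,$$

$$\operatorname{Cov}(X_i, X_i) + (N - 1)\operatorname{Cov}(X_i, X_j) = 0 .$$

This implies

$$\operatorname{Cov}(X_i, X_j) = -\frac{\sigma^2}{N - 1} \quad \text{for } i \ne j .$$

We see that the covariance depends on the population size. If the population is very large, the covariance is very close to zero. Using once again the property of the variance of a linear combination of random variables we get

$$\begin{aligned}
\operatorname{Var}(\bar{X}) &= \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} \operatorname{Cov}(X_i, X_j) \\
&= \frac{1}{n^2}\left(\sum_{i=1}^{n} \operatorname{Cov}(X_i, X_i) + \sum_{i=1}^{n}\sum_{j \ne i} \operatorname{Cov}(X_i, X_j)\right) \\
&= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j \ne i} \operatorname{Cov}(X_i, X_j) \\
&= \frac{\sigma^2}{n} - \frac{1}{n^2}\, n(n - 1)\,\frac{\sigma^2}{N - 1}
= \frac{\sigma^2}{n}\left(1 - \frac{n - 1}{N - 1}\right)
= \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right).
\end{aligned}$$

Since the population variance is defined as the average squared deviation from the population mean, it is intuitive to define the estimator of the population variance as the average squared deviation from the sample mean. However, the uncorrected sample variance

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$$

is a biased estimator of the population variance:

$$\begin{aligned}
E(\hat{\sigma}^2) &= E\left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2\right]
= E\left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu + \mu - \bar{X})^2\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n} \left((X_i - \mu) - (\bar{X} - \mu)\right)^2\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n} \left((X_i - \mu)^2 - 2(X_i - \mu)(\bar{X} - \mu) + (\bar{X} - \mu)^2\right)\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu)^2 - 2(\bar{X} - \mu)\,\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu) + (\bar{X} - \mu)^2\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu)^2 - (\bar{X} - \mu)^2\right]
= \sigma^2 - E\left[(\bar{X} - \mu)^2\right] < \sigma^2 .
\end{aligned}$$

The second-to-last line uses $\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu) = \bar{X} - \mu$. As an unbiased estimator of the population variance we suggest

$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2 .$$

In statistics, this is known as Bessel's correction for the sample variance [8].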
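The covariance and variance formulas for sampling without replacement can be verified exactly on a small example. The sketch below enumerates every ordered sample of size $n$ drawn without replacement from a hypothetical population of five distinct values (the values and the choice $n = 2$ are assumptions for illustration) and uses exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical population with distinct values (illustration only).
population = [2, 4, 7, 11, 16]
N = len(population)
n = 2  # sample size

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

# All ordered samples of size n without replacement; each is equally likely.
samples = list(permutations(population, n))
means = [Fraction(sum(s), n) for s in samples]

# Exact expectation and variance of the sample mean over all samples.
e_mean = sum(means) / len(means)
var_mean = sum((m - e_mean) ** 2 for m in means) / len(means)

# Exact covariance between the first and the second draw.
cov_12 = sum((s[0] - mu) * (s[1] - mu) for s in samples) / len(samples)

# Closed-form expressions from the text.
cov_formula = -sigma2 / (N - 1)
var_formula = sigma2 / n * Fraction(N - n, N - 1)

print(e_mean == mu, cov_12 == cov_formula, var_mean == var_formula)
```

Because the enumeration covers the whole sample space, the comparison is exact rather than approximate.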

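Let us show that $E(s^2) = \sigma^2$ holds (for independent observations, so that $\operatorname{Var}(\bar{X}) = \sigma^2/n$):

$$\begin{aligned}
E(s^2) &= E\left(\frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2\right)
= \frac{1}{n - 1}\, E\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right) \\
&= \frac{1}{n - 1}\left[\sum_{i=1}^{n} E(X_i^2) - nE(\bar{X}^2)\right]
= \frac{1}{n - 1}\left[n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right] \\
&= \frac{1}{n - 1}\left[n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right]
= \frac{1}{n - 1}\,\sigma^2 (n - 1) = \sigma^2 .
\end{aligned}$$

Remark. $\operatorname{Var}(X_i) = E(X_i^2) - E(X_i)^2$ implies $E(X_i^2) = \operatorname{Var}(X_i) + E(X_i)^2 = \sigma^2 + \mu^2$.

Remark. Similarly, $E(\bar{X}^2) = \operatorname{Var}(\bar{X}) + E(\bar{X})^2 = \frac{\sigma^2}{n} + \mu^2$, and

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} \left(X_i^2 - 2X_i\bar{X} + \bar{X}^2\right) = \sum_{i=1}^{n} X_i^2 - 2\bar{X}\sum_{i=1}^{n} X_i + n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - 2n\bar{X}^2 + n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 .$$

To sum up:
- The sample mean is defined as $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
- If the variance of the population is known, we have $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$.
- If it is not known, we use the unbiased estimator $s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2$.

First of all, we derive the confidence interval for the population mean when the standard deviation of the population is known and the variable being studied follows the normal distribution. In practice, the population standard deviation is rarely known. However, learning how to compute a confidence interval when the standard deviation is known is an excellent introduction to computing a confidence interval when the standard deviation has to be estimated.

To obtain this confidence interval, we need to know the sampling distribution of the estimator. Once we know the distribution, we can construct the confidence interval. As said before, we assume that the variable is normally distributed. The normal distribution is convenient to work with, since it is fully described by its mean and variance.

Theorem. [9] If $X$ and $Y$ are independent random variables that are normally distributed, then their sum is also normally distributed, i.e. if $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$, then $X + Y \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$.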
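The bias of $\hat{\sigma}^2$ and the unbiasedness of $s^2$ can also be confirmed by brute force. This is a sketch under assumed inputs: a hypothetical five-value population and samples of size $n = 3$ drawn with replacement, so that the draws are independent. All $N^n$ equally likely samples are enumerated with exact fractions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population (illustration only); sampling WITH replacement.
population = [2, 4, 7, 11, 16]
N = len(population)
n = 3

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N


def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)


# Every ordered sample of size n with replacement is equally likely.
samples = list(product(population, repeat=n))

# Exact expectations of the two variance estimators.
e_uncorrected = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
e_bessel = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(e_uncorrected, e_bessel, sigma2)
```

The uncorrected estimator should come out as $\frac{n-1}{n}\sigma^2$, which is exactly the shortfall that Bessel's correction removes.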

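As a consequence of this theorem, we obtain the following. If $X_1, X_2, \dots, X_n$ are independent, normally distributed random variables with mean $\mu$ and standard deviation $\sigma$, then $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is normally distributed with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$. If the original population is normally distributed, we use the above theorem. If the random variables do not have a normal distribution, we use the Central Limit Theorem, which is discussed in the next chapter.

To sum up, we have $X_i \sim N(\mu, \sigma^2)$ and $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$. But how is $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ distributed? It is a linear function of a normal variable, hence normal, with

$$E\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right) = \frac{\sqrt{n}}{\sigma}\, E(\bar{X} - \mu) = \frac{\sqrt{n}}{\sigma}(\mu - \mu) = 0,$$

$$\operatorname{Var}\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right) = \frac{n}{\sigma^2}\operatorname{Var}(\bar{X} - \mu) = \frac{n}{\sigma^2}\cdot\frac{\sigma^2}{n} = 1 .$$

The special case of the normal distribution with $\mu = 0$ and $\sigma = 1$ is called the standard normal distribution, so

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$

If we are interested in the probability that a standard normal variable $Z$ falls between two values, for example $-z$ and $z$, we write $P(-z < Z < z)$. We denote by $z_{\alpha/2}$ the value from the standard normal distribution for the selected confidence level (e.g. for a 95% confidence level $z_{\alpha/2} = 1.96$). So, when we are looking for a $(1 - \alpha)$ confidence interval we have:

$$\begin{aligned}
P\left(-z_{\alpha/2} \le Z \le z_{\alpha/2}\right) &= 1 - \alpha \\
P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) &= 1 - \alpha \\
P\left(-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &= 1 - \alpha \\
P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &= 1 - \alpha
\end{aligned}$$

From the last expression we obtain the formula for the confidence interval for the population mean. If the standard deviation is known, it is

$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} .$$

The lower limit is $\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and the upper limit is $\bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.

In practice, we often do not know the value of the population standard deviation. In that case we should use the t-distribution rather than the normal distribution. The first thing to do is to estimate the standard deviation from the sample data. Since we use $s^2$ as an estimator of the population variance, it is intuitive to estimate $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ by $s_{\bar{X}} = \frac{s}{\sqrt{n}}$. First, let us look at how $\frac{\bar{X} - \mu}{s/\sqrt{n}}$ is distributed. We can write

$$\frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{s^2}{\sigma^2}}} .$$

We already know that $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$. The next step is to look at the distribution of $\frac{s^2(n - 1)}{\sigma^2}$. We have

$$s^2(n - 1) = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2\,(n - 1) = \sum_{i=1}^{n} (X_i - \bar{X})^2 ,$$

and, since $\sum_{i=1}^{n} (X_i - \mu) = n(\bar{X} - \mu)$,

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu + \mu - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2 .$$

Therefore

$$\frac{s^2(n - 1)}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2}
= \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 - n\left(\frac{\bar{X} - \mu}{\sigma}\right)^2
= \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 - \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 .$$

Observe that $Z_i = \frac{X_i - \mu}{\sigma} \sim N(0, 1)$ and $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$. Before we continue, a few remarks on the relationship between the normal distribution and the $\chi^2$-distribution.

- If $Z \sim N(0, 1)$, then $Z^2 \sim \chi^2_1$.
- If $X_1, \dots, X_n$ are independent standard normal random variables (mean 0 and variance 1), then the sum of their squares has the $\chi^2$-distribution with $n$ degrees of freedom: $X_1^2 + \dots + X_n^2 \sim \chi^2_n$.
- If $X_1 \sim \chi^2_n$ and $X_2 \sim \chi^2_m$, and they are independent, then $X_1 + X_2 \sim \chi^2_{n+m}$.

Using these facts, we have

$$\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_n \quad \text{and} \quad \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 \sim \chi^2_1 .$$

And finally,

$$\frac{s^2(n - 1)}{\sigma^2} = \sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{\sigma}\right)^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 - \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 \sim \chi^2_{n-1} .$$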
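The claim that $\frac{s^2(n-1)}{\sigma^2}$ follows a $\chi^2_{n-1}$ distribution can be probed with a short simulation. The sketch below is not a proof; it only compares the simulated mean and variance with the known moments of $\chi^2_{n-1}$, which are $n - 1$ and $2(n - 1)$. The values of `mu`, `sigma`, `n`, the number of repetitions and the seed are arbitrary assumptions.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

mu, sigma = 10.0, 2.0   # assumed parameters of the normal population
n = 5                   # sample size
reps = 20000            # number of simulated samples

values = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)  # Bessel-corrected
    values.append((n - 1) * s2 / sigma ** 2)

# Empirical mean and variance of (n - 1) s^2 / sigma^2.
mean_v = sum(values) / reps
var_v = sum((v - mean_v) ** 2 for v in values) / (reps - 1)

print("mean:", mean_v, "expected:", n - 1)
print("variance:", var_v, "expected:", 2 * (n - 1))
```

Matching the first two moments is of course weaker than matching the whole distribution, but a clear mismatch here would indicate an error in the derivation.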

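At the very beginning we said that we would use the t-distribution. Student's t-distribution with $\nu$ degrees of freedom can be defined as the distribution of the random variable

$$T = \frac{Z}{\sqrt{V/\nu}},$$

where
- $Z$ has a standard normal distribution;
- $V$ has a $\chi^2$-distribution with $\nu$ degrees of freedom;
- $Z$ and $V$ are independent.

We are interested in the distribution of $\frac{\bar{X} - \mu}{s/\sqrt{n}}$. The first thing we did was to transform that expression into the following:

$$\frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{s^2(n - 1)}{\sigma^2}\cdot\dfrac{1}{n - 1}}} .$$

If we look carefully at our case, we observe that
- $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$;
- $V = \frac{s^2(n - 1)}{\sigma^2} \sim \chi^2_{n-1}$, with $n - 1$ degrees of freedom.

The only thing we still have to check is the independence of $Z$ and $V$. If $X_1, \dots, X_n$ are independent normally distributed variables, the vector $(X_1, \dots, X_n)$ is jointly normally distributed, i.e. it is distributed so that every linear combination $a_1 X_1 + \dots + a_n X_n$ has a one-dimensional normal distribution.

Theorem. [10] If $X_1, \dots, X_n$ are jointly normally distributed and uncorrelated, i.e. $\operatorname{Cov}(X_i, X_j) = 0$ for all $i \ne j$, then the $X_i$ are independent.

Using the above theorem, we have to check that

$$\operatorname{Cov}\left(\bar{X}, \begin{pmatrix} X_1 - \bar{X} \\ \vdots \\ X_n - \bar{X} \end{pmatrix}\right) = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}.$$

It is enough to check that $\operatorname{Cov}(\bar{X}, X_i - \bar{X}) = 0$ for each $i$:

$$\begin{aligned}
\operatorname{Cov}(\bar{X}, X_i - \bar{X}) &= \operatorname{Cov}(\bar{X}, X_i) - \operatorname{Cov}(\bar{X}, \bar{X})
= \operatorname{Cov}\left(\frac{1}{n}\sum_{j=1}^{n} X_j, X_i\right) - \operatorname{Var}(\bar{X}) \\
&= \frac{1}{n}\left[\operatorname{Cov}(X_i, X_i) + \sum_{j \ne i} \operatorname{Cov}(X_j, X_i)\right] - \frac{\sigma^2}{n} \\
&= \frac{1}{n}\operatorname{Var}(X_i) - \frac{\sigma^2}{n} = \frac{\sigma^2}{n} - \frac{\sigma^2}{n} = 0 .
\end{aligned}$$

Since $s^2$ is a function of the differences $X_i - \bar{X}$ only, $\bar{X}$ and $s^2$ are independent, and so are $Z$ and $V$.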

It follows that

$$\frac{Z}{\sqrt{V/(n - 1)}} \sim t_{n-1} .$$

Let us denote $T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$. Then:

$$\begin{aligned}
P\left(-t_{\alpha/2} \le T \le t_{\alpha/2}\right) &= 1 - \alpha \\
P\left(-t_{\alpha/2} \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le t_{\alpha/2}\right) &= 1 - \alpha \\
P\left(\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right) &= 1 - \alpha
\end{aligned}$$

The formula for a confidence interval for $\mu$ when $\sigma$ is unknown is therefore

$$\bar{X} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} .$$

The critical values of $t$ are larger than those of $z$, so confidence intervals with estimated $\sigma$ are generally wider than confidence intervals with known $\sigma$. Constructing confidence intervals with the t-distribution is the same as with the normal distribution, except that the z-score is replaced by a t-score.

Recall the above formula for the confidence interval for a mean. Notice again that in our calculations we used the sample standard deviation $s$ instead of the true population standard deviation $\sigma$. Estimating $\sigma$ introduces extra error, and this extra error can be considerable when the sample size is not large enough. Because $s$ is a poor estimator of $\sigma$ for a small sample, we do not assume that the sampling distribution of the standardized mean is normal. Instead, we use the t-distribution, which gives a better interval estimate of the mean when the sample size is small.

To end this part, a few remarks on the formula for the confidence interval. We said that $1 - \alpha$ is the confidence coefficient. So $\alpha$ is a value we choose at the beginning; the most commonly used confidence levels are 95%, 99% and sometimes 90%. To find the critical value $z_{\alpha/2}$ we use tables of the standard normal distribution, which give the values of its cumulative distribution function. When we use the t-distribution, the critical value $t_{\alpha/2,\,df}$ is obtained from tables of the t-distribution.
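As a worked illustration of the formula $\bar{X} \pm t_{\alpha/2}\, s/\sqrt{n}$, the sketch below recomputes the 95% interval for the weights of Example 2.1. The t critical value `t_crit = 2.262` for 9 degrees of freedom is an assumed table value, hard-coded because the Python standard library has no t quantile function. The z-based interval, with $s$ plugged in for $\sigma$, is shown only to compare the widths.

```python
import math
from statistics import NormalDist, mean, stdev

weights = [51, 55, 49, 57, 62, 47, 51, 53, 59, 56]  # data of Example 2.1
n = len(weights)

xbar = mean(weights)   # sample mean
s = stdev(weights)     # sample standard deviation (divides by n - 1)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
t_crit = 2.262  # ASSUMED table value of t_{0.025} with n - 1 = 9 d.f.

half_z = z_crit * s / math.sqrt(n)
half_t = t_crit * s / math.sqrt(n)

z_interval = (xbar - half_z, xbar + half_z)  # s plugged in for sigma
t_interval = (xbar - half_t, xbar + half_t)

print("t interval:", t_interval)
print("z interval:", z_interval)
```

Rounded to whole kilograms, the t-based interval agrees with the interval $[51, 57]$ quoted for this example, and it is wider than the z-based one.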

3 Central Limit Theorem

Up to this point, we started from the assumption that the variable follows the normal distribution. In this chapter we discuss one of the fundamental theorems of probability, the Central Limit Theorem (CLT). The CLT allows us to use the approximate formula for the CI based on the standard normal distribution even when the variable being studied does not follow the normal distribution.

Theorem 3.1. Let $X_1, X_2, \dots$ be a sequence of independent random variables having mean $\mu$ and variance $\sigma^2$. Let each $X_i$ have the distribution function $P(X_i \le x) = F(x)$ and the moment generating function $M(t) = E(e^{tX_i})$. Let $S_n = \sum_{i=1}^{n} X_i$. Then

$$\lim_{n \to \infty} P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\right) = \Phi(x) \quad \text{for } -\infty < x < \infty,$$

where $\Phi$ is the distribution function of the standard normal distribution.

Before we give a proof, we need a few facts about moment generating functions. Recall that the moment generating function (mgf) of a random variable $X$ is $M_X(t) = E(e^{tX})$. One of the properties of moment generating functions is that if the mgf exists in an open interval containing zero, then $M^{(r)}(0) = E(X^r)$.

Proposition 3.2. If $X$ and $Y$ are independent random variables with mgfs $M_X$ and $M_Y$, then $M_{X+Y}(t) = M_X(t) M_Y(t)$.

Since the proof is quite simple, we give it even though it is not the main topic.

Proof. By independence, $M_{X+Y}(t) = E(e^{tX + tY}) = E(e^{tX}) E(e^{tY}) = M_X(t) M_Y(t)$.

Proposition 3.3. If $X$ is a random variable with mgf $M_X$, and $Y = a + bX$, then $M_Y(t) = e^{at} M_X(bt)$.

Proof. $M_Y(t) = E(e^{tY}) = E(e^{at + btX}) = E(e^{at} e^{btX}) = e^{at} E(e^{btX}) = e^{at} M_X(bt)$.

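To prove the CLT, we also need the following theorem, whose proof we skip.

Theorem 3.4. Let $F_n$ be a sequence of cumulative distribution functions (or just distribution functions) with corresponding moment generating functions $M_n$. Let $F$ be a distribution function with moment generating function $M$. If $M_n(t) \to M(t)$ for all $t$ in an open interval containing zero, then $F_n(x) \to F(x)$ for all $x$ at which $F$ is continuous.

Now we can give the proof of the Central Limit Theorem.

Proof. It suffices to do the proof in the case $\mu = 0$. Indeed, in the case $\mu \ne 0$, let $Y_i = X_i - \mu$ for each $i$ and let $T_n = Y_1 + \dots + Y_n$. Then we have

$$\lim_{n \to \infty} P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\right) = \lim_{n \to \infty} P\left(\frac{T_n}{\sigma\sqrt{n}} \le x\right),$$

so it is enough to prove the theorem for $\mu = 0$.

Let us denote $Z_n = \frac{S_n}{\sigma\sqrt{n}}$. By Theorem 3.4, it is enough to show that the mgf of a standardized sum of independent, identically distributed random variables approaches the mgf of a standard normal random variable as $n \to \infty$. So we will show that the mgf of $Z_n$ tends to the mgf of the standard normal distribution.

Since $S_n$ is a sum of independent random variables, Proposition 3.2 gives $M_{S_n}(t) = [M(t)]^n$, and by Proposition 3.3 we have

$$M_{Z_n}(t) = \left[M\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n .$$

We look at the limit of $\log[M_{Z_n}(t)]$. First,

$$\log[M_{Z_n}(t)] = \log\left[M\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n = n \log\left[M\left(\frac{t}{\sigma\sqrt{n}}\right)\right].$$

We denote $\frac{1}{\sqrt{n}}$ by $x$. Then we have

$$L = \lim_{x \to 0} \frac{\log\left[M\left(\frac{tx}{\sigma}\right)\right]}{x^2} .$$

Since $M(0) = 1$, both numerator and denominator tend to zero, and we use l'Hôpital's rule. Because $M'(0) = E(X) = 0$, the rule can be applied a second time:

$$\begin{aligned}
L &= \lim_{x \to 0} \frac{\dfrac{M'\left(\frac{tx}{\sigma}\right)\frac{t}{\sigma}}{M\left(\frac{tx}{\sigma}\right)}}{2x}
= \frac{t}{2\sigma}\lim_{x \to 0} \frac{M'\left(\frac{tx}{\sigma}\right)}{x\, M\left(\frac{tx}{\sigma}\right)} \\
&= \frac{t}{2\sigma}\lim_{x \to 0} \frac{M''\left(\frac{tx}{\sigma}\right)\frac{t}{\sigma}}{M\left(\frac{tx}{\sigma}\right) + x\, M'\left(\frac{tx}{\sigma}\right)\frac{t}{\sigma}}
= \frac{t^2}{2\sigma^2}\cdot\frac{M''(0)}{M(0) + 0 \cdot M'(0)\frac{t}{\sigma}}
= \frac{t^2}{2\sigma^2}\cdot\frac{M''(0)}{M(0)} .
\end{aligned}$$

Using the property $M^{(r)}(0) = E(X^r)$, we get $M(0) = E(1) = 1$, $M'(0) = E(X) = 0$ and $M''(0) = E(X^2) = \operatorname{Var}(X) + E(X)^2 = \sigma^2$. So we have $L = \frac{t^2}{2}$, which is exactly the logarithm of the moment generating function $e^{t^2/2}$ of the standard normal distribution [5].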
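The limit in the proof can be watched numerically for one concrete distribution. The sketch below assumes $X = E - 1$ with $E$ exponentially distributed with rate 1, so that $\mu = 0$, $\sigma = 1$ and $M(t) = e^{-t}/(1 - t)$ for $t < 1$. This choice of distribution is an illustration and does not come from the text.

```python
import math


def log_mgf_centered_exp(t):
    """log M(t) for X = E - 1 with E ~ Exp(1); valid only for t < 1."""
    # M(t) = exp(-t) / (1 - t), hence log M(t) = -t - log(1 - t).
    return -t - math.log1p(-t)


def log_mgf_standardized_sum(t, n, sigma=1.0):
    """log of the mgf of Z_n = S_n / (sigma * sqrt(n)), i.e. n * log M(t / (sigma sqrt n))."""
    return n * log_mgf_centered_exp(t / (sigma * math.sqrt(n)))


t = 1.0
target = t ** 2 / 2  # log of the standard normal mgf at t
for n in (10, 100, 10_000, 1_000_000):
    print(n, log_mgf_standardized_sum(t, n), target)
```

The printed values should approach $t^2/2$ as $n$ grows, with the gap shrinking roughly like $1/\sqrt{n}$ because the exponential distribution is skewed.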

But how is the Central Limit Theorem connected with finding a confidence interval for the population mean? The Central Limit Theorem states that if we have a population with mean $\mu$ and standard deviation $\sigma$ and take a sufficiently large random sample from it, then the distribution of the sample mean is approximately normal. This is true regardless of whether the population distribution is normal or not, provided that the sample size is sufficiently large. But what do we do if we want to calculate a confidence interval from a sample that is not large enough? We use the t-distribution, but only if it is appropriate to assume that the population distribution itself is normal, or close to normal.

By the CLT, $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is approximately normally distributed for large $n$, even if the population distribution is not normal. But in practice $\sigma$ is rarely known. For that reason we look at $\frac{\bar{X} - \mu}{s/\sqrt{n}}$, and we are interested in its distribution when the population distribution is not normal. In that case we rely on the results of both the CLT and the Slutsky theorem, which is presented in the next chapter.

4 Slutsky Theorem

When looking for a confidence interval without knowing the population standard deviation, the first thing we did was to find the distribution of $\frac{\bar{X} - \mu}{s/\sqrt{n}}$. Using Slutsky's theorem we will show that $\frac{\bar{X} - \mu}{s/\sqrt{n}} \xrightarrow{d} Z$, where $Z$ is a random variable with the standard normal distribution. So, first of all, Slutsky's theorem.

Theorem 4.1. [6] Let $X_1, X_2, \dots$ and $Y_1, Y_2, \dots$ be random variables. Suppose that $X_n$ converges in distribution to a random variable $X$, i.e. $X_n \xrightarrow{d} X$, and $Y_n$ converges in probability to a constant $c$, i.e. $Y_n \xrightarrow{p} c$. Then:
- $X_n + Y_n \xrightarrow{d} X + c$
- $Y_n X_n \xrightarrow{d} cX$
- $\frac{X_n}{Y_n} \xrightarrow{d} \frac{X}{c}$ if $c \ne 0$

Remark 4.2. A sequence $X_1, X_2, \dots$ of random variables is said to converge in distribution to a random variable $X$ if $\lim_{n \to \infty} F_n(x) = F(x)$ for every $x \in \mathbb{R}$ at which $F$ is continuous. Here $F_n$ and $F$ are the distribution functions of the random variables $X_n$ and $X$ respectively.

A sequence $X_1, X_2, \dots$ of random variables is said to converge in probability towards the random variable $X$ if for all $\epsilon > 0$

$$\lim_{n \to \infty} P\left(|X_n - X| \ge \epsilon\right) = 0 .$$

Convergence in probability implies convergence in distribution. For a constant limit the converse also holds: if $X_n \xrightarrow{d} X$ and $P(X = c) = 1$, where $c$ is a constant, then $X_n \xrightarrow{p} c$. This gives an equivalent description of convergence in probability towards a constant.

As we said before, we will show that $\frac{\bar{X} - \mu}{s/\sqrt{n}} \xrightarrow{d} Z$, where $Z$ is a random variable with the standard normal distribution. Before we do that, we need one more theorem.

Theorem 4.3. If $X_1, \dots, X_n$ are i.i.d. with $E(X_i) = \mu$, the weak law of large numbers states that the sample average $\bar{X}_n = \frac{1}{n}(X_1 + \dots + X_n)$ converges in probability towards the expected value as $n \to \infty$: $\bar{X}_n \xrightarrow{p} \mu$.

Let $X_1, \dots, X_n$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. The CLT tells us that $Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is approximately $N(0, 1)$. But we rarely know $\sigma$. We have seen before that we can estimate it using $s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2$. We will show that if we replace $\sigma$ by $s$, then $T_n = \frac{\bar{X} - \mu}{s/\sqrt{n}}$ is still approximately $N(0, 1)$.

Denote $S_n^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$. First, we will show that $S_n^2 \xrightarrow{p} \sigma^2$, where $\sigma^2$ is the population variance. We have

$$\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)^2 .$$

Let us define $Y_i = X_i^2$. Then, by the weak law of large numbers, we have

$$\frac{1}{n}\sum_{i=1}^{n} X_i^2 = \frac{1}{n}\sum_{i=1}^{n} Y_i \xrightarrow{p} E(Y_i) = E(X_i^2) = \operatorname{Var}(X_i) + E(X_i)^2 = \sigma^2 + \mu^2 .$$

Again by the same law, $\frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{p} E(X_i) = \mu$. Since $f(x) = x^2$ is continuous, we have $\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)^2 \xrightarrow{p} \mu^2$, because continuous functions are limit-preserving. So

$$S_n^2 \xrightarrow{p} (\sigma^2 + \mu^2) - \mu^2 = \sigma^2 .$$

But we want to see what happens if we replace $\sigma$ by $s$. We have

$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{n}{n - 1}\, S_n^2 .$$

Since $S_n^2 \xrightarrow{p} \sigma^2$ and $\frac{n}{n - 1} \to 1$, we have $s^2 \xrightarrow{p} \sigma^2$. Once again we use that continuous functions are limit-preserving, so $s \xrightarrow{p} \sigma$. Then $\frac{s}{\sigma} \xrightarrow{p} 1$, and by the same argument $\frac{\sigma}{s} \xrightarrow{p} 1$.

Finally,

$$T_n = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{\bar{X} - \mu}{s/\sqrt{n}}\cdot\frac{\sigma}{\sigma} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\cdot\frac{\sigma}{s} .$$

Let us denote $Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ and $V_n = \frac{\sigma}{s}$. By Slutsky's theorem, $T_n = Z_n V_n \xrightarrow{d} Z \cdot 1 = Z$, since $Z_n \xrightarrow{d} Z$ and $V_n \xrightarrow{p} 1$.

If we do not know the distribution of the data we are working with, or do not feel comfortable assuming normality, we rely on the CLT and the Slutsky theorem, since we cannot use the t-distribution without the assumption of normality. By the Central Limit Theorem and the Slutsky theorem we can use the asymptotic properties of the statistic $T_n = \frac{\bar{X} - \mu}{s/\sqrt{n}}$ to form confidence intervals based on the standard normal distribution, without making any assumptions about the distribution of the sample data, and using $s^2$ to estimate $\sigma^2$.

An important note to remember: people often say that as $n$ becomes large the normal distribution approximates the t-distribution. In fact, as shown above, as $n$ becomes large the statistic $T_n$ itself is approximately normally distributed (again by the CLT and the Slutsky theorem).
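The practical meaning of this asymptotic result can be seen in a small coverage simulation. The sketch below is illustrative only: it assumes exponentially distributed data with rate 1 (true mean 1), the normal-based interval $\bar{X} \pm z_{\alpha/2}\, s/\sqrt{n}$, and arbitrary choices for the sample sizes, the number of repetitions and the seed. The function name `coverage` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def coverage(n, reps, alpha=0.05, rate=1.0, seed=1):
    """Estimate the coverage of xbar +/- z * s / sqrt(n) for Exp(rate) data."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    true_mean = 1.0 / rate
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(rate) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        half = z * s / math.sqrt(n)
        if xbar - half <= true_mean <= xbar + half:
            hits += 1
    return hits / reps


coverage_small = coverage(n=10, reps=3000)
coverage_large = coverage(n=200, reps=3000)
print("n = 10: ", coverage_small)
print("n = 200:", coverage_large)
```

For the skewed exponential distribution one would expect the estimated coverage to fall noticeably short of the nominal 95% at the small sample size and to move close to it at the large one, in line with the asymptotic argument above.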

5 Bootstrapping

Bootstrapping is a computer-based method which can answer questions that are too complicated for traditional statistical analysis. Most commonly, the bootstrap is used to estimate the variance of estimators that cannot be evaluated theoretically. It is a powerful statistical technique which works quite well even with small samples and when we know nothing about the distribution of our data. Up to now, when we wanted to determine the confidence interval, we had to assume the distribution of the population, and in some cases we also had to know the standard deviation. The bootstrap method does not require anything other than the sample, and assumes only that the observations are independent and identically distributed.

The basic idea of bootstrapping is that inference about the population from the sample data can be modeled by resampling the sample data and performing inference on the resamples. Bootstrap samples are obtained by random sampling with replacement, so as to obtain samples of the same size as the original sample. So the sample from the population becomes the 'population' and the resample is a 'sample'. As the population is unknown, the quality of the inference from the sample is also unknown; we cannot be sure about the sampling error. But with the bootstrap method the 'population' is in fact the sample, which is known, so the quality of the inference from the resampled data is measurable. The following numerical example demonstrates how the process works.

Example 5.1. Assume that our sample is 1, 2, 3, 3, 10. Our goal is a 90% confidence interval for the population mean. We begin with a sample from a population that we know nothing about. The next step is to form bootstrap samples. Each bootstrap sample has the same size as the original sample, in our case five. Bootstrap samples may differ from the original sample and from each other, since we are randomly selecting values with replacement.
We will take 20 bootstrap samples: 2,1,10,3,2; 3,10,10,2,3; 1,3,1,3,3; 3,1,1,3,10; 3,3,1,3,2; 3,10,10,10,3; 2,3,3,2,1; 2,3,1,10,3; 1,10,2,10,10; 3,3,3,3,3; 3,3,3,3,1; 1,2,3,3,2; 3,3,10,10,2; 3,2,1,3,3; 3,1,10,1,10; 3,2,3,1,1; 3,3,3,2,3; 10,3,1,3,3; 3,2,1,10,2; 10,2,2,1,1.

Now we calculate the mean of each bootstrap sample. These means, arranged in ascending order, are: 2, 2.2, 2.2, 2.2, 2.4, 2.4, 2.6, 2.8, 3, 3.2, 3.6, 3.6, 3.6, 3.8, 4, 5, 5.6, 5.6, 6.6, 7.2.

From these bootstrap sample means we can obtain a confidence interval by taking the 5th and the 95th percentile of their distribution. For our example this gives the confidence interval [2.2, 6.6].

Next we present the conditions that must be satisfied for the bootstrap procedure to give reliable results. Suppose we have a random sample $X_1, \ldots, X_n$ with values $x_1, \ldots, x_n$. Its empirical distribution function is defined as
\[ \hat{F}(x) = \frac{\#\{x_j \le x\}}{n}. \]

Remark 5.2. $\#A$ means the number of times the event $A$ occurs.

Using this empirical distribution function we want to estimate some properties of a quantity of interest, say $T$. So we want to estimate the distribution function
\[ G_{F,n}(t) = P(T \le t). \]
Here, the subscript $\{F, n\}$ indicates that we take a sample of size $n$ from $F$. The bootstrap estimate of this expression is
\[ G_{\hat{F},n}(t) = P(T \le t), \]
where, similarly, we take a sample of size $n$ from $\hat{F}$.

Since we have a sample, the next step is to take resamples from it, say $B$ of them. It is important to emphasize that we are not using the resamples to obtain information about the population. We are using them to learn something about the distribution of the sample statistic. So we try to approximate the sampling distribution of a statistic by resampling the sample and calculating the statistic on the resamples. In the end, the distribution of $T$ is approximated by the empirical distribution of the $B$ estimates of $T$ obtained from the $B$ resamples.

For the results to be reliable, in other words for $G_{\hat{F},n}$ to approach $G_{F,\infty}$ as $n \to \infty$, three conditions must hold. Suppose that $N$ is a neighbourhood of $F$ in a suitable space of distributions, and that $\hat{F}$ falls into $N$ with probability 1 as $n \to \infty$. Then the following conditions must hold [2]:
1. For any $A \in N$, $G_{A,n}$ must converge weakly to a limit $G_{A,\infty}$.
2. This convergence must be uniform on $N$.
3. The function mapping $A$ to $G_{A,\infty}$ must be continuous.

The first condition tells us that there is a limit for $G_{F,n}$. As $n$ increases, $\hat{F}$ changes, and the second and third conditions are needed to ensure that $G_{\hat{F},n}$ approaches $G_{F,\infty}$ along every possible sequence of $\hat{F}$s.
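The arithmetic of Example 5.1 and the empirical distribution function can be checked with a short script. This is only an illustrative sketch: the function name `ecdf` and the rule of trimming one value (5% of 20) from each tail are assumptions chosen so that the script reproduces the interval quoted in the example.

```python
# Sketch: recompute the bootstrap means of Example 5.1 and the 90% interval.
original = [1, 2, 3, 3, 10]

# The 20 bootstrap samples listed in the example (drawn with replacement).
resamples = [
    [2, 1, 10, 3, 2], [3, 10, 10, 2, 3], [1, 3, 1, 3, 3], [3, 1, 1, 3, 10],
    [3, 3, 1, 3, 2], [3, 10, 10, 10, 3], [2, 3, 3, 2, 1], [2, 3, 1, 10, 3],
    [1, 10, 2, 10, 10], [3, 3, 3, 3, 3], [3, 3, 3, 3, 1], [1, 2, 3, 3, 2],
    [3, 3, 10, 10, 2], [3, 2, 1, 3, 3], [3, 1, 10, 1, 10], [3, 2, 3, 1, 1],
    [3, 3, 3, 2, 3], [10, 3, 1, 3, 3], [3, 2, 1, 10, 2], [10, 2, 2, 1, 1],
]


def ecdf(sample, x):
    """Empirical distribution function: share of observations <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)


# Mean of every resample, sorted in ascending order.
means = sorted(sum(r) / len(r) for r in resamples)

# Assumed percentile rule: drop 5% of the values (here one) from each tail.
k = round(len(means) * 0.05)
lower, upper = means[k], means[-k - 1]

print(means)
print((lower, upper))
print(ecdf(original, 3))
```

With only 20 resamples the percentile rule matters; in practice far more resamples are drawn, and different quantile conventions then give nearly identical limits.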

Remark 5.3. Weak convergence of $G_{A,n}$ to $G_{A,\infty}$ means that for all integrable functions $h$,
\[ \int h(u)\,dG_{A,n}(u) \to \int h(u)\,dG_{A,\infty}(u) \quad \text{as } n \to \infty. \]

Under these conditions the bootstrap is reliable, meaning that for any $t$ and $\epsilon > 0$,
\[ P\left(|G_{\hat{F},n}(t) - G_{F,\infty}(t)| > \epsilon\right) \to 0 \quad \text{as } n \to \infty. \]

There are several methods for constructing confidence intervals from the bootstrap distribution of a real parameter: the basic, percentile, studentized, bias-corrected and BCa methods. In the simulations we will use the percentile and BCa methods, so we briefly explain how they work.

The percentile method uses the $B$ statistics computed from the bootstrap samples. We arrange them in ascending order, and for a $100(1-\alpha)\%$ confidence interval we take the $50\alpha$-th and $(100 - 50\alpha)$-th percentiles as the limits of the interval. Important issues for the bootstrap, and for inference in general, are skewness and bias, since bias estimates can have high variability.

The computation of the BCa confidence interval is a bit more complicated. It proceeds in three steps. First, we take $B$ resamples. Next, we calculate a bias correction value. The bias correction coefficient adjusts for the skewness in the bootstrap sampling distribution. If the bootstrap sampling distribution is perfectly symmetric, then the bias correction will be zero. Finally, we calculate the acceleration value. The acceleration coefficient adjusts for nonconstant variances within the resampled data sets [4]. The formulas for calculating these parameters are quite complicated and not very intuitive. More details are given in Efron and Tibshirani [3].
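The two interval constructions can be sketched for the sample mean as follows. This is a minimal illustration, not the code used for the results reported later: the function names, the seed, the simple order-statistic quantile rule and the choice of 2000 resamples are all assumptions, and the bias-correction and acceleration formulas are the standard ones from the bootstrap literature (acceleration estimated from leave-one-out means).

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def bootstrap_means(sample, B, rng):
    """Sorted means of B resamples drawn with replacement from `sample`."""
    n = len(sample)
    return sorted(sum(rng.choices(sample, k=n)) / n for _ in range(B))


def quantile(sorted_values, p):
    """Order-statistic quantile (an assumed convention; others exist)."""
    idx = int(p * len(sorted_values))
    return sorted_values[min(max(idx, 0), len(sorted_values) - 1)]


def percentile_ci(boot, alpha):
    """Percentile interval: alpha/2 and 1 - alpha/2 quantiles of the bootstrap means."""
    return quantile(boot, alpha / 2), quantile(boot, 1 - alpha / 2)


def bca_ci(sample, boot, alpha):
    """BCa interval for the mean: percentile levels shifted by z0 and a."""
    n, B = len(sample), len(boot)
    theta_hat = sum(sample) / n
    # Bias correction z0: from the share of bootstrap means below the sample mean.
    share_below = sum(b < theta_hat for b in boot) / B
    share_below = min(max(share_below, 1 / (B + 1)), B / (B + 1))  # keep inv_cdf finite
    z0 = STD_NORMAL.inv_cdf(share_below)
    # Acceleration a: skewness of the jackknife (leave-one-out) means.
    total = sum(sample)
    jack = [(total - x) / (n - 1) for x in sample]
    jack_mean = sum(jack) / n
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted(p):
        # Adjusted percentile level replacing p in the percentile method.
        z = STD_NORMAL.inv_cdf(p)
        return STD_NORMAL.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    return quantile(boot, adjusted(alpha / 2)), quantile(boot, adjusted(1 - alpha / 2))


rng = random.Random(2015)  # arbitrary seed, for reproducibility only
sample = [1, 2, 3, 3, 10]
boot = bootstrap_means(sample, 2000, rng)
print("percentile:", percentile_ci(boot, 0.10))
print("BCa:       ", bca_ci(sample, boot, 0.10))
```

When both `z0` and `a` are zero, the adjusted levels reduce to `alpha / 2` and `1 - alpha / 2`, so the BCa interval coincides with the percentile interval.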

6 Probability distributions

In this section, we present some basic and commonly used probability distributions that we will use later for simulations. A function describing the possible values of a random variable and their associated probabilities is known as a probability distribution. Random variables can be discrete, taking any of a specified finite or countable list of values and described by a probability mass function, or continuous, taking any numerical value in an interval or collection of intervals and described by a probability density function.

6.1 The Normal Distribution

The most familiar continuous distribution is the normal distribution. It has two parameters, usually denoted by $\mu$ and $\sigma^2$, which are its mean and variance. The probability density function of the normal distribution is given by
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2} \]
for $-\infty < x < \infty$.

One of the reasons why the normal distribution is so important is the Central Limit Theorem, which shows that the normal distribution can be used to approximate the distribution of the sample mean for a large variety of distributions when the sample is large. Some of the properties of the normal distribution are:
- the mean, the median (the middle number in a set of data when it is ranked from lowest to highest) and the mode (the number that occurs most frequently in a data set) are equal.
- it is symmetric. This means that if the distribution is cut in half at the mean, each side is the mirror image of the other.
- the total area under the curve is equal to one. The total area, however, is never shown, because the tails extend to infinity.
- the area under the curve can be determined. If the mean and the standard deviation are known, one can determine the percentage of data under any section of the curve.
- around 68% of the area of a normal distribution is within one standard deviation of the mean.
- approximately 95% of the area of a normal distribution is within two standard deviations of the mean.

The figure below shows how the percentages of area under the normal curve are distributed in terms of standard deviation units from the mean.

Figure 1: The percentages of area under the normal curve (figure is from [7])

The normal distribution is a theoretical model rather than a description of any particular data set. Many things in life never match this model perfectly, but they follow the normal distribution approximately. Sometimes we say that the normal distribution is actually a family of normal distributions, since each of them is characterized by its mean and standard deviation.

The $n$-th moment of the probability distribution of the variable $X$, if it exists, is defined as $\mu_n = E(X^n)$. The zeroth moment is the total probability, the first moment is the mean, and the second central moment is the variance. Because the mean and variance are used so frequently, we will give them for each distribution individually. As we said, the normal distribution is completely determined by them. For example, below are shown normal distributions with $\mu = 0$, $\sigma = 1$ (solid curve), $\mu = 1$, $\sigma = 1$ (dotted curve) and $\mu = 0$, $\sigma = 2$ (dashed curve).
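The 68% and 95% figures can be verified numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper names `normal_pdf` and `area_within` are introduced here only for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density written out exactly as in the formula of this section."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


def area_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2):
    print(k, round(area_within(k), 4))
```

The result does not depend on $\mu$ or $\sigma$: `area_within(2, 10, 3)` gives the same value as `area_within(2)`, which is why the rule can be stated in standard deviation units.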

Figure 2: Probability density function of normal distribution

6.2 The Gamma Distribution

The gamma distribution is another widely used distribution. A continuous random variable $X$ is said to have a gamma distribution with parameters $\alpha > 0$ and $\lambda > 0$, written $X \sim \mathrm{Gamma}(\alpha, \lambda)$, if its probability density function is given by
\[ f(x) = \frac{\lambda^{\alpha} x^{\alpha-1} e^{-x\lambda}}{\Gamma(\alpha)}, \]
where $x > 0$ and $\Gamma(\alpha)$ is the gamma function. In this form $\lambda$ acts as a rate parameter. Some properties of the gamma probability density function are:
- if $0 < \alpha < 1$, $f$ is decreasing with $f(x) \to \infty$ as $x \to 0$.
- if $\alpha = 1$, $f$ is decreasing with $f(0) = \lambda$.
- if $\alpha > 1$, $f$ increases and then decreases, with mode at $(\alpha - 1)/\lambda$.
- if $0 < \alpha \le 1$, $f$ is concave upward.
- if $1 < \alpha \le 2$, $f$ is concave downward and then upward, with inflection point at $(\alpha - 1 + \sqrt{\alpha - 1})/\lambda$.
- if $\alpha > 2$, $f$ is concave upward, then downward, then upward again, with inflection points at $(\alpha - 1 \pm \sqrt{\alpha - 1})/\lambda$.
- $E[X] = \alpha/\lambda$
- $\operatorname{var}(X) = \alpha/\lambda^2$.
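A numerical check of the density $f(x) = \lambda^{\alpha} x^{\alpha-1} e^{-x\lambda}/\Gamma(\alpha)$ is sketched below. Because $\lambda$ multiplies $x$ in the exponent, it behaves as a rate, and midpoint-rule integration gives a mean of $\alpha/\lambda$ and a variance of $\alpha/\lambda^2$. The helper names, the integration range and the step count are assumptions made for this illustration.

```python
import math


def gamma_pdf(x, alpha, lam):
    """Gamma density with shape alpha and rate lam, for x > 0."""
    return lam ** alpha * x ** (alpha - 1) * math.exp(-lam * x) / math.gamma(alpha)


def midpoint_integral(f, lo, hi, steps=20000):
    """Midpoint rule; it never evaluates f at the endpoints, so x = 0 is avoided."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def gamma_moments(alpha, lam, upper=100.0):
    """Return (total probability, mean, variance) by numerical integration on (0, upper)."""
    pdf = lambda x: gamma_pdf(x, alpha, lam)
    total = midpoint_integral(pdf, 0.0, upper)
    mean = midpoint_integral(lambda x: x * pdf(x), 0.0, upper)
    second = midpoint_integral(lambda x: x * x * pdf(x), 0.0, upper)
    return total, mean, second - mean ** 2


# alpha = 3, lam = 1/2: expect total 1, mean 3 / 0.5 = 6, variance 3 / 0.25 = 12.
print(gamma_moments(3, 0.5))
```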

Figure 3: Probability density function of gamma distribution

Above are shown probability density functions of the gamma distribution for different values of $\alpha$: for $\alpha = 1$ and $\lambda = 1/2$ (solid curve), for $\alpha = 2$ and $\lambda = 1/2$ (dotted curve) and for $\alpha = 3$ and $\lambda = 1/2$ (dashed curve). In the case $\alpha = 1$ we have an exponential distribution.

6.3 The Exponential Distribution

The probability density function of an exponential distribution is
\[ f(x) = \begin{cases} \lambda e^{-x\lambda}, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad (6.1) \]
Some of its properties are:
- $f$ is decreasing on $[0, \infty)$.
- $f$ is concave upward on $[0, \infty)$.
- $f(x) \to 0$ as $x \to \infty$.
- $E[X] = \frac{1}{\lambda}$
- $\operatorname{var}(X) = \frac{1}{\lambda^2}$.

Below are shown probability density functions of the exponential distribution for different values of $\lambda$: for $\lambda = 1/2$ (solid curve), for $\lambda = 1$ (dotted curve) and for $\lambda = 2$ (dashed curve).
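Exponential values are easy to generate on a computer, which is useful for simulations. The sketch below uses inverse-transform sampling, a technique chosen here for illustration and not one named in the text: since the distribution function is $F(x) = 1 - e^{-x\lambda}$, solving $F(x) = u$ for a uniform $u$ gives $x = -\ln(1-u)/\lambda$. The function name, seed and sample size are assumptions.

```python
import math
import random


def exponential_sample(lam, rng):
    """One exponential draw with rate lam via the inverse of F(x) = 1 - exp(-lam * x)."""
    u = rng.random()                 # uniform on [0, 1), so 1 - u is in (0, 1]
    return -math.log(1.0 - u) / lam


rng = random.Random(0)               # arbitrary seed, for reproducibility only
lam = 0.5
draws = [exponential_sample(lam, rng) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

# Compare with the theoretical values 1 / lam = 2 and 1 / lam**2 = 4.
print(sample_mean, sample_var)
```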

Figure 4: Probability density function of exponential distribution

6.4 The Uniform Distribution

The uniform distribution is arguably the simplest continuous distribution. A continuous random variable $X$ is said to have a uniform distribution over the interval $[a, b]$, written $X \sim \mathrm{Uniform}(a, b)$, if its probability density function is given by
\[ f(x) = \begin{cases} \frac{1}{b-a}, & a < x < b \\ 0, & x < a \text{ or } x > b \end{cases} \qquad (6.2) \]
The expected value of a uniformly distributed random variable $X$ is $E[X] = \frac{1}{2}(a+b)$, and the variance is $\operatorname{var}(X) = \frac{1}{12}(b-a)^2$. Below is shown the probability density function of the uniform distribution over the interval $[1, 3]$.

Figure 5: Probability density function of uniform distribution
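The mean and variance of the uniform distribution follow directly from the density (6.2). The short derivation below is added as a worked check; the numerical values at the end assume the interval $[1, 3]$ used in the figure.

\[ E[X] = \int_a^b \frac{x}{b-a}\,dx = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2}, \qquad E[X^2] = \int_a^b \frac{x^2}{b-a}\,dx = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2 + ab + b^2}{3}, \]
\[ \operatorname{var}(X) = E[X^2] - (E[X])^2 = \frac{a^2 + ab + b^2}{3} - \frac{(a+b)^2}{4} = \frac{(b-a)^2}{12}. \]

For $a = 1$ and $b = 3$ this gives $E[X] = 2$ and $\operatorname{var}(X) = 1/3$.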

6.5 The Pareto type I Distribution

The Pareto distribution was first proposed as a model for the distribution of incomes. The probability density function is
\[ f(x) = \frac{a b^{a}}{x^{a+1}}, \]
for $a > 0$ and $b \le x < \infty$. The parameter $b$ is a lower bound on the possible values that a Pareto distributed random variable can take. Well-known properties are:
- $E[X] = \frac{ba}{a-1}$ if $a > 1$.
- $\operatorname{Var}(X) = \frac{b^2 a}{(a-1)^2 (a-2)}$ if $a > 2$.

Figure 6: Probability density function of Pareto distribution

Above are shown probability density functions of the Pareto distribution for different values of $a$.
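The conditions $a > 1$ and $a > 2$ matter in practice: outside them the mean or variance is infinite. The sketch below encodes the two formulas and draws Pareto values by inverse-transform sampling, using the distribution function $F(x) = 1 - (b/x)^a$ for $x \ge b$. The function names, the seed and the parameter values are assumptions chosen for illustration.

```python
import math
import random


def pareto_mean(a, b):
    """E[X] = a*b / (a - 1) for a > 1; infinite otherwise."""
    return a * b / (a - 1) if a > 1 else math.inf


def pareto_var(a, b):
    """Var(X) = b^2 * a / ((a - 1)^2 * (a - 2)) for a > 2; infinite otherwise."""
    return b * b * a / ((a - 1) ** 2 * (a - 2)) if a > 2 else math.inf


def pareto_sample(a, b, rng):
    """One Pareto type I draw by inverting F(x) = 1 - (b / x)**a."""
    u = rng.random()                       # uniform on [0, 1), so 1 - u is in (0, 1]
    return b / (1.0 - u) ** (1.0 / a)


rng = random.Random(3)                     # arbitrary seed, for reproducibility only
a, b = 3.0, 2.0
draws = [pareto_sample(a, b, rng) for _ in range(200_000)]
print(sum(draws) / len(draws), pareto_mean(a, b))
print(pareto_var(a, b), pareto_var(1.5, b))
```

Because of the heavy right tail, sample means of Pareto data converge slowly when $a$ is only slightly above 2, which is one reason this distribution is a demanding test case for confidence intervals.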

7 Simulations

Up to now, for the construction of the confidence interval around the population mean, we have relied on the CLT or on Slutsky's theorem in the case $n \to \infty$. We would also like to know what happens when $n$ is small. We are also interested in the influence of the distribution: what happens if the distribution is symmetric, and what if it is asymmetric? Using simulations, we provide answers to these questions.

Simulation is the modeling of random events, using random numbers to specify random event outcomes, in order to closely match real-world outcomes. It is a numerical technique for performing experiments on the computer. The properties of a statistical method must be established to know that the method gives reliable results, but exact derivations of these properties are rarely possible. There are many reasons for using simulations in statistics. For example, some situations are difficult to analyze, time-consuming and very often expensive. Using simulations, we approximate real-world results while saving time, money and effort. A simulation is useful only if the simulated outcomes closely match real-world outcomes.

To produce a useful simulation, we first describe the possible outcomes. Next, we assign one or more random numbers to each outcome. We also have to choose a source of random numbers, for example a random number generator, which is not time-consuming even when the sample is large. Then we choose a random number and, based on it, obtain a simulated outcome. We keep selecting numbers and recording the simulated results until we get a stable pattern. In the end, we analyze the simulated outcomes.

Here we will use simulations to check the coverage probability and the width of the confidence intervals we have found. We will check what happens if the population has a normal, exponential, gamma, uniform or Pareto distribution. We will also change the sample size and observe how that affects the coverage and the width. We will consider $n = 10, 25, 50$ and $100$.
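A simulation of this kind can be sketched in a few lines. The code below is an illustrative Python sketch rather than the code behind the reported results: it covers only the normal-based and t-based intervals, assumes a nominal level of 95%, uses tabulated t quantiles for the four sample sizes, and picks a normal population with standard deviation 2 and an exponential population with rate 1/10 (both with mean 10) as example populations. All function names, the seed and the number of repetitions are assumptions.

```python
import math
import random
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)
# Tabulated 0.975 quantiles of Student's t with n - 1 degrees of freedom.
T_975 = {10: 2.2622, 25: 2.0639, 50: 2.0096, 100: 1.9842}


def simulate(draw, n, true_mean, reps, rng):
    """Return {method: (coverage, average width)} for normal- and t-based intervals."""
    crit = {"normal": Z_975, "t": T_975[n]}
    hits = dict.fromkeys(crit, 0)
    width = dict.fromkeys(crit, 0.0)
    for _ in range(reps):
        x = [draw(rng) for _ in range(n)]
        m = sum(x) / n
        s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))  # sample standard deviation
        se = s / math.sqrt(n)                                   # standard error of the mean
        for name, c in crit.items():
            half = c * se
            hits[name] += (m - half <= true_mean <= m + half)
            width[name] += 2 * half
    return {name: (hits[name] / reps, width[name] / reps) for name in crit}


def normal_pop(rng):
    return rng.gauss(10, 2)           # mean 10; sigma = 2 is an arbitrary choice


def exponential_pop(rng):
    return rng.expovariate(1 / 10)    # rate 1/10 gives mean 10


rng = random.Random(1)                # arbitrary seed, for reproducibility only
for n in (10, 25, 50, 100):
    print(n, "normal pop:", simulate(normal_pop, n, 10, 2000, rng))
    print(n, "exponential pop:", simulate(exponential_pop, n, 10, 2000, rng))
```

Both intervals share the same centre and the t quantile is the larger of the two, so in every repetition the t-based interval contains the normal-based one; its estimated coverage can therefore never be lower.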
A narrow width and a high confidence level are desirable, and because of that we look for how large $n$ must be to get coverage close to …. We assumed that the true mean in the population is 10, and the simulation code also returns the expected value of the sample mean. Furthermore, the code provides the standard error of the sample mean. But the most important results are the average width of the confidence intervals and the coverage. We calculate all of these using four methods, namely: the confidence interval derived from the normal distribution, the one derived from the t-distribution, the percentile method and the BCa method. The code was written in the R programming language [11].

7.1 Results of simulations

In case the population is normally distributed and we have a sample of size $n = 10$, the expected value of the sample mean was … and the standard error was …. Regarding the average width and the coverage probability we obtain the following:

                      Normal distribution   t-distribution   BS Percentile   BS BCa
average width         …                     …                …               …
coverage probability  …                     …                …               …
Table 1: Normal distribution and n = 10

For $n = 25$ the expected value was … and the standard error was ….

                      Normal distribution   t-distribution   BS Percentile   BS BCa
average width         …                     …                …               …
coverage probability  …                     …                …               …
Table 2: Normal distribution and n = 25

For $n = 50$ the expected value was … and the standard error was ….

                      Normal distribution   t-distribution   BS Percentile   BS BCa
average width         …                     …                …               …
coverage probability  …                     …                …               …
Table 3: Normal distribution and n = 50

For $n = 100$ the expected value was … and the standard error was ….

                      Normal distribution   t-distribution   BS Percentile   BS BCa
average width         …                     …                …               …
coverage probability  …                     …                …               …
Table 4: Normal distribution and n = 100

Regardless of the sample size, the estimate of the population mean was unbiased and, as expected, the standard error decreased as the sample size grew. If we look at the above tables, we see that the coverage probability is best when the t-distribution is used to construct the confidence interval, since for any sample size the coverage probability was almost …. We obtained the narrowest confidence interval with bootstrapping, but its coverage probability for $n = 10$ was too liberal, around …. For $n = 50$ and more, the coverage was almost …. So bootstrapping works for large sample sizes. As $n$ became bigger, the results were more or less the same in all cases, as the table for $n = 100$ shows.

Then we observed what happens if the distribution is exponential. We give results for the same sample sizes as for the normal distribution. For $n = 10$ the expected value was … and the standard error was ….

                      Normal distribution   t-distribution   BS Percentile   BS BCa
average width         …                     …                …               …
coverage probability  …                     …                …               …
Table 5: Exponential distribution and n = 10

For $n = 25$ the expected value was … and the standard error was ….

                      Normal distribution   t-distribution   BS Percentile   BS BCa
average width         …                     …                …               …
coverage probability  …                     …                …               …
Table 6: Exponential distribution and n = 25

For $n = 50$ the expected value was … and the standard error was ….

                      Normal distribution   t-distribution   BS Percentile   BS BCa
average width         …                     …                …               …
coverage probability  …                     …                …               …
Table 7: Exponential distribution and n = 50

For $n = 100$ the expected value was … and the standard error was ….

                      Normal distribution   t-distribution   BS Percentile   BS BCa
average width         …                     …                …               …
coverage probability  …                     …                …               …
Table 8: Exponential distribution and n = 100

Again, regardless of the sample size, the estimate of the population mean was unbiased and the standard error decreased as the sample size grew. When the sample size was small ($n = 10$), the coverage probabilities were too small, especially when using the normal distribution and the BS percentile method, where the coverage probability was only around …. The best, although still much too liberal, confidence interval was obtained when using the t-distribution, where the average width of the confidence interval was the largest. When the sample size increased, the coverage probabilities of all methods improved substantially, and were very close to 0.95 for $n = 100$.

We also observed what happens if we have a gamma distribution. For $n = 10$ the expected value was … and the standard error was ….

                      Normal distribution   t-distribution   BS Percentile   BS BCa
average width         …                     …                …               …
coverage probability  …                     …                …               …
Table 9: Gamma distribution and n = 10

For $n = 25$ the expected value was … and the standard error was ….

                      Normal distribution   t-distribution   BS Percentile   BS BCa
average width         …                     …                …               …
coverage probability  …                     …                …               …
Table 10: Gamma distribution and n = 25

For $n = 50$ the expected value was … and the standard error was ….

                      Normal distribution   t-distribution   BS Percentile   BS BCa
average width         …                     …                …               …
coverage probability  …                     …                …               …
Table 11: Gamma distribution and n = 50

For $n = 100$ the expected value was … and the standard error was ….

                      Normal distribution   t-distribution   BS Percentile   BS BCa
average width         …                     …                …               …
coverage probability  …                     …                …               …
Table 12: Gamma distribution and n = 100

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Lecture 3. Properties of Summary Statistics: Sampling Distribution

Lecture 3. Properties of Summary Statistics: Sampling Distribution Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary

More information

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1. Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

Distribution of Random Samples & Limit theorems

Distribution of Random Samples & Limit theorems STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Lecture 7: Properties of Random Samples

Lecture 7: Properties of Random Samples Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ

More information

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio

More information

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6)

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6) STAT 350 Hadout 9 Samplig Distributio, Cetral Limit Theorem (6.6) A radom sample is a sequece of radom variables X, X 2,, X that are idepedet ad idetically distributed. o This property is ofte abbreviated

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

Simulation. Two Rule For Inverting A Distribution Function

Simulation. Two Rule For Inverting A Distribution Function Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10 DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set

More information

Understanding Samples

Understanding Samples 1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We

More information

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2. SAMPLE STATISTICS A radom sample x 1,x,,x from a distributio f(x) is a set of idepedetly ad idetically variables with x i f(x) for all i Their joit pdf is f(x 1,x,,x )=f(x 1 )f(x ) f(x )= f(x i ) The sample

More information

1 Introduction to reducing variance in Monte Carlo simulations

1 Introduction to reducing variance in Monte Carlo simulations Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

The standard deviation of the mean

The standard deviation of the mean Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Expectation and Variance of a random variable

Expectation and Variance of a random variable Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information

Chapter 18 Summary Sampling Distribution Models

Chapter 18 Summary Sampling Distribution Models Uit 5 Itroductio to Iferece Chapter 18 Summary Samplig Distributio Models What have we leared? Sample proportios ad meas will vary from sample to sample that s samplig error (samplig variability). Samplig

More information

4. Partial Sums and the Central Limit Theorem

4. Partial Sums and the Central Limit Theorem 1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems

More information

EE 4TM4: Digital Communications II Probability Theory

EE 4TM4: Digital Communications II Probability Theory 1 EE 4TM4: Digital Commuicatios II Probability Theory I. RANDOM VARIABLES A radom variable is a real-valued fuctio defied o the sample space. Example: Suppose that our experimet cosists of tossig two fair

More information

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates. 5. Data, Estimates, ad Models: quatifyig the accuracy of estimates. 5. Estimatig a Normal Mea 5.2 The Distributio of the Normal Sample Mea 5.3 Normal data, cofidece iterval for, kow 5.4 Normal data, cofidece

More information

Lecture 19: Convergence

Lecture 19: Convergence Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam. Probability ad Statistics FS 07 Secod Sessio Exam 09.0.08 Time Limit: 80 Miutes Name: Studet ID: This exam cotais 9 pages (icludig this cover page) ad 0 questios. A Formulae sheet is provided with the

More information
