Statistics 3. Revision Notes

Ralph Harrington
5 years ago
Statistics 3 Revision Notes, June 2016

S3 JUNE 2016 SDB

Statistics 3: contents

1 Combinations of random variables
    Expected mean and variance for X ± Y
        Reminder
        Combining independent normal random variables
2 Sampling
    Methods of collecting data
        Taking a census
        Sampling
    Simple random sampling
        Using random number tables
    Systematic sampling
    Stratified sampling
    Sampling with and without replacement
    Quota sampling
    Primary data
    Secondary data
3 Biased & unbiased estimators
    Unbiased estimators of µ and σ²
    Estimating µ and σ² from a sample
4 Confidence intervals and significance tests
    Sampling distribution of the mean
    Central limit theorem and standard error
    Confidence intervals
        Central Limit Theorem Example
    Significance testing, variance of population known
        Mean of normal distribution
        Difference between means of normal distributions
    Significance testing, variance of population NOT known, large sample
        Mean of normal distribution
        Important assumption
        Difference between means
5 Goodness of fit, χ² test
    General points
    Discrete uniform distribution
    Continuous uniform distribution
    Binomial distribution
    Poisson distribution
    The normal distribution
    Contingency tables
6 Regression and correlation
    Spearman's rank correlation coefficient
        Ranking and equal ranks
        Spearman's rank correlation coefficient
        Spearman or PMCC
    Testing for zero correlation
        Product moment correlation coefficient
        Spearman's rank correlation coefficient
    Comparison between PMCC and Spearman
Appendix
    Combining random variables
        E[X + Y] = E[X] + E[Y]
        Var[X + Y] = Var[X] + Var[Y]
    Unbiased & biased estimators
        Unbiased estimators
        Biased estimators
    Unbiased estimates of population mean and variance
        Unbiased estimate of the mean
        Unbiased estimate of the variance of the population
        Bias
    Probability generating functions
        Expected mean and variance for a p.g.f.
        Mean and variance of a Binomial distribution
        Mean and variance of a Poisson distribution
Index

1 Combinations of random variables

Expected mean and variance for X ± Y

Reminder

For any two random variables X and Y
    E[aX] = aE[X] and Var[aX] = a² Var[X]
    E[X + Y] = E[X] + E[Y] and E[X − Y] = E[X] − E[Y]
and for two independent random variables
    Var[X + Y] = Var[X] + Var[Y] and Var[X − Y] = Var[X] + Var[Y].

Combining independent normal random variables

If X₁ and X₂ are independent normal random variables, X₁ ~ N(µ₁, σ₁²) and X₂ ~ N(µ₂, σ₂²), then X₁ + X₂ and X₁ − X₂ are also normal random variables:
    X₁ + X₂ ~ N(µ₁ + µ₂, σ₁² + σ₂²) and X₁ − X₂ ~ N(µ₁ − µ₂, σ₁² + σ₂²)

Example: X₁ and X₂ are independent normal random variables, X₁ ~ N(21, 12) and X₂ ~ N(9, 6). Find the expected mean and standard deviation of X₁ − 2X₂.

Solution:
    E[X₁] = 21, Var[X₁] = 12 and E[X₂] = 9, Var[X₂] = 6
    E[2X₂] = 2E[X₂] = 2 × 9 = 18 and Var[2X₂] = 2² Var[X₂] = 4 × 6 = 24
    E[X₁ − 2X₂] = E[X₁] − E[2X₂] = 21 − 18 = 3
    Var[X₁ − 2X₂] = Var[X₁] + Var[2X₂] = 12 + 24 = 36
The expected mean and standard deviation of X₁ − 2X₂ are 3 and √36 = 6. Answer
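The rules above can be checked numerically. The following is a minimal sketch, assuming independent variables; the helper name linear_combination is hypothetical and is used only for illustration, applied to X₁ ~ N(21, 12) and X₂ ~ N(9, 6).

```python
from math import sqrt

def linear_combination(terms):
    """Mean and variance of sum(a * X) over independent variables.

    terms is a list of (a, mean_of_X, variance_of_X) tuples.
    Independence is assumed, so the variances add with weights a**2.
    """
    mean = sum(a * m for a, m, _ in terms)
    variance = sum(a * a * v for a, _, v in terms)
    return mean, variance

# X1 - 2*X2 with X1 ~ N(21, 12) and X2 ~ N(9, 6)
mean, variance = linear_combination([(1, 21, 12), (-2, 9, 6)])
print(mean, variance, sqrt(variance))
```

The coefficient −2 contributes (−2)² = 4 times the variance of X₂, which is why the variances are added even though the means are subtracted.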

Example: The weights of empty coffee jars are normally distributed with mean 0.1 kg and standard deviation 0.02 kg. The weight of coffee in the jars is normally distributed with mean 1 kg and standard deviation 0.06 kg. Find the distribution of 12 full jars of coffee. What is the probability that 12 full jars weigh more than 13.5 kg?

Solution: Let X₁, X₂, ..., X₁₂ be the weights of 12 empty jars and Y₁, Y₂, ..., Y₁₂ be the weights of coffee in the jars, with X ~ N(0.1, 0.02²) and Y ~ N(1, 0.06²).
Let W be the total weight of 12 full jars; then W = X₁ + X₂ + ... + X₁₂ + Y₁ + Y₂ + ... + Y₁₂.
Then E[W] = 12 E[X] + 12 E[Y] = 1.2 + 12 = 13.2
and, assuming independence, Var[W] = 12 Var[X] + 12 Var[Y] = 12 × 0.0004 + 12 × 0.0036 = 0.048.
As we are combining normal distributions, the distribution for 12 full jars is N(13.2, 0.048). Answer
The probability that 12 full jars weigh more than 13.5 kg is
    1 − Φ((13.5 − 13.2)/√0.048) = 1 − Φ(1.37) = 1 − 0.9147 = 0.0853 to 3 S.F. Answer

2 Sampling

Methods of collecting data

Taking a census

A census involves observing every member of a population and is used if the size of the population is small or if extreme accuracy is required.
Advantages
- it should give a completely accurate result, a full picture.
Disadvantages
- very time consuming and expensive
- it cannot be used when the testing process destroys the article being tested
- information is difficult to process because there is so much of it.

Sampling

Sampling involves observing or testing a part of the population. It is cheaper but does not give such a full picture. The size of the sample depends on the accuracy desired (for a varied population a large sample will be required to give a reasonable accuracy).

Simple random sampling

Every member of the population must have an equal chance of being selected.

Using random number tables

To take a simple random sample of size n from a population of N sampling units, first make a list and give each member of the population a number. Then use random number tables to select the sample. We ignore any numbers which do not refer to a member of the population: for example, using three-figure random numbers for a population numbered from 001 to 659 we would ignore numbers from 660 to 999. We also ignore the second occurrence of the same number.
Advantages
- the numbers are truly random and free from bias
- it is easy to use
- each member has a known equal chance of selection.
Disadvantages
- it is not suitable when the sample size is large.

Lottery sampling

A sampling frame is needed, identifying each member of the population. The name or number of each member is written on a ticket (all the same size, colour and shape), and the tickets are all put in a container which is then shaken. Tickets are then drawn without replacement.
Advantages
- the tickets are drawn at random
- it is easy to use
- each ticket has a known chance of selection (considered as constant as long as the sample size is much smaller than the total number of tickets).
Disadvantages
- it is not suitable for a large sample
- a sampling frame is needed.

Systematic sampling

First make an ordered list, and divide it into equal groups, each of size 50 (or some other chosen size). Second, select every 50th (or other chosen interval) member from the list. To make sure that the first on the list is not automatically selected, random number tables must be used to select the member in the first group; then select every 50th member after that.
Used when
- the population is too large for simple random number sampling.
Advantages
- simple to use
- suitable for large samples.
Disadvantages
- only random if the ordered list is truly random
- it can introduce bias.
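As an illustration of the two procedures just described, here is a minimal sketch using Python's standard random module in place of random number tables. The function names and the seed argument are assumptions made for this example; the population numbered 1 to 659 echoes the example in the text.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)  # sampling without replacement

def systematic_sample(population, k, seed=None):
    """Pick a random start within the first group of k, then every k-th member."""
    rng = random.Random(seed)
    start = rng.randrange(k)          # random position inside the first group
    return population[start::k]

population = list(range(1, 660))      # members numbered 1 to 659
srs = simple_random_sample(population, 20, seed=1)
systematic = systematic_sample(population, 50, seed=1)
print(srs)
print(systematic)
```

Because random.sample never repeats a member, there is no need to discard second occurrences as there is with tables.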

Stratified sampling

First divide the population into exclusive (distinct) groups or strata, and then select a sample so that the proportion of each stratum in the sample equals the proportion of that stratum in the population.

Example: How would you take a stratified sample of 50 children from a school of 500 pupils divided into boys and girls in each of the Upper sixth, Lower sixth, Fifth form, Fourth form and Third form?

Solution: As 50 is 1/10 of the total population, 1/10 of each stratum should be selected in the sample. Thus the sample would comprise

                 Boys   Girls
    Upper sixth    3      4
    Lower sixth    3      3
    Fifth form     7      6
    Fourth form    6      7
    Third form     5      6

and simple random number sampling would be used within each stratum.
Used when
- the sample is large
- the population divides naturally into mutually exclusive groups.
Advantages
- it can give more accurate estimates (or a more representative picture) than simple random number sampling when there are clear strata present
- it reflects the population structure.
Disadvantages
- within the strata the problems are the same as for any simple random sample
- if the strata are not clearly defined they may overlap.

Sampling with and without replacement

Simple random sampling is sampling without replacement, in which each member of the population can be selected at most once.
In sampling with replacement each member of the population can be selected more than once: this is called unrestricted random sampling.
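The proportional allocation used in stratified sampling can be sketched as follows. The stratum names and sizes below are hypothetical (they are not the school's figures), and the function name is an assumption for illustration.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Number to sample from each stratum, in proportion to its size.

    Rounding can make the parts add up to slightly more or less than
    sample_size when the proportions are not whole numbers.
    """
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

# Hypothetical population of 500 split into three strata
strata = {"stratum A": 120, "stratum B": 200, "stratum C": 180}
allocation = proportional_allocation(strata, 50)
print(allocation)
```

Within each stratum the allocated number would then be chosen by simple random sampling.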

Quota sampling

This is a non-random method. First decide on groups into which the population is divided and a number from each group to be interviewed to form quotas. Then go out and interview, and enter each result into the relevant quota. If someone refuses to answer or belongs to a quota which is already full, ignore that person's reply and continue interviewing until all quotas are full.
Used when
- it is not possible to use random methods, for example when the whole population is not known (homeless in a big city).
Advantages
- can be done quickly, as a representative sample can be obtained with a small sample size
- costs are kept to a minimum
- administration is fairly easy.
Disadvantages
- it is not possible to estimate the sampling errors (as it is not a random process)
- the interviewer may not put a respondent into the correct quota
- non-responses are not recorded
- it can introduce interviewer bias.

Primary data

Primary data is data collected by or on behalf of the person who is going to use the data.
Advantages
- collection method is known
- accuracy is known
- exact data needed are collected.
Disadvantages
- costly in time and effort.

Secondary data

Secondary data is data not collected by or on behalf of the person who is going to use it. The data are second-hand, e.g. government census statistics.
Advantages
- cheap to obtain
- large quantity available (e.g. internet)
- much has been collected year on year and can be used to plot trends.
Disadvantages
- collection method may not be known
- accuracy may not be known
- it can be in a form which is difficult to handle
- bias is not always recognised.

3 Biased & unbiased estimators

Example: A bag contains a large number of coins, of which 2/5 are 2p coins and 3/5 are 5p coins.
(a) X is the value of a single coin drawn from the bag. Find the expected mean of all coins in the bag, µ = E[X].
Samples of size 3 are now drawn from the bag.
(b) Find the sampling distribution of, and the expected value of, (i) the median, and (ii) the mean.
(c) (i) The median, Q₂, is used as an estimator of the mean of all the coins, µ. Show that Q₂ is a biased estimator of µ, and find the bias.
    (ii) The mean, X̄, is used as an estimator of the mean of all the coins, µ. Show that X̄ is an unbiased estimator of µ.
(d) kQ₂ is now used as an unbiased estimator of the mean of all the coins. Find the value of k.

Solution:
(a) µ = E[X] = Σ xᵢpᵢ = 2 × 2/5 + 5 × 3/5 = 3.8

(b)
    Sample                       Probability                    Median   Mean
    (2,2,2)                      (2/5)³ = 8/125                   2        2
    (2,2,5), (2,5,2), (5,2,2)    3 × (2/5)² × (3/5) = 36/125      2        3
    (2,5,5), (5,2,5), (5,5,2)    3 × (2/5) × (3/5)² = 54/125      5        4
    (5,5,5)                      (3/5)³ = 27/125                  5        5

Sampling distribution
(i) Median: Q₂ = 2 with probability 44/125 and Q₂ = 5 with probability 81/125, so
    E[Q₂] = Σ xᵢpᵢ = 2 × 44/125 + 5 × 81/125 = 493/125 = 3.944
(ii) Mean: X̄ = 2, 3, 4, 5 with probabilities 8/125, 36/125, 54/125, 27/125, so
    E[X̄] = Σ xᵢpᵢ = (16 + 108 + 216 + 135)/125 = 475/125 = 3.8

Unbiased estimator: If X (usually found from a sample) is used to estimate the value of a population parameter, t, then X is an unbiased estimator of t if E[X] = the true value of the parameter t.

Bias: If an estimator, X, is biased, then the bias is the difference between E[X] and the true value of the parameter t.

(c) (i) The median Q₂ is used as the estimate of the mean. From part (a) we know that the true value of the mean µ is 3.8, and in part (b) we have shown that E[Q₂] = 3.944.
    Q₂ is a biased estimator of µ, and the bias is E[Q₂] − true value of µ = 3.944 − 3.8 = 0.144.
    (ii) The mean X̄ is used as the estimate of the mean. From part (a) we know that the true value of the mean µ is 3.8, and in part (b) we have shown that E[X̄] = 3.8.
    E[X̄] = the true value of the mean, so X̄ is an unbiased estimator of µ.
(d) We now use kQ₂ as an unbiased estimator of the mean value of all the coins.
    E[kQ₂] = k E[Q₂] = 3.944k, but the true mean is µ = 3.8.
    If kQ₂ is an unbiased estimator of µ, E[kQ₂] = true value of µ, so 3.944k = 3.8 and k = 3.8/3.944 ≈ 0.963.

Example: A sample of size 3 is drawn from a binomial distribution B(10, 0.5) and the mean, X̄, is calculated. The probability of success, p, is estimated by p̂ = (1/10) X̄. Show that p̂ is an unbiased estimator of p.

Solution: E[X] = np = 10 × 0.5 = 5
For a sample {X₁, X₂, X₃}, X̄ = (1/3)(X₁ + X₂ + X₃)
    E[X̄] = E[(1/3)(X₁ + X₂ + X₃)] = (1/3)(E[X₁] + E[X₂] + E[X₃])
    E[X̄] = (1/3) × 3 × 5 = 5, since E[Xᵢ] = E[X] = 5 for i = 1, 2, 3
    E[p̂] = E[(1/10) X̄] = (1/10) E[X̄] = (1/10) × 5 = 0.5, which is the true value of p
so p̂ = (1/10) X̄ is an unbiased estimator of p.
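The coin example (parts (a) to (d)) can be checked by enumerating every ordered sample of size 3. This is a sketch using exact fractions; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product
from statistics import median

# Coin values and their proportions in the bag
coins = {2: Fraction(2, 5), 5: Fraction(3, 5)}
mu = sum(value * p for value, p in coins.items())   # population mean

e_median = Fraction(0)
e_mean = Fraction(0)
for sample in product(coins, repeat=3):             # all 8 ordered samples
    prob = coins[sample[0]] * coins[sample[1]] * coins[sample[2]]
    e_median += prob * median(sample)
    e_mean += prob * Fraction(sum(sample), 3)

bias = e_median - mu    # bias of the median as an estimator of mu
k = mu / e_median       # scale factor that makes k * median unbiased
print(mu, e_median, e_mean, bias, k)
```

The exact values are 19/5, 493/125, 19/5, 18/125 and 475/493, that is 3.8, 3.944, 3.8, 0.144 and about 0.963.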

Example: A sample of size 4 is drawn from a continuous uniform distribution, U[3, β]. The mean of the sample, X̄, is calculated. The upper limit, β, is estimated by β̂ = 2X̄ − 3. Show that β̂ is an unbiased estimator of β.

Solution: E[X] = ½(3 + β)
For a sample {X₁, X₂, X₃, X₄}, X̄ = ¼(X₁ + X₂ + X₃ + X₄)
    E[X̄] = E[¼(X₁ + X₂ + X₃ + X₄)] = ¼(E[X₁] + E[X₂] + E[X₃] + E[X₄])
    E[X̄] = ¼ × 4 E[X] = E[X], since E[Xᵢ] = E[X] for i = 1, 2, 3, 4
    E[X̄] = ½(3 + β)
    E[β̂] = E[2X̄ − 3] = 2E[X̄] − 3 = 2 × ½(3 + β) − 3 = β
E[β̂] = β, which is the true value of the (unknown) upper limit, β, so β̂ = 2X̄ − 3 is an unbiased estimator of β.

There is more on biased and unbiased estimators in the Appendix.

Unbiased estimators of µ and σ²

Estimating µ and σ² from a sample

We usually do not know the mean, µ, and the variance, σ², of a population. To estimate these values we take a sample {X₁, X₂, X₃, ..., Xₙ} of size n and calculate the sample mean,
    X̄ = (1/n) Σ Xᵢ,
and
    (sd)ₓ² = (1/n) Σ Xᵢ² − X̄² = (1/n) Σ (Xᵢ − X̄)².
These can be compared with the formulae for population variance from the S1 module.

It can be shown that E[X̄] = the true value of µ, so µ̂ = X̄ is an unbiased estimator of the population mean µ.

It can be shown that E[(sd)ₓ²] = ((n − 1)/n) σ², so (sd)ₓ² is a biased estimator of σ².
    E[(n/(n − 1)) (sd)ₓ²] = (n/(n − 1)) × ((n − 1)/n) σ² = σ², the true value of the variance,
so σ̂² = (n/(n − 1)) (sd)ₓ² is an unbiased estimator of the population variance, σ².
(Proofs of these results are given in the Appendix.)

Note: the Edexcel course uses both S² and sₓ² to mean the unbiased estimate of σ². Also, the term Sample Variance is used to denote the unbiased estimate of σ², the variance of the population.

In these notes I shall always think of the variance, (sd)ₓ², as (1/n) Σ Xᵢ² − X̄² = (1/n) Σ (Xᵢ − X̄)².

To find S² or sₓ², the unbiased estimator for σ²: calculate (sd)ₓ², and then multiply by n/(n − 1).

Example: The weights of a sample of five chocolate bars produced by a machine were 56, 53, 57, 51 and 54 grams. Find unbiased estimators for the mean and variance of the weights of all chocolate bars produced by that machine.

Solution:
    X        X − X̄    (X − X̄)²
    56        1.8       3.24
    53       −1.2       1.44
    57        2.8       7.84
    51       −3.2      10.24
    54       −0.2       0.04
    Σ = 271            Σ = 22.8

    X̄ = 271/5 = 54.2
    (sd)ₓ² = (1/n) Σ (X − X̄)² = 22.8/5 = 4.56
    σ̂² = (n/(n − 1)) (sd)ₓ² = (5/4) × 4.56 = 5.7 Answer
Unbiased estimators for the mean and variance of all chocolate bars are 54.2 grams and 5.7 grams².

Example: The volume of water in each of a sample of 14 litre bottles of water from a day's production is taken. The results are shown below, in ml.
    103, 1019, 1004, 1011, 103, 1014, 1017, 100, 100, 1010, 105, 1007, 1016, 1019
Find unbiased estimates for the mean and variance of all bottles produced on that day.

Solution: First find the sample mean, X̄ = ΣX/14.
Finding X − X̄ each time would give unpleasant arithmetic, so use
    (sd)ₓ² = (1/n) Σ X² − X̄²
and then
    S² = sₓ² = (n/(n − 1)) (sd)ₓ² = (14/13) (sd)ₓ².
The resulting X̄ (in ml) and S² (in ml²) are the unbiased estimates for the mean and variance of the whole day's production.
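The chocolate-bar calculation can be reproduced with Python's standard statistics module, where pvariance divides by n (the notes' (sd)ₓ²) and variance divides by n − 1 (the unbiased estimate S²). A short sketch:

```python
from statistics import mean, pvariance, variance

weights = [56, 53, 57, 51, 54]   # grams
n = len(weights)

x_bar = mean(weights)            # unbiased estimate of the population mean
sd2 = pvariance(weights)         # divides by n: the biased (sd)^2
s2 = variance(weights)           # divides by n - 1: the unbiased estimate

print(x_bar, sd2, s2)
print(n / (n - 1) * sd2)         # same as s2, up to floating-point rounding
```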

Example: The weights of a sample of 15 packets of biscuits are recorded, giving ΣX = 3797 grams together with the value of ΣX². Find unbiased estimators for the mean and variance of all biscuits produced by this process.

Solution:
    µ̂ = X̄ = ΣX/n
    (sd)ₓ² = (1/n) Σ X² − X̄²
    σ̂² = (n/(n − 1)) (sd)ₓ²
The unbiased estimators are µ̂ (in grams) and σ̂² (in grams²).

Example: The lengths of 10 rods are measured, and the sample has mean X̄ = 26.7 cm and variance s² = 76.9 cm². An eleventh rod has length 30 cm. Find (a) the mean and (b) the variance of the sample of 11 rods.

Solution:
(a) With the sample mean there are no complications.
    For n = 10, X̄₁₀ = (1/10) Σ Xᵢ = 26.7, so Σ Xᵢ = 267
    For n = 11, Σ Xᵢ = 267 + 30 = 297, so X̄₁₁ = 297/11 = 27 cm
(b) WARNING: The question refers to the variance of the sample, which means the unbiased estimate of the variance of the population.
    s₁₀² = 76.9 = (10/(10 − 1)) (sd)₁₀², so (sd)₁₀² = (9/10) × 76.9 = 69.21
    (sd)₁₀² = (1/10) Σ Xᵢ² − X̄₁₀², so Σ Xᵢ² = 10 × (69.21 + 26.7²) = 7821
    With the extra rod, Σ Xᵢ² = 7821 + 30² = 8721
    (sd)₁₁² = (1/11) Σ Xᵢ² − X̄₁₁² = 8721/11 − 27² = 63.82
    s₁₁² = (11/(11 − 1)) (sd)₁₁² = 1.1 × 63.82 = 70.2 cm²
For 11 rods, the sample mean is 27 cm and the sample variance is 70.2 cm².
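The rod example follows a general recipe: recover ΣX and ΣX² from the summary statistics, add the new observation, and recompute. A minimal sketch, where the function name add_observation is hypothetical:

```python
def add_observation(n, mean, s2, new_x):
    """Update a sample mean and unbiased variance when one value is added.

    n, mean, s2 describe the existing sample; s2 is the unbiased
    (divide by n - 1) variance. Returns the new mean and new s2.
    """
    sum_x = n * mean
    sd2 = s2 * (n - 1) / n               # back to the divide-by-n variance
    sum_x2 = n * (sd2 + mean ** 2)       # since sd2 = sum_x2 / n - mean^2

    n_new = n + 1
    sum_x += new_x
    sum_x2 += new_x ** 2

    mean_new = sum_x / n_new
    sd2_new = sum_x2 / n_new - mean_new ** 2
    return mean_new, sd2_new * n_new / (n_new - 1)

mean_11, s2_11 = add_observation(10, 26.7, 76.9, 30)
print(round(mean_11, 2), round(s2_11, 2))
```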

4 Confidence intervals and significance tests

Sampling distribution of the mean

X is a random variable drawn from a population with mean µ and standard deviation σ.
If {X₁, X₂, ..., Xₙ} is a random sample of size n with mean X̄ = (X₁ + X₂ + ... + Xₙ)/n, then E[Xᵢ] = µ and Var[Xᵢ] = σ² for i = 1, 2, 3, ..., n,
and the expected mean of the population of sample means is
    E[X̄] = E[(X₁ + X₂ + ... + Xₙ)/n] = (1/n)(E[X₁] + E[X₂] + ... + E[Xₙ]) = (1/n)(µ + µ + ... + µ) = µ.
Also, the expected variance of the population of sample means is
    Var[X̄] = Var[(X₁ + X₂ + ... + Xₙ)/n] = (1/n²)(Var[X₁] + Var[X₂] + ... + Var[Xₙ]), assuming that all the Xᵢ are independent,
           = (1/n²)(σ² + σ² + ... + σ²) = nσ²/n² = σ²/n.
This means that if very many samples were taken and the mean of each sample calculated, then the mean of these means would be µ and the variance of these means would be σ²/n.
It can also be shown that the sample means form a Normal distribution (provided that n is "large enough").
We can then say that for samples drawn from a population with mean µ and variance σ², the sampling distribution of the mean is N(µ, σ²/n).

Central limit theorem and standard error

The central limit theorem states that if {X₁, X₂, ..., Xₙ} is a random sample of size n drawn from any population with mean µ and variance σ², then the population of sample means
    (i) has expected mean µ
    (ii) has expected variance σ²/n
    (iii) forms a normal distribution if n is "large enough",
i.e. X̄ ~ N(µ, σ²/n).

The central limit theorem is used for sampling when the sample size is large (more than about 50), as the population of sample means is then approximately normal whatever the distribution of the original population.

The standard error of the sample mean is σ/√n.

Example: A sample of size 50 is taken from a population of eggs with mean 23.4 grams and variance 36 grams².
(i) Find the probability that a single egg weighs more than 25 grams.
(ii) Find the probability that the sample mean is larger than 25.
(iii) What assumptions did you make?

Solution:
(i) The weight of a single egg, X ~ N(23.4, 6²)
    P(X > 25) = 1 − Φ((25 − 23.4)/6) = 1 − Φ(0.27) = 1 − 0.6064 = 0.3936
(ii) µ = 23.4, σ² = 36
    The sample mean X̄ ~ N(23.4, 6²/50); the standard error is σ/√n = 6/√50 = 0.8485
    P(X̄ > 25) = 1 − Φ((25 − 23.4)/0.8485) = 1 − Φ(1.89) = 1 − 0.9706 (from Normal tables) = 0.0294
(iii) We have assumed the Central Limit Theorem: in particular, that the sample means form a normal distribution.
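The egg example can be evaluated without tables using statistics.NormalDist from the Python standard library. This is a sketch; the small differences from the tabulated answers arise because the tables round z to 2 decimal places.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 23.4, 6, 50

# (i) a single egg: X ~ N(mu, sigma^2)
p_single = 1 - NormalDist(mu, sigma).cdf(25)

# (ii) the sample mean: standard error sigma / sqrt(n)
standard_error = sigma / sqrt(n)
p_mean = 1 - NormalDist(mu, standard_error).cdf(25)

print(round(standard_error, 4), round(p_single, 4), round(p_mean, 4))
```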

Confidence intervals

Central Limit Theorem Example

Example: A biscuit manufacturer makes packets of biscuits with a nominal weight of 250 grams. It is known that over a long period the variance of the weights of the packets of biscuits produced is 25 grams². A sample of 10 packets is taken and found to have a mean weight of 253.4 grams. Find 95% confidence limits for the mean weight of all packets produced by the machine.

Solution: First assume that the machine is still producing packets with the same variance, 25.
Suppose that the mean weight of all packets of biscuits is µ grams; then the population of all packets has mean µ and standard deviation 5.
From the central limit theorem we can assume that the sample means form an approximately normal population with mean µ and standard error (standard deviation) σ/√n = 5/√10 = 1.5811.
95% of the samples will have a mean in the region −1.96 < Z < 1.96.
[Figure: normal curve f(x) with the central 95% region marked]
We assume that the mean of this sample, 253.4, lies in this region:
    −1.96 < (253.4 − µ)/1.5811 < 1.96
    −3.10 < 253.4 − µ < 3.10
    µ < 253.4 + 3.10 and 253.4 − 3.10 < µ
    250.3 < µ < 256.5
This means that 95% of the samples will give an interval which contains the mean, and we say that [250.3 g, 256.5 g] is a 95% confidence interval for µ.
In other words, there is a 0.95 probability that an interval constructed in this way contains the true mean.
It does not mean that there is a probability of 0.95 that the true mean lies in this particular interval: the true mean is a fixed number, and either does or does not lie in the interval, so the probability that the true mean lies in the interval is either 1 or 0.

In practice we go straight to the last line of the example:
    95% confidence limits for µ are X̄ ± 1.96 σ/√n, since P(−1.96 < Z < 1.96) = 0.95 (tables give P(Z > 1.96) = 0.025)
    90% confidence limits for µ are X̄ ± 1.6449 σ/√n, since P(−1.6449 < Z < 1.6449) = 0.90 (tables give P(Z > 1.6449) = 0.05)
Other confidence limits can be found using the Normal Distribution tables.

Example: A sample of 64 packets of cornflakes has a mean weight X̄ = 510 grams and a variance S² = 36 grams². Find 90% confidence limits for the mean weight of all packets. (Note that the sample variance is taken as the unbiased estimate of σ².)

Solution: We assume that the sample variance equals the variance of the population of all packets: S² = 36 = σ².
Now find the standard deviation (standard error) of the sampling distribution of the mean (population of sample means):
    standard error = σ/√n = 6/√64 = 0.75
For 90% confidence limits z = ±1.6449 (remember to use the 4 D.P. tables after the Normal Dist. tables). Using the sample mean X̄ = 510 grams,
    90% confidence limits are 510 ± 1.6449 × 0.75 = 510 ± 1.234
and a 90% confidence interval is [508.8, 511.2] to 4 S.F.
Note that we have assumed that the unbiased estimate, S² (= 36), is the actual variance, σ², of the population. This is a reasonable assumption as the number in the sample, 64, is large and the error introduced is therefore small.
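Both confidence-interval examples (biscuits and cornflakes) use the same recipe, sketched here with statistics.NormalDist.inv_cdf supplying the z value instead of tables. The function name z_interval is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known (or assumed)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # e.g. 1.96 for 95%
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

biscuits = z_interval(253.4, 5, 10, 0.95)
cornflakes = z_interval(510, 6, 64, 0.90)
print([round(v, 1) for v in biscuits])
print([round(v, 1) for v in cornflakes])
```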

Significance testing, variance of population known

Mean of normal distribution

Example: A machine, when correctly set, is known to produce ball bearings with a mean weight of 84 grams with a standard deviation of 5 grams. The production manager decides to test whether the machine is working correctly and takes a sample of 120 ball bearings. The sample has mean weight 83.2 grams. Would you advise the production manager to alter the setting of his machine? Use a 5% significance level.

Solution:
1) H₀: µ = 84 grams
2) H₁: µ ≠ 84 grams, 2-tail test
   (Note that the machine is not working correctly if the test result is too high or too low.)
3) 5% significance level
4) The Test
   We assume that the machine is still working with a standard deviation of σ = 5 g. From H₀, the mean weight of all ball bearings is assumed to be µ = 84 g. These are the parameters for the population of all ball bearings.
   We want to test a sample mean and therefore need the mean and standard deviation of the population of sample means (the sampling distribution of the sample mean, X̄).
   Expected mean of the sample means = µ = 84 g
   and expected standard deviation of the sample means = standard error = σ/√n = 5/√120 = 0.4564.
   We have an observed mean of 83.2. For a two-tailed test at 5%, we take 2.5% at each end:
       P(X̄ < 83.2) = Φ((83.2 − 84)/0.4564) = Φ(−1.7527) = 1 − Φ(1.75) = 1 − 0.9599 = 0.0401 = 4.01% > 2.5%
   and so the result is not significant at the 5% level.
5) Conclusion
   Do not reject H₀ at the 5% level. Advise the production manager that there is insufficient evidence that the machine is set incorrectly, so he should not change the setting.
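The ball-bearing test can be sketched in code as below. The function name is hypothetical; it returns the z value and the lower-tail probability, which for a two-tailed test at 5% is compared with 2.5%.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(x_bar, mu0, sigma, n):
    """z statistic and lower-tail probability for a sample mean, sigma known."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    return z, NormalDist().cdf(z)

z, lower_tail = z_test_mean(83.2, 84, 5, 120)
significant = lower_tail < 0.025     # 2.5% in each tail for a 5% two-tailed test
print(round(z, 4), round(lower_tail, 4), significant)
```

Using the exact z rather than the rounded 1.75 gives a tail probability of about 0.0398 instead of 0.0401; the conclusion is the same.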

Difference between means of normal distributions

Suppose that X and Y are two independent random variables from different normal distributions, X ~ N(µx, σx²) and Y ~ N(µy, σy²).
If samples of sizes nx and ny are drawn from these populations, then the distributions of the sample means, X̄ and Ȳ, will be normal:
    X̄ ~ N(µx, σx²/nx) and Ȳ ~ N(µy, σy²/ny)
    E[X̄ − Ȳ] = E[X̄] − E[Ȳ] and Var[X̄ − Ȳ] = Var[X̄] + Var[Ȳ]
so the differences of the sample means, X̄ − Ȳ, will be normal:
    (X̄ − Ȳ) ~ N(µx − µy, σx²/nx + σy²/ny)

Example: The weights of chocolate bars produced by two machines, A and B, are known to be normally distributed with variances σA² = 4 and σB² = 3 grams². Samples are taken from each machine of sizes nA = 25 and nB = 16, which have means X̄A = 13.1 and X̄B = 14.4 grams. Is there any evidence at the 5% significance level that the bars produced by machine B are heavier than the bars produced by machine A?

Solution: Suppose that the mean weights for all bars from the two machines are µA and µB.
    H₀: µA = µB
    H₁: µB > µA, one-tail test at 5% level
The test statistic is the observed difference between sample means, X̄B − X̄A = 14.4 − 13.1 = 1.3, and we must find the variance of this population of differences of sample means (the sampling distribution of differences of sample means).
Consider the population of differences of sample means, X̄B − X̄A.
Firstly, for the population of sample means for machine B,
    expected variance Var[X̄B] = σB²/nB = 3/16

and secondly, for the population of sample means for machine A,
    expected variance Var[X̄A] = σA²/nA = 4/25
and so for the population of differences of sample means
    expected mean = E[X̄B − X̄A] = µB − µA = 0 (from H₀)
    and Var[X̄B − X̄A] = Var[X̄B] + Var[X̄A] = σB²/nB + σA²/nA = 3/16 + 4/25 = 0.3475.
The observed difference, the test statistic, is 1.3 and the standard error is √0.3475 = 0.5895.
The Central Limit Theorem tells us that we have a Normal distribution:
    P(difference > 1.3) = 1 − Φ((1.3 − 0)/0.5895) = 1 − Φ(2.2053) ≈ 0.014, i.e. about 1.4% < 5%
which is significant at the 5% level, so reject H₀ and conclude that there is evidence that machine B is producing bars of chocolate with a heavier mean weight than machine A.

Fortunately (!) the formula for testing the difference between sample means,
    Z = (X̄ − Ȳ − (µx − µy)) / √(σx²/nx + σy²/ny),
is in your formula booklet.
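The formula-booklet statistic Z can be written as a short function. This is a sketch assuming known (or assumed) population variances and a null hypothesis of equal means; the function name is illustrative. It is applied to the chocolate-bar figures with B as the first sample.

```python
from math import sqrt
from statistics import NormalDist

def z_test_difference(mean_x, mean_y, var_x, var_y, n_x, n_y):
    """Z for H0: mu_x = mu_y, with the upper-tail probability P(Z > z)."""
    standard_error = sqrt(var_x / n_x + var_y / n_y)
    z = (mean_x - mean_y) / standard_error
    return z, 1 - NormalDist().cdf(z)

# Machine B (mean 14.4, variance 3, n = 16) against machine A (13.1, 4, 25)
z, p_upper = z_test_difference(14.4, 13.1, 3, 4, 16, 25)
print(round(z, 4), round(p_upper, 4))
```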

Significance testing, variance of population NOT known, large sample

When the variance of the population, σ², is not known and the sample is large, we assume that the variance of the sample (meaning the unbiased estimate of σ²), S², is the variance of the population, σ².
As the sample is large, the error introduced is small.

Mean of normal distribution

Example: A machine usually produces steel rods with a mean length of 5.4 cm. The production manager wants to test 80 rods to see whether the machine is working correctly. The sample has mean 5.31 cm and variance 0.33² cm². Advise the production manager, using a 5% level of significance.

Important assumption

The sample variance, S², is taken as σ̂², the unbiased estimate of the variance of the population, σ², and we then assume that the population variance equals the unbiased estimate.

Solution:
    H₀: µ = 5.4
    H₁: µ ≠ 5.4, two-tail test, 2.5% in each tail
We assume that the population variance σ² = the sample variance S² = 0.33², so σ = 0.33.
For the population of sample means (the sampling distribution of the sample means),
    expected mean = 5.4 from the hypothesis
    and standard error = σ/√n = 0.33/√80 = 0.0369.
The observed sample mean is 5.31 and for a two-tail test at 5% we consider
    Φ((5.31 − 5.4)/0.0369) = Φ(−2.4393) = 1 − Φ(2.44) = 1 − 0.9927 = 0.0073 < 2.5%
so reject H₀ and conclude that there is evidence that the machine is not producing rods of mean length 5.4 cm.

Difference between means

Example: A firm has two machines, A and B, which make steel cable. 40 cables produced by machine A have a mean breaking strain of 1728 N and variance of 75² N², whereas 65 cables produced by machine B have a mean breaking strain of 1757 N and a variance of 63² N². Is there any evidence, at the 10% level, to suggest that machine B is producing stronger cables than machine A?

Solution: Let µA and µB be the mean breaking strengths of all cables produced by machines A and B.
1) H₀: µA = µB
2) H₁: µB > µA, 1-tail test
3) Significance level 10%
4) The Test
   For Machine A: we assume that the population variance, σA² = the sample variance, S² = 75²
       variance of sample means Var[X̄A] = σA²/nA = 75²/40 = 140.625
   For Machine B: we assume that the population variance, σB² = the sample variance, S² = 63²
       variance of sample means Var[X̄B] = σB²/nB = 63²/65 = 61.06
   For differences in sample means, X̄B − X̄A:
       expected mean = 0 from the hypothesis
       expected variance Var[X̄B − X̄A] = Var[X̄B] + Var[X̄A] = 61.06 + 140.625 = 201.69
       standard deviation or standard error = √201.69 = 14.20
   We have an observed difference in means, the test statistic, X̄B − X̄A = 1757 − 1728 = 29, and for a 1-tail test that B is stronger we need the area to the right of 29 (the mean is treated as continuous, so do not use 28.5):
       1 − Φ((29 − 0)/14.20) = 1 − Φ(2.04) = 1 − 0.9793 = 0.0207 < 10%
   which is significant at 10%.
5) Conclusion
   Reject H₀ at the 10% level and conclude that there is evidence that machine B produces cables with a greater mean strength than machine A.

5 Goodness of fit, χ² test

General points

The χ² test can only be used to test two lists of frequencies: the observed frequencies and the expected frequencies calculated from the hypothesis.
The expected frequencies do not need to be integers (give 2 D.P.).
    χ² = Σ (Oᵢ − Eᵢ)²/Eᵢ, where Oᵢ and Eᵢ are the observed and expected frequencies.
If the expected frequency for a class is less than 5, then you must group this class with the next class (or two ...).
The number of degrees of freedom, ν, is the number of cells (after grouping if necessary) minus the number of linear equations connecting the frequencies.

Discrete uniform distribution

Example: A die is rolled 300 times and the frequency of each score (1 to 6) recorded. Test whether the die is fair at the 5% level of significance.

Solution:
    H₀: The die is fair, the probability of each score is 1/6.
    H₁: The die is not fair, the probability of each score is not 1/6.
The expected frequencies are all 1/6 × 300 = 50. Tabulating the observed frequency, the expected frequency and (Oᵢ − Eᵢ)²/Eᵢ for each score, and totalling the last column, gives χ² = 3.04.
ν = number of degrees of freedom = n − 1 = 6 − 1 = 5, since the total is a linear equation connecting the frequencies and is fixed.
From tables we see that χ²₅(5%) = 11.07 > 3.04, so our observed result is not significant.
We do not reject H₀ and conclude that there is no evidence that the die is unfair.
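The χ² statistic is a one-line sum once the observed and expected frequencies are known. The observed counts below are hypothetical (invented for illustration; they are not the frequencies from the die example), chosen so that they total 300.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for scores 1 to 6 from 300 rolls (not the notes' data)
observed = [43, 49, 54, 57, 46, 51]
expected = [300 / 6] * 6             # a fair die: 50 of each score

statistic = chi_squared(observed, expected)
degrees_of_freedom = len(observed) - 1   # only the total is fixed
print(round(statistic, 2), degrees_of_freedom)
```

For these invented counts the statistic is 2.64 with 5 degrees of freedom, well below the 5% critical value of 11.07.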

Continuous uniform distribution

This is very similar to the discrete uniform distribution: pay attention to the class boundaries and find the expected frequencies.

Binomial distribution

For H₀: "The Binomial distribution is a good fit", we use the mean of the Observed frequencies to calculate the Expected frequencies, and so both Oᵢ and Eᵢ give the same mean and total: thus there are 2 linear equations connecting the frequencies and ν = n − 2,
but
for H₀: "The Binomial distribution, B(30, 0.3), is a good fit", the means using Oᵢ and Eᵢ will be different: thus there is only 1 linear equation, the total, connecting the frequencies and so ν = n − 1.

Poisson distribution

For H₀: "The Poisson distribution is a good fit", we use the mean of the Observed frequencies to calculate the Expected frequencies, and so both Oᵢ and Eᵢ give the same mean and total: thus there are 2 linear equations connecting the frequencies and ν = n − 2,
but
for H₀: "The Poisson distribution, Po(3), is a good fit", the means using Oᵢ and Eᵢ will be different: thus there is only 1 linear equation, the total, connecting the frequencies and so ν = n − 1.

Example: A switchboard operator records the number of new calls in 69 consecutive one-minute periods, tabulating the number of calls (0, 1, 2, ..., with a final class of 6 or more) against frequency.
a) Say why you think that a Poisson distribution might be suitable.
b) Find the mean and variance of this distribution. Do these figures support the view that they might form a Poisson distribution?
c) Test the goodness of fit of a Poisson distribution at the 5% level.

Solution:
a) Telephone calls are likely to occur singly, randomly, independently and uniformly, which are the conditions for a Poisson distribution.
b) Treating "6 or more" as 7, we calculate the mean and variance from a table of x, f, xf and x²f.

mean = 215/69 = 3.12 and variance = 915/69 − (215/69)².
From these figures we can see that the mean and variance are approximately equal: since the mean and variance of a Poisson distribution are equal, this supports the view that the distribution could be Poisson.
c) H₀: The Poisson distribution is a suitable model.
   H₁: The Poisson distribution is not a suitable model.
The Poisson probabilities can be calculated from P(r) = λʳ e^(−λ)/r!, where λ = 3.12, and the expected frequencies by multiplying by N = 69.
Note that the probability for "6 or more" is found by adding the other probabilities and subtracting from 1.
The working table has columns x, O, p, E, O (grouped), E (grouped) and (O − E)²/E.
The expected frequency for x = 0 is 3.06 < 5, so it has been grouped with x = 1.
Thus we have n = 6 classes (after grouping) and ν = 6 − 2 = 4, and χ²₄(5%) = 9.49.
We have calculated χ² = 1.9 < 9.49, which is not significant, so we do not reject H₀ and conclude that the Poisson distribution is a suitable model.
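The expected Poisson frequencies in part (c) can be generated as below. This is a sketch using the sample mean 215/69 as λ; the function name and the decision to make the last class "6 or more" follow the description in the text, but the code itself is illustrative.

```python
from math import exp, factorial

def poisson_expected(lam, total, last_class):
    """Expected frequencies for 0, 1, ..., last_class - 1 and 'last_class or more'."""
    probs = [exp(-lam) * lam ** r / factorial(r) for r in range(last_class)]
    probs.append(1 - sum(probs))     # final open-ended class
    return [total * p for p in probs]

expected = poisson_expected(215 / 69, 69, 6)
print([round(e, 2) for e in expected])
```

The first entry is the expected frequency for x = 0; because it is below 5 it has to be merged with the class x = 1 before χ² is calculated.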

The normal distribution

For H₀: "The Normal distribution is a good fit", we use the mean and variance of the Observed frequencies to calculate the Expected frequencies, and so both Oᵢ and Eᵢ give the same mean, variance and total: thus there are 3 linear equations connecting the frequencies and ν = n − 3,
but
for H₀: "The Normal distribution, N(14, 3²), is a good fit", the means and variances using Oᵢ and Eᵢ will be different: thus there is only 1 linear equation, the total, connecting the frequencies and so ν = n − 1.

Example: The sizes of men's shoes purchased from a shoe shop in one week are recorded as a table of shoe size against number of pairs. Is the manager's assumption that the normal distribution is a suitable model justified at the 5% level?

Solution:
    H₀: The normal distribution is a suitable model.
    H₁: The normal distribution is not a suitable model.
The total number of pairs is 175, and the mean, m, and standard deviation, s, are calculated from the table (taking ≤ 6 as 5 and ≥ 12 as 12).
Remembering that size 8 means from 7.5 to 8.5, we need to find the area between 7.5 and 8.5 and multiply by 175 to find the expected frequency for size 8, and similarly for other sizes.
The working table has columns x, z = (x − m)/s, Φ(z), class, area = p, E = 175p, O and (O − E)²/E, with the first class open below and the last class open above.
There are n = 7 classes and 3 linear equations connecting the frequencies (N, m, s), so ν = 7 − 3 = 4.
χ²₄(5%) = 9.49 and we have calculated χ² = 5.34 < 9.49, so we do not reject H₀ and therefore conclude that the normal distribution is a suitable model.

Contingency tables

For a 5 × 4 table in which the totals of each row and column are fixed, the "?" cells represent the degrees of freedom, since if we know the values of the ?s the frequencies in the other cells can then be calculated.

          A    B    C    D    E    totals
W         ?    ?    ?    ?
X         ?    ?    ?    ?
Y         ?    ?    ?    ?
Z
totals

Thus there are $(5 - 1) \times (4 - 1) = 12$ degrees of freedom.

Generalising, we can see that for an $m \times n$ table the number of degrees of freedom is $(m - 1)(n - 1)$.

Example: Natives of England, Africa and China were classified according to blood group, giving the following table.

[Table: rows English, African, Chinese; columns O, A, B, AB; the entries are not recoverable.]

Is there any evidence at the 5% level that there is a connection between blood group and nationality?

Solution:
$H_0$: There is no connection between blood group and nationality.
$H_1$: There is a connection between blood group and nationality.

First redraw the table showing totals of each row and column.

[Table: rows English, African, Chinese and totals; columns O, A, B, AB and totals; the entries are not recoverable.]

Now we need to calculate the expected frequency for English and group O.

There are 609 English and 1335 people altogether, so $609/1335$ of the people are English, and from $H_0$ we know that there is no connection between blood group and nationality, so $609/1335$ of the 544 people with group O should also be English

$\Rightarrow$ expected frequency for English and group O is $\dfrac{609}{1335} \times 544 = \dfrac{609 \times 544}{1335} = 248.2$.

This can become automatic if you notice that you just multiply the totals for the row and column concerned and divide by the total number.

[Table of expected frequencies: rows English, African, Chinese and totals; columns O, A, B, AB and totals; only the English, group O entry (248.2) is recoverable.]

The value of $\chi^2$ is calculated below.

[Table: observed frequency, expected frequency and $(O - E)^2/E$ for each cell; the entries are not recoverable.]

We have $\nu = (4 - 1)(3 - 1) = 6$ degrees of freedom and $\chi^2_6(5\%) = 12.59$.

We have calculated $\chi^2 = 8.41 < 12.59$ $\Rightarrow$ do not reject $H_0$ and therefore conclude that there is no connection between nationality and blood group.
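The "row total × column total ÷ grand total" rule can be written as a short Python sketch. The function names are made up for illustration; the figures 609, 544 and 1335 are the ones quoted in the example.

```python
def expected_frequency(row_total, column_total, grand_total):
    """Expected cell frequency under H0 (no connection between the classifications)."""
    return row_total * column_total / grand_total

def degrees_of_freedom(rows, columns):
    """Degrees of freedom for a contingency table with fixed row and column totals."""
    return (rows - 1) * (columns - 1)

e_english_o = expected_frequency(609, 544, 1335)
print(round(e_english_o, 1))     # expected frequency for English and group O
print(degrees_of_freedom(3, 4))  # 3 nationalities by 4 blood groups
```

The same function would be applied to every cell of the table before summing $(O - E)^2/E$.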

6 Regression and correlation

Spearman's rank correlation coefficient

Ranking and equal ranks

Ranking is putting a list of figures in order and giving each one its position or rank. Equal numbers are given the average of the ranks they would have had if all had been different.

Example: Rank the following numbers: 45, 65, 76, 56, 34, 45, 3, 67, 65, 45, 81, 3.

Solution: First put in order and give ranks as if all were different; then give the average rank to those which are equal.

Numbers:                  81   76   67   65   65   56   45   45   45   34    3    3
Actual rank:               1    2    3   4=   4=    6   7=   7=   7=   10  11=  11=
Rank (if all different):   1    2    3    4    5    6    7    8    9   10   11   12
Modified rank:             1    2    3   4½   4½    6    8    8    8   10  11½  11½

Averages for equal ranks: $(4 + 5)/2 = 4½$, $(7 + 8 + 9)/3 = 24/3 = 8$ and $(11 + 12)/2 = 11½$.

When ranks are tied you must calculate the PMCC of the modified ranks, not use the Spearman formula.

Spearman's rank correlation coefficient

To compare two sets of rankings for the same items, first find the difference, $d$, between each pair of ranks and then calculate Spearman's rank correlation coefficient

$r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}$

When there are no tied ranks this is the same as the product moment correlation coefficient of the two sets of ranks, and so we know that
$r_s = +1$ means the rankings are in perfect agreement,
$r_s = -1$ means the rankings are in exact reverse order,
$r_s = 0$ means that there is no correlation between the rankings.
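The averaging rule for equal ranks can be sketched in Python. The function name `average_ranks` is made up for this illustration, and it is applied here to the ten largest values from the ranking example.

```python
def average_ranks(values):
    """Rank values from largest (rank 1) to smallest, averaging the ranks of ties."""
    ordered = sorted(values, reverse=True)
    ranks = {}
    for v in set(ordered):
        first = ordered.index(v) + 1        # rank of the first occurrence if all were different
        count = ordered.count(v)            # how many values share this rank
        ranks[v] = first + (count - 1) / 2  # mean of first, first + 1, ..., first + count - 1
    return [ranks[v] for v in values]

data = [45, 65, 76, 56, 34, 45, 67, 65, 45, 81]
print(average_ranks(data))
```

The two 65s share rank 4.5 and the three 45s share rank 8, matching the averaging rule described above.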

Example: Ten varieties of coffee labelled A, B, C, ..., J were tasted by a man and a woman. Each ranked the coffees from best to worst as shown.

Man:   G H C D A E B J I F
Woman: C B H G J D I E F A

Find Spearman's rank correlation coefficient.

Solution: Rank for each person, find $d$ and then $r_s$.

Coffee   Man   Woman    d    d²
A          5     10    −5    25
B          7      2     5    25
C          3      1     2     4
D          4      6    −2     4
E          6      8    −2     4
F         10      9     1     1
G          1      4    −3     9
H          2      3    −1     1
I          9      7     2     4
J          8      5     3     9
                    $\sum d^2$ = 86

$r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)} = 1 - \dfrac{6 \times 86}{10 \times 99} = 1 - 0.521 = 0.479$ to 3 S.F.

Spearman or PMCC

Use of Spearman's rank correlation coefficient
(i) Use when one, or both, sets of data are not from a normal population.
(ii) Use when the data are not measured on a scale or in units (probably not normal).
(iii) Use when the data are subjective, e.g. judgements in order of preference (not normal).
(iv) Can be used if the scatter graph indicates a non-linear relationship between the variables, since the PMCC is used to indicate linear correlation.
(v) Do not use for tied ranks (the Spearman formula depends on non-tied ranks).

Use of the product moment correlation coefficient
(i) Use when ranks are tied (see above): modify the ranks and then use the PMCC on the modified ranks.
(ii) Use when both sets of figures are normally distributed (this will not be the case when using ranks).
(iii) Use when the scatter diagram indicates a linear relationship between the variables, i.e. when the points lie close to a straight line.
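The coffee-tasting calculation can be checked with a short Python sketch. The helper name `spearman_from_orderings` is made up for illustration and assumes there are no tied ranks.

```python
def spearman_from_orderings(order_a, order_b):
    """Spearman's r_s for two best-to-worst orderings of the same items (no ties)."""
    n = len(order_a)
    rank_a = {item: pos + 1 for pos, item in enumerate(order_a)}
    rank_b = {item: pos + 1 for pos, item in enumerate(order_b)}
    sum_d2 = sum((rank_a[item] - rank_b[item]) ** 2 for item in rank_a)
    return sum_d2, 1 - 6 * sum_d2 / (n * (n**2 - 1))

man = list("GHCDAEBJIF")
woman = list("CBHGJDIEFA")
sum_d2, r_s = spearman_from_orderings(man, woman)
print(sum_d2, round(r_s, 3))
```

For these two orderings the sum of squared differences is 86, which gives $r_s \approx 0.479$.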

Testing for zero correlation

N.B. the tables give figures for a ONE-TAIL test.

Product moment correlation coefficient

The PMCC tests to see if there is a linear connection between the variables. For strong correlation, the points on a scatter graph will lie close to a straight line.

Reminder: PMCC $= r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$, where

$S_{xx} = \sum x_i^2 - \dfrac{(\sum x_i)^2}{n}$, $S_{yy} = \sum y_i^2 - \dfrac{(\sum y_i)^2}{n}$, $S_{xy} = \sum x_i y_i - \dfrac{(\sum x_i)(\sum y_i)}{n}$,

and $\rho$ denotes the corresponding population value.

Example: The product moment correlation coefficient between 40 pairs of values is 0.5. Is there any evidence of correlation between the pairs at the 5% level?

Solution:
$H_0$: There is no correlation between the pairs, $\rho = 0$.
$H_1$: There is correlation, positive or negative, between the pairs, $\rho \ne 0$ (two-tail test).

From tables for $n = 40$, which give one-tail figures, we must look at the 2.5% column, and the critical values are $\pm 0.3120$.

The calculated figure is $0.5 > 0.3120$ and so is significant $\Rightarrow$ we reject $H_0$ and conclude that there is some correlation (positive or negative) between the pairs.

Spearman's rank correlation coefficient

Spearman tests to see if there is a connection (or correlation) between the ranks.

Example: It is believed that a person who absorbs a drug well on one occasion will also absorb a drug well on another occasion. Tests on ten patients to find the percentage of drug absorbed gave a value for Spearman's rank correlation coefficient, $r_s$. Is there any evidence at the 5% level of a positive correlation between the two sets of results?

Solution:
$H_0$: There is no correlation between the two sets of results, $\rho_s = 0$.
$H_1$: There is positive correlation between the two sets of results, $\rho_s > 0$ (one-tail test).

From the tables for $n = 10$ and a one-tail test, the critical value for 5% is 0.5636.

The calculated value of $r_s$ is greater than 0.5636, which is significant $\Rightarrow$ reject $H_0$; conclude that there is evidence of positive correlation between the two sets of results.

Note that this shows correlation between the ranks of the two sets of results.
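The reminder formula for the PMCC translates directly into Python. This is a minimal sketch; the data below are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient from S_xx, S_yy and S_xy."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: S_xx = 10, S_yy = 6, S_xy = 6.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pmcc(xs, ys)
print(round(r, 4))
```

For a two-tail test the absolute value of `r` would then be compared with the critical value from the 2.5% column of the one-tail tables.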

Comparison between PMCC and Spearman

Example: A random sample of 8 students sat examinations in Geography and Statistics. The product moment correlation coefficient between their results was 0.57; the Spearman rank correlation coefficient, $r_s$, was also calculated.

(a) Test both of these values for positive correlation. Use a 5% level of significance.
(b) Comment on your results.

Solution:
(a) $H_0$: $\rho = 0$; $H_1$: $\rho > 0$.

For the PMCC the 5% critical value is 0.6215, and $0.57 < 0.6215$ $\Rightarrow$ not significant at 5% $\Rightarrow$ there is insufficient evidence of positive correlation.

For Spearman's rank correlation coefficient the 5% critical value is 0.6429, and the calculated $r_s > 0.6429$ $\Rightarrow$ significant at 5% $\Rightarrow$ there is evidence of positive correlation.

(b) From the PMCC there is not enough evidence to conclude that as Statistics marks increased Geography marks also increased, i.e. we cannot conclude that the points on a scatter diagram lie close to a straight line.

From Spearman's rank correlation coefficient there is evidence that students ranked highly in Statistics were also ranked highly in Geography, or that people with high scores in Statistics also had high scores in Geography.

7 Appendix

Combining random variables

Let $X$ and $Y$ be random variables with probability distributions
$X \in \{x_1, x_2, x_3, \ldots, x_n\}$ with probabilities $(p_1, p_2, p_3, \ldots, p_n)$, and
$Y \in \{y_1, y_2, y_3, \ldots, y_m\}$ with probabilities $(q_1, q_2, q_3, \ldots, q_m)$.

Then the random variable $X + Y$ takes all possible values $x_i + y_j$ as $i$ varies from 1 to $n$ and $j$ varies from 1 to $m$.

Let $P(x_i \text{ and } y_j) = r_{ij}$. Notice that

$\sum_{j=1}^{m} r_{ij} = r_{i1} + r_{i2} + r_{i3} + \cdots + r_{im} = p_i$

and, similarly,

$\sum_{i=1}^{n} r_{ij} = q_j$

E[X + Y] = E[X] + E[Y]

$E[X + Y] = \sum_{i=1}^{n} \sum_{j=1}^{m} (x_i + y_j)\, r_{ij}$

$= \sum_i x_i \sum_j r_{ij} + \sum_j y_j \sum_i r_{ij}$

$= \sum_{i=1}^{n} x_i p_i + \sum_{j=1}^{m} y_j q_j$

$\Rightarrow E[X + Y] = E[X] + E[Y]$
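The result can be checked numerically on a small joint distribution. The values and joint probabilities below are hypothetical and deliberately not independent, since the proof above does not need independence.

```python
from fractions import Fraction as F

xs = [0, 1, 2]
ys = [10, 20]

# Hypothetical joint probabilities r[i][j] = P(X = xs[i] and Y = ys[j]); not independent.
r = [[F(1, 10), F(2, 10)],
     [F(3, 10), F(1, 10)],
     [F(1, 10), F(2, 10)]]

p = [sum(row) for row in r]                                          # p_i = sum over j of r_ij
q = [sum(r[i][j] for i in range(len(xs))) for j in range(len(ys))]   # q_j = sum over i of r_ij

e_x = sum(x * p_i for x, p_i in zip(xs, p))
e_y = sum(y * q_j for y, q_j in zip(ys, q))
e_sum = sum((xs[i] + ys[j]) * r[i][j]
            for i in range(len(xs)) for j in range(len(ys)))

print(e_x, e_y, e_sum)
```

Exact fractions are used so that the equality $E[X + Y] = E[X] + E[Y]$ holds exactly rather than to rounding error.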

Var[X + Y] = Var[X] + Var[Y]

In this case we take $X$ and $Y$ to be independent $\Rightarrow r_{ij} = P(x_i \text{ and } y_j) = P(x_i)\,P(y_j) = p_i q_j$.

Also notice that $\sum p_i = \sum q_j = 1$.

$\text{Var}[X + Y] = E[(X + Y)^2] - (E[X + Y])^2$

$= \sum_i \sum_j (x_i + y_j)^2 r_{ij} - (E[X] + E[Y])^2$

$= \sum_i \sum_j x_i^2 p_i q_j + 2 \sum_i \sum_j x_i y_j p_i q_j + \sum_i \sum_j y_j^2 p_i q_j - \left((E[X])^2 + 2E[X]E[Y] + (E[Y])^2\right)$

$= \sum_i x_i^2 p_i \sum_j q_j + 2 \sum_i x_i p_i \sum_j y_j q_j + \sum_j y_j^2 q_j \sum_i p_i - (E[X])^2 - 2E[X]E[Y] - (E[Y])^2$

$= \sum_i x_i^2 p_i + 2E[X]E[Y] + \sum_j y_j^2 q_j - (E[X])^2 - 2E[X]E[Y] - (E[Y])^2$

$= E[X^2] - (E[X])^2 + E[Y^2] - (E[Y])^2$

$\Rightarrow \text{Var}[X + Y] = \text{Var}[X] + \text{Var}[Y]$, if $X$ and $Y$ are independent.

Unbiased & biased estimators

Unbiased estimators

An estimator $\hat{\lambda}$ for a parameter $\lambda$ is said to be unbiased if $E[\hat{\lambda}] = \lambda$.

Example: A bag has 468 beads of two colours, white and green. 20 beads are taken at random and the number, $i$, of green beads in the sample is counted. To estimate the true number of green beads, $g$, in the bag, we calculate

$\hat{g} = \dfrac{i}{20} \times 468$

If $g$ is the true number of green beads in the bag, then the probability of drawing a green bead in a single trial is $p = \dfrac{g}{468}$, and drawing $n = 20$ beads with replacement gives a Binomial distribution $B(n, p)$.

Thus $\mu = E[i] = np = 20 \times \dfrac{g}{468}$.

We do not actually know the number of green beads, and want to estimate this number after taking one sample $\Rightarrow$ estimate $\hat{g} = \dfrac{i}{20} \times 468$.

We now find the expected value of this estimate:

$E[\hat{g}] = E\left[\dfrac{i}{20} \times 468\right] = \dfrac{468}{20} E[i] = \dfrac{468}{20} \times 20 \times \dfrac{g}{468} = g$, the true number

$\Rightarrow$ the expected value of the estimator, $\hat{g}$, is equal to the true value, $g$
$\Rightarrow$ the estimator, $\hat{g}$, is unbiased.

Biased estimators

An estimator $\hat{\lambda}$ for a parameter $\lambda$ is said to be biased if $E[\hat{\lambda}] \ne \lambda$.

Example: A naturalist wishes to estimate the number of squirrels in a wood. He first catches 50 squirrels, marks them and then releases them. Later he catches 30 squirrels and counts the number, $i$, which have been marked. The true number in the population, $n$, is then estimated as $\hat{n}$ from the equation

$\dfrac{50}{\hat{n}} = \dfrac{i}{30} \Rightarrow \hat{n} = \dfrac{1500}{i}$

Now $E[\hat{n}] = \sum_{i=0}^{30} \dfrac{1500}{i}\, p_i$, and it is possible that $i = 0$, in which case the estimate is infinite

$\Rightarrow$ since $i = 0$ has non-zero probability, $E[\hat{n}]$ is also infinite and so cannot be equal to the true value
$\Rightarrow$ in this case the estimator $\hat{n} = \dfrac{1500}{i}$ is biased.

Unbiased estimates of population mean and variance

Let $X$ be a random variable drawn from a population with mean $\mu$ and variance $\sigma^2$; then $E[X] = \mu$ and $\text{Var}[X] = \sigma^2$.

A random sample, $X_1, X_2, X_3, \ldots, X_n$, of size $n$ is taken from the population.

The sample mean is $\bar{X} = \dfrac{1}{n}(X_1 + X_2 + X_3 + \cdots + X_n)$.

$E[X_i] = \mu$ and $\text{Var}[X_i] = \sigma^2$ for $i = 1, 2, 3, \ldots, n$.

Unbiased estimate of the mean

$E[\bar{X}] = E\left[\dfrac{1}{n}(X_1 + X_2 + X_3 + \cdots + X_n)\right] = \dfrac{1}{n}(E[X_1] + E[X_2] + E[X_3] + \cdots + E[X_n]) = \dfrac{1}{n}(\mu + \mu + \mu + \cdots + \mu) = \mu$

$\Rightarrow E[\bar{X}] = \mu$, the true value of the mean
$\Rightarrow \bar{X}$ is an unbiased estimate of the mean of the population.
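The bead example from the start of this section can be checked exactly by summing over the binomial distribution. This sketch assumes 20 beads are drawn with replacement from a bag of 468; the true number of green beads, 117, is a hypothetical value chosen for illustration.

```python
from math import comb

def expected_estimate(g, bag_size=468, n=20):
    """E[g_hat], where g_hat = (i / n) * bag_size and i ~ B(n, g / bag_size)."""
    p = g / bag_size
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) * (i / n) * bag_size
               for i in range(n + 1))

g = 117  # hypothetical true number of green beads
print(expected_estimate(g))
```

Whatever value of `g` is chosen, the expected value of the estimate comes back as `g` (up to floating-point rounding), which is what "unbiased" means.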

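Unbiased estimate of the variance of the population

Preliminary results

(i) $\text{Var}[X] = E[X^2] - (E[X])^2 = E[X^2] - \mu^2$
$\Rightarrow E[X^2] = \text{Var}[X] + \mu^2 = \sigma^2 + \mu^2$   (I)

(ii) $\text{Var}[\bar{X}] = E[\bar{X}^2] - (E[\bar{X}])^2 = E[\bar{X}^2] - \mu^2$
$\Rightarrow E[\bar{X}^2] = \text{Var}[\bar{X}] + \mu^2$
$= \text{Var}\left[\dfrac{1}{n}(X_1 + X_2 + X_3 + \cdots + X_n)\right] + \mu^2$
$= \dfrac{1}{n^2} \text{Var}[X_1 + X_2 + X_3 + \cdots + X_n] + \mu^2$
$= \dfrac{1}{n^2} (\text{Var}[X_1] + \text{Var}[X_2] + \text{Var}[X_3] + \cdots + \text{Var}[X_n]) + \mu^2$, since the $X_i$ are independent
$= \dfrac{1}{n^2} (\sigma^2 + \sigma^2 + \sigma^2 + \cdots + \sigma^2) + \mu^2$
$\Rightarrow E[\bar{X}^2] = \dfrac{1}{n}\sigma^2 + \mu^2$   (II)

Proof

The variance of $X_1, X_2, X_3, \ldots, X_n$ is defined to be

Variance $= (\text{s.d.})^2 = \dfrac{1}{n} \sum X_i^2 - \bar{X}^2$

$\Rightarrow E[(\text{s.d.})^2] = E\left[\dfrac{1}{n} \sum X_i^2 - \bar{X}^2\right] = E\left[\dfrac{1}{n} \sum X_i^2\right] - E[\bar{X}^2]$

$= \dfrac{1}{n} E\left[\sum X_i^2\right] - E[\bar{X}^2]$

$= \dfrac{1}{n}\, n(\sigma^2 + \mu^2) - E[\bar{X}^2]$, since $E\left[\sum X_i^2\right] = n(\sigma^2 + \mu^2)$ from I

$= (\sigma^2 + \mu^2) - \left(\dfrac{1}{n}\sigma^2 + \mu^2\right)$, since $E[\bar{X}^2] = \dfrac{1}{n}\sigma^2 + \mu^2$ from II

$\Rightarrow E[(\text{s.d.})^2] = \dfrac{n - 1}{n}\sigma^2$

Thus $E[(\text{s.d.})^2]$ is not equal to the true value, and so $(\text{s.d.})^2$ is a biased estimator of $\sigma^2$, but multiplying both sides by $\dfrac{n}{n - 1}$ we can see that

$\dfrac{n}{n - 1}(\text{s.d.})^2$ is an unbiased estimator of $\sigma^2$.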
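The factor $(n - 1)/n$ can be confirmed by listing every possible sample from a small population. The population used here (values 0 and 1 with probabilities 0.6 and 0.4) and the sample size 3 are assumptions made purely for illustration.

```python
from itertools import product

values = [0, 1]
probs = {0: 0.6, 1: 0.4}  # hypothetical population, for illustration only
n = 3

mu = sum(v * probs[v] for v in values)
sigma2 = sum(v * v * probs[v] for v in values) - mu**2

expected_sd2 = 0.0
for sample in product(values, repeat=n):
    weight = 1.0
    for v in sample:
        weight *= probs[v]                           # probability of this ordered sample
    mean = sum(sample) / n
    sd2 = sum(v * v for v in sample) / n - mean**2   # (s.d.)^2 with divisor n
    expected_sd2 += weight * sd2

print(sigma2, expected_sd2, expected_sd2 * n / (n - 1))
```

The expected value of $(\text{s.d.})^2$ is two thirds of $\sigma^2$ here, and multiplying by $n/(n - 1) = 3/2$ recovers $\sigma^2$.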

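Bias

Example: A large bag contains counters: 60% have the number 0, and 40% have the number 1.
(a) Find the mean, $\mu$, and variance, $\sigma^2$.
A simple random sample of size 3 is drawn.
(b) List all possible samples.
(c) Find the sampling distribution for the mean $\bar{X} = \dfrac{X_1 + X_2 + X_3}{3}$.
(d) Use your answers to part (c) to find $E[\bar{X}]$ and $\text{Var}[\bar{X}]$.
(e) Find the sampling distribution for the mode $M$.
(f) Use your answers to part (e) to find $E[M]$ and $\text{Var}[M]$.

Solution:
(a) $\mu = \sum x_i p_i = 0 \times 0.6 + 1 \times 0.4 = 0.4$
$\sigma^2 = \sum x_i^2 p_i - \mu^2 = (0^2 \times 0.6 + 1^2 \times 0.4) - 0.4^2 = 0.24$

(b) Possible samples are
(0, 0, 0)
(1, 0, 0), (0, 1, 0), (0, 0, 1)
(1, 1, 0), (1, 0, 1), (0, 1, 1)
(1, 1, 1)

(c) From (b) we can find the sampling distribution of the mean:

$\bar{X}$    0        1/3      2/3      1
$p$        0.216    0.432    0.288    0.064

(d) $E[\bar{X}] = 0 \times 0.216 + \tfrac{1}{3} \times 0.432 + \tfrac{2}{3} \times 0.288 + 1 \times 0.064 = 0.4$
$\text{Var}[\bar{X}] = \left(0 \times 0.216 + \tfrac{1}{9} \times 0.432 + \tfrac{4}{9} \times 0.288 + 1 \times 0.064\right) - 0.4^2$
$\Rightarrow \text{Var}[\bar{X}] = 0.08$

(e) From (b) we can find the sampling distribution of the mode:

$M$    0        1
$p$    0.648    0.352

(f) $E[M] = 0 \times 0.648 + 1 \times 0.352 = 0.352$
$\text{Var}[M] = (0^2 \times 0.648 + 1^2 \times 0.352) - 0.352^2$
$\Rightarrow \text{Var}[M] = 0.228$ to 3 S.F.

Thus the sample mean is an unbiased estimator of the mean of the population, since $E[\bar{X}] = 0.4 = \mu$, the true value,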

but the sample mode is a biased estimator of the mode of the population: $E[M] = 0.352$, but the true value of the mode of the population is 0.

We say that the bias is $E[M] - (\text{the true value}) = 0.352 - 0 = 0.352$.

In general, if $\hat{\lambda}$ is a biased estimator of the parameter $\lambda$, then the bias is defined to be

bias $= E[\hat{\lambda}] - \lambda$

In the above example, the bias in estimating the mode from the sample is

bias $= E[M] - \text{true value} = 0.352 - 0 = 0.352$.
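The sampling distributions in the counters example can be reproduced by enumerating all eight samples. This is a sketch of the calculation rather than a general routine; the names `mean_dist`, `mode_dist` and `moments` are made up for illustration.

```python
from collections import defaultdict
from itertools import product

probs = {0: 0.6, 1: 0.4}
mean_dist = defaultdict(float)  # sampling distribution of the sample mean
mode_dist = defaultdict(float)  # sampling distribution of the sample mode

for sample in product([0, 1], repeat=3):
    weight = probs[sample[0]] * probs[sample[1]] * probs[sample[2]]
    ones = sum(sample)
    mean_dist[ones / 3] += weight
    mode_dist[1 if ones >= 2 else 0] += weight  # the mode is 1 when at least two counters show 1

def moments(dist):
    """Return (expected value, variance) of a discrete distribution {value: probability}."""
    e = sum(x * p for x, p in dist.items())
    var = sum(x * x * p for x, p in dist.items()) - e**2
    return e, var

e_mean, var_mean = moments(mean_dist)
e_mode, var_mode = moments(mode_dist)
bias_of_mode = e_mode - 0  # the population mode is 0

print(e_mean, var_mean, e_mode, var_mode, bias_of_mode)
```

The enumeration gives a mean of 0.4 for the sample mean (no bias) and 0.352 for the sample mode, so the bias of the mode is 0.352.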

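Probability generating functions

Probability generating functions are a neat idea, and are useful for finding the expected mean and variance for distributions which have a probability generating function which is easy to differentiate.

If $X$ is a random variable on the set $\{0, 1, 2, \ldots, n\}$, then $G(t) = p_0 + p_1 t + p_2 t^2 + \cdots + p_n t^n$ is a probability generating function, p.g.f., if
(i) $\sum p_i = 1$, and
(ii) $0 \le p_i \le 1$ for all $i$
$\Rightarrow P(X = i) =$ the coefficient of $t^i$.

The probability generating function can be thought of as a probability labelling function, where $t^i$ acts as a label for the probability that $X = i$.

Expected mean and variance for a p.g.f.

We know that $E[X] = \sum x_i p_i = 0 \times p_0 + 1 \times p_1 + 2 p_2 + \cdots + n p_n = \sum i p_i$

and that $\text{Var}[X] = E[X^2] - (E[X])^2 = 0^2 \times p_0 + 1^2 \times p_1 + 2^2 p_2 + \cdots + n^2 p_n - \left(\sum i p_i\right)^2$.

Notice that $G'(t) = 0 \times p_0 + 1 \times p_1 + 2 p_2 t + 3 p_3 t^2 + \cdots + n p_n t^{n-1}$
$\Rightarrow G'(1) = 0 \times p_0 + 1 \times p_1 + 2 p_2 + 3 p_3 + \cdots + n p_n$
$\Rightarrow$ expected mean $= E[X] = G'(1)$

and $G''(t) = 2 \times 1 \times p_2 + 3 \times 2 \times p_3 t + \cdots + n(n - 1) p_n t^{n-2}$
$\Rightarrow G''(1) = \sum i(i - 1) p_i = \sum i^2 p_i - \sum i p_i$
$\Rightarrow \sum i^2 p_i = G''(1) + \sum i p_i = G''(1) + G'(1)$

$\text{Var}[X] = \sum i^2 p_i - \left(\sum i p_i\right)^2$
$\Rightarrow \text{Var}[X] = G''(1) + G'(1) - (G'(1))^2$

Thus for a probability generating function $G(t) = p_0 + p_1 t + p_2 t^2 + \cdots + p_n t^n$,

$E[X] = G'(1)$ and $\text{Var}[X] = G''(1) + G'(1) - (G'(1))^2$.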
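The two formulae can be applied directly to the list of coefficients of $G(t)$. The sketch below uses a fair six-sided die as an illustrative distribution (an assumption, not an example from these notes); the function name is made up.

```python
from fractions import Fraction

def pgf_mean_variance(coeffs):
    """coeffs[i] = P(X = i); returns (E[X], Var[X]) using G'(1) and G''(1)."""
    g1 = sum(i * p for i, p in enumerate(coeffs))            # G'(1)  = sum of i * p_i
    g2 = sum(i * (i - 1) * p for i, p in enumerate(coeffs))  # G''(1) = sum of i(i-1) * p_i
    return g1, g2 + g1 - g1**2

# Fair die: G(t) = (t + t^2 + ... + t^6) / 6, so p_0 = 0 and p_1 = ... = p_6 = 1/6.
die = [Fraction(0)] + [Fraction(1, 6)] * 6
mean, variance = pgf_mean_variance(die)
print(mean, variance)
```

For the die this gives a mean of 7/2 and a variance of 35/12, agreeing with the values obtained directly from $\sum i p_i$ and $\sum i^2 p_i - (\sum i p_i)^2$.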

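Mean and variance of a Binomial distribution

If $X \sim B(n, p)$ then $P(X = i) = \binom{n}{i} p^i q^{n-i}$, where $p + q = 1$.

These probabilities are the coefficients of $t^i$ in the expansion of $(q + pt)^n$
$\Rightarrow$ the p.g.f. for the binomial distribution $B(n, p)$ is $G(t) = (q + pt)^n$.

$\Rightarrow G'(t) = np(q + pt)^{n-1}$, and $G''(t) = n(n - 1)p^2 (q + pt)^{n-2}$

$\Rightarrow \mu = E[X] = G'(1) = np$, since $p + q = 1$,

and $\sigma^2 = \text{Var}[X] = G''(1) + G'(1) - (G'(1))^2 = n(n - 1)p^2 + np - (np)^2 = n^2p^2 - np^2 + np - n^2p^2$

$\Rightarrow \sigma^2 = \text{Var}[X] = np(1 - p)$ or $npq$.

Mean and variance of a Poisson distribution

If $X \sim P_O(\lambda)$ then, in a given interval, $P(X = i) = \dfrac{\lambda^i e^{-\lambda}}{i!}$, where $\lambda$ is the mean number of occurrences in an interval of the same length, $i = 0, 1, 2, 3, \ldots$

$\Rightarrow G(t) = \sum_{i=0}^{\infty} \dfrac{\lambda^i e^{-\lambda}}{i!} t^i = e^{-\lambda} \sum_{i=0}^{\infty} \dfrac{\lambda^i t^i}{i!} = e^{-\lambda} e^{\lambda t}$

$\Rightarrow G'(t) = \lambda e^{-\lambda} e^{\lambda t}$ and $G''(t) = \lambda^2 e^{-\lambda} e^{\lambda t}$

$\Rightarrow \mu = E[X] = G'(1) = \lambda e^{-\lambda} e^{\lambda} = \lambda$
$\Rightarrow \mu = E[X] = \lambda$

and $\sigma^2 = \text{Var}[X] = G''(1) + G'(1) - (G'(1))^2 = \lambda^2 + \lambda - \lambda^2$, since $e^{-\lambda} e^{\lambda} = 1$

$\Rightarrow \sigma^2 = \text{Var}[X] = \lambda$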
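Both results can be checked numerically by building the p.g.f. coefficients and evaluating $G'(1)$ and $G''(1)$ as sums. The parameter values below are hypothetical, and the Poisson series is truncated at 60 terms, which is an approximation (the neglected tail is negligible for this $\lambda$).

```python
from math import comb, exp, factorial

def pgf_mean_variance(coeffs):
    """coeffs[i] = P(X = i); returns (E[X], Var[X]) using G'(1) and G''(1)."""
    g1 = sum(i * p for i, p in enumerate(coeffs))
    g2 = sum(i * (i - 1) * p for i, p in enumerate(coeffs))
    return g1, g2 + g1 - g1**2

# Binomial B(n, p) with hypothetical parameters.
n, p = 10, 0.3
binomial = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
b_mean, b_var = pgf_mean_variance(binomial)

# Poisson with hypothetical lambda, truncated after 60 terms.
lam = 2.5
poisson = [lam**i * exp(-lam) / factorial(i) for i in range(60)]
p_mean, p_var = pgf_mean_variance(poisson)

print(b_mean, b_var)  # compare with np and npq
print(p_mean, p_var)  # compare with lambda and lambda
```

The binomial values agree with $np = 3$ and $npq = 2.1$, and the Poisson mean and variance both agree with $\lambda = 2.5$ to within rounding error.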

