Statistics 3. Revision Notes


Statistics 3 Revision Notes June 2016

S3 JUNE 2016 SDB

1 Combinations of random variables
  Expected mean and variance for $X \pm Y$
  Reminder
  Combining independent normal random variables
2 Sampling
  Methods of collecting data
  Taking a census
  Sampling
  Simple random sampling
  Using random number tables
  Systematic sampling
  Stratified sampling
  Sampling with and without replacement
  Quota sampling
  Primary data
  Secondary data
3 Biased & unbiased estimators
  Unbiased estimators of $\mu$ and $\sigma^2$
  Estimating $\mu$ and $\sigma^2$ from a sample
4 Confidence intervals and significance tests
  Sampling distribution of the mean
  Central limit theorem and standard error
  Confidence intervals
  Central Limit Theorem Example
  Significance testing: variance of population known
  Mean of normal distribution
  Difference between means of normal distributions
  Significance testing: variance of population NOT known, large sample
  Mean of normal distribution
  Important assumption
  Difference between means

5 Goodness of fit, $\chi^2$ test
  General points
  Discrete uniform distribution
  Continuous uniform distribution
  Binomial distribution
  Poisson distribution
  The normal distribution
  Contingency tables
6 Regression and correlation
  Spearman's rank correlation coefficient
  Ranking and equal ranks
  Spearman's rank correlation coefficient
  Spearman or PMCC
  Testing for zero correlation
  Product moment correlation coefficient
  Spearman's rank correlation coefficient
  Comparison between PMCC and Spearman
Appendix
  Combining random variables
  $E[X + Y] = E[X] + E[Y]$
  $\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$
  Unbiased & biased estimators
  Unbiased estimators
  Biased estimators
  Unbiased estimates of population mean and variance
  Unbiased estimate of the mean
  Unbiased estimate of the variance of the population
  Bias
  Probability generating functions
  Expected mean and variance for a p.g.f.
  Mean and variance of a Binomial distribution
  Mean and variance of a Poisson distribution
Index

1 Combinations of random variables

Expected mean and variance for $X \pm Y$

Reminder
For any two random variables $X$ and $Y$:
$E[aX] = aE[X]$ and $\mathrm{Var}[aX] = a^2\,\mathrm{Var}[X]$
$E[X + Y] = E[X] + E[Y]$ and $E[X - Y] = E[X] - E[Y]$
and for two independent random variables:
$\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$ and $\mathrm{Var}[X - Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$.

Combining independent normal random variables
If $X_1$ and $X_2$ are independent normal random variables, $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$, then $X_1 + X_2$ and $X_1 - X_2$ are also normal random variables:
$X_1 + X_2 \sim N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$ and $X_1 - X_2 \sim N(\mu_1 - \mu_2,\ \sigma_1^2 + \sigma_2^2)$

Example: $X_1$ and $X_2$ are independent normal random variables, $X_1 \sim N(21, 12)$ and $X_2 \sim N(9, 6)$. Find the expected mean and standard deviation of $X_1 - 2X_2$.
Solution: $E[X_1] = 21$, $\mathrm{Var}[X_1] = 12$ and $E[X_2] = 9$, $\mathrm{Var}[X_2] = 6$
$E[2X_2] = 2E[X_2] = 2 \times 9 = 18$ and $\mathrm{Var}[2X_2] = 2^2\,\mathrm{Var}[X_2] = 4 \times 6 = 24$
$E[X_1 - 2X_2] = E[X_1] - E[2X_2] = 21 - 18 = 3$ and $\mathrm{Var}[X_1 - 2X_2] = \mathrm{Var}[X_1] + \mathrm{Var}[2X_2] = 12 + 24 = 36$
$\Rightarrow$ the expected mean and standard deviation of $X_1 - 2X_2$ are 3 and $\sqrt{36} = 6$. Answer
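The rules above can be checked numerically. Below is a minimal Python sketch. The helper name `linear_combination` is hypothetical. It assumes independent variables and computes the mean and variance of $a_1X_1 + a_2X_2 + \cdots$. It then uses `statistics.NormalDist` (standard library, Python 3.8+) to find the coffee-jar probability in the example that follows. Exact $\Phi$ values give about 0.0854 rather than the 0.0853 obtained from tables, because the tables round $z$ to 1.37.

```python
from math import sqrt
from statistics import NormalDist

def linear_combination(terms):
    """Mean and variance of a1*X1 + a2*X2 + ... for independent X_i.

    terms: list of (coefficient, mean, variance) tuples.
    """
    mean = sum(a * mu for a, mu, _ in terms)
    var = sum(a * a * v for a, _, v in terms)   # Var[aX] = a^2 Var[X]
    return mean, var

# X1 ~ N(21, 12), X2 ~ N(9, 6): distribution of X1 - 2*X2
m, v = linear_combination([(1, 21, 12), (-2, 9, 6)])
print(m, v, sqrt(v))  # 3 36 6.0

# 12 empty jars X ~ N(0.1, 0.02^2) plus 12 coffee fills Y ~ N(1, 0.06^2)
jars = [(1, 0.1, 0.02 ** 2)] * 12 + [(1, 1.0, 0.06 ** 2)] * 12
mw, vw = linear_combination(jars)
p = 1 - NormalDist(mw, sqrt(vw)).cdf(13.5)   # P(W > 13.5)
print(round(mw, 3), round(vw, 4), round(p, 4))
```

Each of the 12 jars is a separate variable, which is why the variance is $12\,\mathrm{Var}[X]$ rather than $12^2\,\mathrm{Var}[X]$.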

Example: The weights of empty coffee jars are normally distributed with mean 0.1 kg and standard deviation 0.02 kg. The weight of coffee in the jars is normally distributed with mean 1 kg and standard deviation 0.06 kg. Find the distribution of the total weight of 12 full jars of coffee. What is the probability that 12 full jars weigh more than 13.5 kg?
Solution: Let $X_1, X_2, \ldots, X_{12}$ be the weights of 12 empty jars and $Y_1, Y_2, \ldots, Y_{12}$ be the weights of coffee in the jars, where $X \sim N(0.1, 0.02^2)$ and $Y \sim N(1, 0.06^2)$.
Let $W$ be the total weight of 12 full jars; then $W = X_1 + X_2 + \cdots + X_{12} + Y_1 + Y_2 + \cdots + Y_{12}$.
Then $E[W] = 12E[X] + 12E[Y] = 12 \times 0.1 + 12 \times 1 = 13.2$ and, assuming independence, $\mathrm{Var}[W] = 12\,\mathrm{Var}[X] + 12\,\mathrm{Var}[Y] = 12 \times 0.02^2 + 12 \times 0.06^2 = 0.048$.
As we are combining normal distributions, the distribution of the weight of 12 full jars is $N(13.2, 0.048)$. Answer
The probability that 12 full jars weigh more than 13.5 kg is $1 - \Phi\left(\frac{13.5 - 13.2}{\sqrt{0.048}}\right) = 1 - \Phi(1.37) = 1 - 0.9147 = 0.0853$ to 3 S.F. Answer

2 Sampling

Methods of collecting data

Taking a census
A census involves observing every member of a population. It is used if the size of the population is small or if extreme accuracy is required.
Advantages
- it should give a completely accurate result, a full picture.
Disadvantages
- very time consuming and expensive
- it cannot be used when the testing process destroys the article being tested
- the information is difficult to process because there is so much of it.

Sampling
Sampling involves observing or testing a part of the population. It is cheaper but does not give such a full picture. The size of the sample depends on the accuracy desired (for a varied population a large sample will be required to give reasonable accuracy).

Simple random sampling
Every member of the population must have an equal chance of being selected.

Using random number tables
To take a simple random sample of size $n$ from a population of $N$ sampling units, first make a list and give each member of the population a number. Then use random number tables to select the sample. We ignore any numbers which do not refer to a member of the population. For example, using three-figure random numbers for a population numbered from 001 to 659, we would ignore numbers from 660 to 999. We also ignore the second occurrence of the same number.
Advantages
- the numbers are truly random and free from bias
- it is easy to use
- each member has a known, equal chance of selection.
Disadvantages
- it is not suitable when the sample size is large.

Lottery sampling
A sampling frame is needed, identifying each member of the population. The name or number of each member is written on a ticket (all the same size, colour and shape). The tickets are all put in a container, which is then shaken, and tickets are then drawn without replacement.
Advantages
- the tickets are drawn at random
- it is easy to use
- each ticket has a known chance of selection (considered as constant as long as the sample size is much smaller than the total number of tickets).
Disadvantages
- it is not suitable for a large sample
- a sampling frame is needed.

Systematic sampling
First make an ordered list, and divide it into equal groups, each of size 50 (or another suitable size). Second, select every 50th member (or every member at that same interval) from the list. To make sure that the first member on the list is not automatically selected, random number tables must be used to select the member in the first group; then select every 50th member after that.
Used when the population is too large for simple random number sampling.
Advantages
- simple to use
- suitable for large samples.
Disadvantages
- only random if the ordered list is truly random
- it can introduce bias.
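The two random methods above can be sketched in Python. This sketch assumes the population members are labelled 1 to $N$ as described. The function names are hypothetical. `random.Random.sample` plays the role of the random number tables: it never returns the same label twice, just as a repeated number is ignored. The systematic sample chooses a random start within the first group of $k$.

```python
import random

def simple_random_sample(N, n, rng):
    """Simple random sample (without replacement) of n labels from 1..N."""
    return sorted(rng.sample(range(1, N + 1), n))

def systematic_sample(N, k, rng):
    """Every k-th label from 1..N, starting at a random point in the first group of k."""
    start = rng.randint(1, k)  # random start, so label 1 is not automatically chosen
    return list(range(start, N + 1, k))

rng = random.Random(2016)  # fixed seed so the example is reproducible
print(simple_random_sample(659, 5, rng))
print(systematic_sample(659, 50, rng))
```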

Stratified sampling
First divide the population into exclusive (distinct) groups, or strata. Then select a sample so that the proportion of each stratum in the sample equals the proportion of that stratum in the population.
Example: How would you take a stratified sample of 50 children from a school of 500 pupils, divided into boys and girls in each of the upper sixth, lower sixth, fifth form, fourth form and third form?
Solution: As 50 is 1/10 of the total population, 1/10 of each stratum should be selected in the sample. Thus the sample would comprise

               Boys   Girls
Upper sixth      3      4
Lower sixth      3      3
Fifth form       7      6
Fourth form      6      7
Third form       5      6

and simple random number sampling would be used within each stratum.
Used when
- the sample is large
- the population divides naturally into mutually exclusive groups.
Advantages
- it can give more accurate estimates (or a more representative picture) than simple random number sampling when there are clear strata present
- it reflects the population structure.
Disadvantages
- within the strata, the problems are the same as for any simple random sample
- if the strata are not clearly defined, they may overlap.

Sampling with and without replacement
Simple random sampling is sampling without replacement, in which each member of the population can be selected at most once. In sampling with replacement, each member of the population can be selected more than once: this is called unrestricted random sampling.
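The allocation step in the stratified-sampling solution multiplies each stratum size by the sampling fraction. The Python sketch below does the same. The function name and the stratum sizes are hypothetical and are not the school data from the example. Python's `round()` rounds halves to even, so with awkward numbers the allocations may not add up exactly to the sample size; in that case an adjustment by hand is needed.

```python
def proportional_allocation(strata, sample_size):
    """Number to sample from each stratum, proportional to its size.

    strata: dict mapping stratum name -> population size.
    Simple rounding is used, so totals may occasionally differ by one.
    """
    total = sum(strata.values())
    return {name: round(size * sample_size / total) for name, size in strata.items()}

# Hypothetical population of 1200 pupils in three year groups, sample of 60
print(proportional_allocation({"Year 9": 400, "Year 10": 500, "Year 11": 300}, 60))
# {'Year 9': 20, 'Year 10': 25, 'Year 11': 15}
```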

Quota sampling
This is a non-random method. First decide on the groups into which the population is divided, and on the number from each group to be interviewed, to form quotas. Then go out and interview, entering each result into the relevant quota. If someone refuses to answer, or belongs to a quota which is already full, ignore that person's reply and continue interviewing until all quotas are full.
Used when it is not possible to use random methods, for example when the whole population is not known (the homeless in a big city).
Advantages
- it can be done quickly, as a representative sample can be obtained with a small sample size
- costs are kept to a minimum
- administration is fairly easy.
Disadvantages
- it is not possible to estimate the sampling errors (as it is not a random process)
- the interviewer may not put a response into the correct quota
- non-responses are not recorded
- it can introduce interviewer bias.

Primary data
Primary data is data collected by or on behalf of the person who is going to use the data.
Advantages
- the collection method is known
- the accuracy is known
- the exact data needed are collected.
Disadvantages
- costly in time and effort.

Secondary data
Secondary data is data not collected by or on behalf of the person who is going to use it. The data are second-hand, e.g. government census statistics.
Advantages
- cheap to obtain
- large quantity available (e.g. internet)
- much has been collected year on year and can be used to plot trends.
Disadvantages
- the collection method may not be known
- the accuracy may not be known
- it can be in a form which is difficult to handle
- bias is not always recognised.

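3 Biased & unbiased estimators
Example: A bag contains a large number of coins, of which $\frac{2}{5}$ are 2p coins and $\frac{3}{5}$ are 5p coins.
(a) $X$ is the value of a single coin drawn from the bag. Find the expected mean of all coins in the bag, $\mu = E[X]$.
Samples of size 3 are now drawn from the bag.
(b) Find the sampling distribution of, and the expected value of, (i) the median, and (ii) the mean.
(c) (i) The median, $Q$, is used as an estimator of the mean of all the coins, $\mu$. Show that $Q$ is a biased estimator of $\mu$, and find the bias.
(ii) The mean, $\bar{X}$, is used as an estimator of the mean of all the coins, $\mu$. Show that $\bar{X}$ is an unbiased estimator of $\mu$.
(d) $kQ$ is now used as an unbiased estimator of the mean of all the coins. Find the value of $k$.
Solution:
(a) $\mu = E[X] = \sum x_i p_i = 2 \times \frac{2}{5} + 5 \times \frac{3}{5} = \frac{19}{5} = 3.8$
(b)
Sample                             Probability                          Median   Mean
(2, 2, 2)                          (2/5)^3 = 8/125                        2       2
(2, 2, 5), (2, 5, 2), (5, 2, 2)    3 × (2/5)^2 × (3/5) = 36/125           2       3
(2, 5, 5), (5, 2, 5), (5, 5, 2)    3 × (2/5) × (3/5)^2 = 54/125           5       4
(5, 5, 5)                          (3/5)^3 = 27/125                       5       5

Sampling distribution
(i) Median, $Q$
q            2         5
P(Q = q)     44/125    81/125
$E[Q] = \sum q\,P(Q = q) = 2 \times \frac{44}{125} + 5 \times \frac{81}{125} = \frac{493}{125} = 3.944$
(ii) Mean, $\bar{X}$
x̄            2         3         4         5
P(X̄ = x̄)     8/125     36/125    54/125    27/125
$E[\bar{X}] = \frac{2 \times 8 + 3 \times 36 + 4 \times 54 + 5 \times 27}{125} = \frac{475}{125} = 3.8$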

Unbiased estimator: If $X$ (usually found from a sample) is used to estimate the value of a population parameter, $t$, then $X$ is an unbiased estimator of $t$ if $E[X] =$ the true value of the parameter $t$.
Bias: If an estimator, $X$, is biased, then the bias is the difference between $E[X]$ and the true value of the parameter, i.e. bias $= E[X] - t$.
(c) (i) The median $Q$ is used as the estimate of the mean. From part (a) we know that the true value of the mean $\mu$ is 3.8, and in part (b) we have shown that $E[Q] = 3.944$.
$\Rightarrow$ $Q$ is a biased estimator of $\mu$, and the bias is $E[Q] - \mu = 3.944 - 3.8 = 0.144$.
(ii) The mean $\bar{X}$ is used as the estimate of the mean. From part (a) we know that the true value of the mean $\mu$ is 3.8, and in part (b) we have shown that $E[\bar{X}] = 3.8$.
$E[\bar{X}] =$ the true value of the mean $\Rightarrow$ $\bar{X}$ is an unbiased estimator of $\mu$.
(d) We now use $kQ$ as an unbiased estimator of the mean value of all the coins.
$E[kQ] = kE[Q] = \frac{493}{125}k$, but the true mean is $\mu = \frac{19}{5}$.
If $kQ$ is an unbiased estimator of $\mu$, then $E[kQ] =$ the true value of $\mu$
$\Rightarrow \frac{493}{125}k = \frac{19}{5} \Rightarrow k = \frac{475}{493} = 0.963$ (3 S.F.)

Example: A sample of size 3 is drawn from a binomial distribution $B(10, 0.5)$ and the mean, $\bar{X}$, is calculated. The probability of success, $p$, is estimated by $\hat{p} = \frac{1}{10}\bar{X}$. Show that $\hat{p}$ is an unbiased estimator of $p$.
Solution: $E[X] = np = 10 \times 0.5 = 5$
For a sample $\{X_1, X_2, X_3\}$, $\bar{X} = \frac{1}{3}(X_1 + X_2 + X_3)$
$\Rightarrow E[\bar{X}] = E\left[\frac{1}{3}(X_1 + X_2 + X_3)\right] = \frac{1}{3}\left(E[X_1] + E[X_2] + E[X_3]\right)$
$\Rightarrow E[\bar{X}] = \frac{1}{3}(5 + 5 + 5) = 5$, since $E[X_i] = E[X] = 5$ for $i = 1, 2, 3$
$\Rightarrow E[\hat{p}] = E\left[\frac{1}{10}\bar{X}\right] = \frac{1}{10}E[\bar{X}] = \frac{1}{10} \times 5 = 0.5$, which is the true value of $p$
$\Rightarrow \hat{p} = \frac{1}{10}\bar{X}$ is an unbiased estimator of $p$.
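The coin example can be reproduced with exact arithmetic. The sketch below lists all 8 ordered samples of size 3. It assumes the draws are independent, which is what "a large number of coins" implies. It uses `fractions.Fraction` so that the results appear as the same fractions as in parts (a) to (d).

```python
from fractions import Fraction
from itertools import product
from statistics import median

values = {2: Fraction(2, 5), 5: Fraction(3, 5)}  # coin value -> probability

# Exact sampling distribution for samples of size 3 (ordered, independent draws)
E_Q = Fraction(0)
E_mean = Fraction(0)
for sample in product(values, repeat=3):
    prob = values[sample[0]] * values[sample[1]] * values[sample[2]]
    E_Q += prob * median(sample)                 # median of three values
    E_mean += prob * Fraction(sum(sample), 3)    # sample mean

mu = sum(x * p for x, p in values.items())
print(mu, E_Q, E_mean)     # 19/5 493/125 19/5
print(E_Q - mu, mu / E_Q)  # 18/125 475/493  (the bias, and k)
```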

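Example: A sample of size 4 is drawn from a continuous uniform distribution, $U[3, \beta]$. The mean of the sample, $\bar{X}$, is calculated. The upper limit, $\beta$, is estimated by $\hat{\beta} = 2\bar{X} - 3$. Show that $\hat{\beta}$ is an unbiased estimator of $\beta$.
Solution: $E[X] = \frac{1}{2}(3 + \beta)$
For a sample $\{X_1, X_2, X_3, X_4\}$, $\bar{X} = \frac{1}{4}(X_1 + X_2 + X_3 + X_4)$
$\Rightarrow E[\bar{X}] = E\left[\frac{1}{4}(X_1 + X_2 + X_3 + X_4)\right] = \frac{1}{4}\left(E[X_1] + E[X_2] + E[X_3] + E[X_4]\right)$
$\Rightarrow E[\bar{X}] = \frac{1}{4} \times 4E[X] = E[X]$, since $E[X_i] = E[X]$ for $i = 1, 2, 3, 4$
$\Rightarrow E[\bar{X}] = \frac{1}{2}(3 + \beta)$
$\Rightarrow E[\hat{\beta}] = E[2\bar{X} - 3] = 2E[\bar{X}] - 3 = (3 + \beta) - 3 = \beta$
$\Rightarrow E[\hat{\beta}] = \beta$, which is the true value of the (unknown) upper limit, $\beta$
$\Rightarrow \hat{\beta} = 2\bar{X} - 3$ is an unbiased estimator of $\beta$.
There is more on biased and unbiased estimators in the Appendix.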
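A simulation cannot prove that an estimator is unbiased, but it can show the result above in action. This sketch draws many samples of size 4 from $U[3, \beta]$ and averages $\hat{\beta} = 2\bar{X} - 3$. The value $\beta = 7$, the number of trials and the seed are arbitrary choices.

```python
import random

def mean_of_beta_hat(beta, trials=200_000, seed=1):
    """Average of beta_hat = 2*Xbar - 3 over many samples of size 4 from U[3, beta]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xbar = sum(rng.uniform(3, beta) for _ in range(4)) / 4
        total += 2 * xbar - 3
    return total / trials

print(mean_of_beta_hat(7))  # close to 7, the true upper limit
```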

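Unbiased estimators of $\mu$ and $\sigma^2$

Estimating $\mu$ and $\sigma^2$ from a sample
We usually do not know the mean, $\mu$, and the variance, $\sigma^2$, of a population. To estimate these values we take a sample $\{X_1, X_2, X_3, \ldots, X_n\}$ of size $n$ and calculate the sample mean and variance:
$\bar{X} = \frac{1}{n}\sum X_i$ and $(\mathrm{sd})_x^2 = \frac{1}{n}\sum X_i^2 - \bar{X}^2 = \frac{1}{n}\sum (X_i - \bar{X})^2$;
these can be compared with the formulae for population variance from the S1 module.
It can be shown that $E[\bar{X}] = \mu$, the true value of $\mu$
$\Rightarrow \bar{X} = \hat{\mu}$ is an unbiased estimator of the population mean $\mu$.
It can be shown that $E[(\mathrm{sd})_x^2] = \frac{n-1}{n}\sigma^2$
$\Rightarrow (\mathrm{sd})_x^2$ is a biased estimator of $\sigma^2$.
$E\left[\frac{n}{n-1}(\mathrm{sd})_x^2\right] = \frac{n}{n-1} \times \frac{n-1}{n}\sigma^2 = \sigma^2$, the true value of the variance
$\Rightarrow \frac{n}{n-1}(\mathrm{sd})_x^2 = \hat{\sigma}^2$ is an unbiased estimator of the population variance, $\sigma^2$.
(Proofs of these results are given in the Appendix.)
Note: the Edexcel course uses both $S^2$ and $s_x^2$ to mean the unbiased estimate of $\sigma^2$. Also, the term Sample Variance is used to denote the unbiased estimate of $\sigma^2$, the variance of the population.
In these notes I shall always think of the variance, $(\mathrm{sd})_x^2$, as $\frac{1}{n}\sum X_i^2 - \bar{X}^2 = \frac{1}{n}\sum (X_i - \bar{X})^2$.
To find $S^2$ or $s_x^2$, the unbiased estimator for $\sigma^2$: calculate $(\mathrm{sd})_x^2$, and then multiply by $\frac{n}{n-1}$.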

Example: The weights of a sample of five chocolate bars produced by a machine were 56, 53, 57, 51 and 54 grams. Find unbiased estimates for the mean and variance of the weights of all chocolate bars produced by that machine.
Solution:
X                  56     53     57     51     54       (ΣX = 271)
X − X̄             1.8   −1.2    2.8   −3.2   −0.2
(X − X̄)^2         3.24   1.44   7.84  10.24   0.04     (Σ(X − X̄)^2 = 22.8)
$\bar{X} = \frac{271}{5} = 54.2$
$(\mathrm{sd})_x^2 = \frac{1}{n}\sum (X - \bar{X})^2 = \frac{22.8}{5} = 4.56$
$\hat{\sigma}^2 = \frac{n}{n-1}(\mathrm{sd})_x^2 = \frac{5}{4} \times 4.56 = 5.7$
Answer: Unbiased estimates for the mean and variance of all chocolate bars are 54.2 grams and 5.7 grams².

Example: The volume of water in each of a sample of 14 litre bottles of water from a day's production is measured. The results are shown below, in ml.
103, 1019, 1004, 1011, 103, 1014, 1017, 100, 100, 1010, 105, 1007, 1016, 1019
Find unbiased estimates for the mean and variance of all bottles produced on that day.
Solution: First find the sample mean, $\bar{X} = \frac{\sum X}{n}$. Finding $X - \bar{X}$ each time would give unpleasant arithmetic, so use $(\mathrm{sd})_x^2 = \frac{1}{n}\sum X^2 - \bar{X}^2$, and then $S^2 = s_x^2 = \frac{n}{n-1}(\mathrm{sd})_x^2 = \frac{14}{13}(\mathrm{sd})_x^2$.
Answer: the unbiased estimates of the mean and variance of the whole day's production are $\bar{X}$ (in ml) and $S^2$ (in ml²), calculated as above.
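The recipe above (find $(\mathrm{sd})_x^2$, then multiply by $\frac{n}{n-1}$) is sketched below in Python for the chocolate-bar data. The function name is hypothetical. The last line is a cross-check with the standard library. `statistics.pvariance` divides by $n$, which gives $(\mathrm{sd})_x^2$. `statistics.variance` divides by $n - 1$, which gives $S^2$.

```python
import statistics

def unbiased_estimates(data):
    """Return (mean, (sd)_x^2, S^2) for a sample, using the notation of these notes."""
    n = len(data)
    mean = sum(data) / n
    sd2 = sum(x * x for x in data) / n - mean ** 2   # (sd)_x^2 = (1/n) sum X^2 - Xbar^2
    s2 = n / (n - 1) * sd2                           # unbiased estimate of sigma^2
    return mean, sd2, s2

bars = [56, 53, 57, 51, 54]
mean, sd2, s2 = unbiased_estimates(bars)
print(round(mean, 2), round(sd2, 2), round(s2, 2))  # 54.2 4.56 5.7

# Standard-library cross-check: pvariance -> (sd)_x^2, variance -> S^2
print(statistics.pvariance(bars), statistics.variance(bars))
```

With large values such as the bottle volumes, the $\frac{1}{n}\sum X^2 - \bar{X}^2$ form can lose precision in floating point. The standard-library functions avoid this.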

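Example: The weights of a sample of 15 packets of biscuits are recorded, giving the following results: $\sum X = 3797$ grams and $\sum X^2 = \ldots$. Find unbiased estimates for the mean and variance of the weights of all packets produced by this process.
Solution: $\hat{\mu} = \bar{X} = \frac{3797}{15} = 253.1$ grams.
$(\mathrm{sd})_x^2 = \frac{1}{15}\sum X^2 - \bar{X}^2$ and $\hat{\sigma}^2 = \frac{15}{14}(\mathrm{sd})_x^2$ grams².
Answer: the unbiased estimates are $\hat{\mu} = 253.1$ g and $\hat{\sigma}^2 = \frac{15}{14}\left(\frac{1}{15}\sum X^2 - \bar{X}^2\right)$ g².

Example: The lengths of 10 rods are measured, and the sample has mean $\bar{X} = 26.7$ cm and variance $s^2 = 76.9$ cm². An eleventh rod has length 30 cm. Find (a) the mean and (b) the variance of the sample of 11 rods.
Solution:
(a) With the sample mean there are no complications.
For $n = 10$, $\bar{X}_{10} = \frac{1}{10}\sum_{i=1}^{10} X_i = 26.7 \Rightarrow \sum_{i=1}^{10} X_i = 267$
For $n = 11$, $\sum_{i=1}^{11} X_i = 267 + 30 = 297 \Rightarrow \bar{X}_{11} = \frac{1}{11}\sum_{i=1}^{11} X_i = \frac{297}{11} = 27$ cm
(b) WARNING: The question refers to the variance of the sample, which means the unbiased estimate of the variance of the population.
$s_{10}^2 = 76.9 = \frac{10}{10-1}(\mathrm{sd})_{10}^2 \Rightarrow (\mathrm{sd})_{10}^2 = \frac{9}{10} \times 76.9 = 69.21$
$(\mathrm{sd})_{10}^2 = \frac{1}{10}\sum_{i=1}^{10} X_i^2 - \bar{X}_{10}^2 \Rightarrow \sum_{i=1}^{10} X_i^2 = 10(69.21 + 26.7^2) = 7821$
With the extra rod, $\sum_{i=1}^{11} X_i^2 = 7821 + 30^2 = 8721$
$(\mathrm{sd})_{11}^2 = \frac{1}{11}\sum_{i=1}^{11} X_i^2 - \bar{X}_{11}^2 = \frac{8721}{11} - 27^2 = \frac{702}{11} = 63.82$
$s_{11}^2 = \frac{11}{11-1}(\mathrm{sd})_{11}^2 = \frac{11}{10} \times \frac{702}{11} = 70.2$ cm²
For 11 rods, the sample mean is 27 cm and the sample variance is 70.2 cm².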
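The rod calculation recovers $\sum X$ and $\sum X^2$ from the summary statistics, adds the new value, and converts back. The Python sketch below automates this. The function name `add_observation` is hypothetical, and the function expects the variance as the unbiased estimate $S^2$, as in the question.

```python
def add_observation(n, mean, s2, x):
    """Update sample size, mean and unbiased variance S^2 when one value x is added."""
    sum_x = n * mean
    sd2 = (n - 1) / n * s2                 # (sd)^2 = (n-1)/n * S^2
    sum_x2 = n * (sd2 + mean ** 2)         # sum X^2 = n((sd)^2 + Xbar^2)
    n_new = n + 1
    mean_new = (sum_x + x) / n_new
    sd2_new = (sum_x2 + x * x) / n_new - mean_new ** 2
    return n_new, mean_new, n_new / (n_new - 1) * sd2_new

n11, mean11, s2_11 = add_observation(10, 26.7, 76.9, 30)
print(n11, round(mean11, 2), round(s2_11, 2))  # 11 27.0 70.2
```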

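4 Confidence intervals and significance tests

Sampling distribution of the mean
$X$ is a random variable drawn from a population with mean $\mu$ and standard deviation $\sigma$.
If $\{X_1, X_2, \ldots, X_n\}$ is a random sample of size $n$ with mean $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$, then $E[X_i] = \mu$ and $\mathrm{Var}[X_i] = \sigma^2$ for $i = 1, 2, 3, \ldots, n$. The expected mean of the population of sample means is
$E[\bar{X}] = E\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] = \frac{1}{n}\left(E[X_1] + E[X_2] + \cdots + E[X_n]\right) = \frac{1}{n}(\mu + \mu + \cdots + \mu) = \frac{n\mu}{n} = \mu$
Also, assuming that all the $X_i$ are independent, the expected variance of the population of sample means is
$\mathrm{Var}[\bar{X}] = \mathrm{Var}\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] = \frac{1}{n^2}\left(\mathrm{Var}[X_1] + \mathrm{Var}[X_2] + \cdots + \mathrm{Var}[X_n]\right) = \frac{1}{n^2}(\sigma^2 + \sigma^2 + \cdots + \sigma^2) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$
This means that if very many samples were taken and the mean of each sample calculated, then the mean of these means would be $\mu$ and the variance of these means would be $\frac{\sigma^2}{n}$.
It can also be shown that the sample means form an approximately Normal distribution (provided that $n$ is large enough). We can then say that, for samples drawn from a population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the mean is approximately $N\left(\mu, \frac{\sigma^2}{n}\right)$.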
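The two results $E[\bar{X}] = \mu$ and $\mathrm{Var}[\bar{X}] = \frac{\sigma^2}{n}$ can be seen in a simulation. This sketch takes many samples of size 25 from an exponential population with mean 1 and variance 1. The population, the sample size and the seed are arbitrary choices. The exponential distribution is used because it is clearly not normal.

```python
import random
import statistics

rng = random.Random(3)
n = 25            # sample size
samples = 20_000  # number of samples taken

# Population: exponential with mean 1 and variance 1 (deliberately non-normal)
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(samples)]

print(statistics.fmean(means))      # close to mu = 1
print(statistics.pvariance(means))  # close to sigma^2 / n = 1/25 = 0.04
```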

Central limit theorem and standard error
The central limit theorem states that if $\{X_1, X_2, \ldots, X_n\}$ is a random sample of size $n$ drawn from any population with mean $\mu$ and variance $\sigma^2$, then the population of sample means
(i) has expected mean $\mu$
(ii) has expected variance $\frac{\sigma^2}{n}$
(iii) forms a normal distribution if $n$ is large enough,
i.e. $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately.
The central limit theorem is used for sampling when the sample size is large (more than about 50), as the population of sample means is then approximately normal, whatever the distribution of the original population.
The standard error of the sample mean is $\frac{\sigma}{\sqrt{n}}$.

Example: A sample of size 50 is taken from a population of eggs with mean 23.4 grams and variance 36 grams².
(i) Find the probability that a single egg weighs more than 25 grams.
(ii) Find the probability that the sample mean is larger than 25 grams.
(iii) What assumptions did you make?
Solution:
(i) The weight of a single egg is $X \sim N(23.4, 6^2)$
$P(X > 25) = 1 - \Phi\left(\frac{25 - 23.4}{6}\right) = 1 - \Phi(0.27) = 1 - 0.6064 = 0.3936$
(ii) $\mu = 23.4$, $\sigma^2 = 36$
The sample mean $\bar{X} \sim N\left(23.4, \frac{6^2}{50}\right)$, i.e. $\bar{X} \sim N(23.4, 0.72)$, and the standard error is $\frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{50}} = 0.8485$
$P(\bar{X} > 25) = 1 - \Phi\left(\frac{25 - 23.4}{0.8485}\right) = 1 - \Phi(1.89) = 1 - 0.9706$ (from Normal tables) $= 0.0294$
(iii) In (ii) we have assumed the Central Limit Theorem, in particular that the sample means form a normal distribution. In (i) we have also assumed that the weight of a single egg is normally distributed.
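Both probabilities in the egg example can be found with `statistics.NormalDist`. The only change between (i) and (ii) is the standard deviation, which becomes the standard error $\sigma/\sqrt{n}$. The answers come out as about 0.3949 and 0.0297. They differ slightly from the table values because $z$ is not rounded to 2 decimal places here.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 23.4, 6, 50

p_single = 1 - NormalDist(mu, sigma).cdf(25)              # (i) one egg
p_mean = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(25)      # (ii) sample mean, sd = standard error
print(round(p_single, 4), round(p_mean, 4))
```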

Confidence intervals

Central Limit Theorem Example
Example: A biscuit manufacturer makes packets of biscuits with a nominal weight of 250 grams. It is known that over a long period the variance of the weights of the packets of biscuits produced is 25 grams². A sample of 10 packets is taken and found to have a mean weight of 253.4 grams. Find 95% confidence limits for the mean weight of all packets produced by the machine.
Solution: First assume that the machine is still producing packets with the same variance, 25. Suppose that the mean weight of all packets of biscuits is $\mu$ grams; then the population of all packets has mean $\mu$ and standard deviation 5.
From the central limit theorem, we can assume that the sample means form an approximately normal population with mean $\mu$ and standard error (standard deviation) $\frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{10}} = 1.5811$.
95% of the samples will have a mean in the region $-1.96 < Z < 1.96$.
(Figure: normal curve $f(x)$ with the central 95% of the area, between $z = -1.96$ and $z = 1.96$, shaded.)
We assume that the mean of this sample, 253.4, lies in this region:
$-1.96 < \frac{253.4 - \mu}{1.5811} < 1.96$
$\Rightarrow 253.4 - \mu < 1.96 \times 1.5811$ and $-1.96 \times 1.5811 < 253.4 - \mu$
$\Rightarrow \mu > 253.4 - 3.099$ and $\mu < 253.4 + 3.099$
$\Rightarrow 250.3 < \mu < 256.5$
This means that 95% of samples will give an interval which contains the mean. We say that [250.3 g, 256.5 g] is a 95% confidence interval for $\mu$.
This means that there is a 0.95 probability that an interval calculated in this way from a random sample will contain the true mean. It does not mean that there is a probability of 0.95 that the true mean lies in this particular interval. The true mean is a fixed number, and either does or does not lie in the interval, so the probability that the true mean lies in the interval is either 1 or 0.
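The interpretation in the last paragraph is about repeated sampling, and a simulation makes it concrete. This sketch builds many 95% intervals from samples of size 10. It assumes, hypothetically, that the true mean is 250 g with $\sigma = 5$ g, and counts how often the interval contains the true mean. Any single interval either contains 250 or does not; it is the proportion over many intervals that is close to 0.95.

```python
import random
from math import sqrt

rng = random.Random(4)
mu, sigma, n = 250, 5, 10        # assumed true mean and known sd
half_width = 1.96 * sigma / sqrt(n)

trials = 10_000
hits = 0
for _ in range(trials):
    xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
    if xbar - half_width < mu < xbar + half_width:   # does this interval contain mu?
        hits += 1
print(hits / trials)  # close to 0.95
```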

In practice we go straight to the last line of the example:
95% confidence limits are $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$, since $P(-1.96 < Z < 1.96) = 0.95$ (tables give $P(Z > 1.96) = 0.025$)
90% confidence limits are $\bar{x} \pm 1.6449\frac{\sigma}{\sqrt{n}}$, since $P(-1.6449 < Z < 1.6449) = 0.90$ (tables give $P(Z > 1.6449) = 0.05$)
Other confidence limits can be found using the Normal Distribution tables.
Example: A sample of 64 packets of cornflakes has a mean weight $\bar{X} = 510$ grams and a variance $S^2 = 36$ grams². Find 90% confidence limits for the mean weight of all packets. (Note that the sample variance is taken as the unbiased estimate of $\sigma^2$.)
Solution: We assume that the sample variance equals the variance of the population of all packets: $S^2 = 36 = \sigma^2$.
Now find the standard deviation (standard error) of the sampling distribution of the mean (the population of sample means): standard error $= \frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{64}} = 0.75$.
For 90% confidence limits, $z = \pm 1.6449$ (remember to use the 4 D.P. tables after the Normal Dist. tables). Using the sample mean $\bar{X} = 510$ grams, the 90% confidence limits are $510 \pm 1.6449 \times 0.75 = 510 \pm 1.234$
$\Rightarrow$ a 90% confidence interval is [508.8, 511.2] to 4 S.F.
Note that we have assumed that the unbiased estimate, $S^2$ (= 36), is the actual variance, $\sigma^2$, of the population. This is a reasonable assumption because the sample of 64 is large, so the error introduced is small.
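The shortcut $\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$ can be written as a small function. The function name `z_interval` is hypothetical. `NormalDist().inv_cdf` replaces the percentage-point tables: for 95% it gives 1.96, and for 90% it gives 1.6449 (to 4 D.P.). For the cornflakes, $S = 6$ is passed in place of $\sigma$, as in the solution.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval xbar +/- z * sigma/sqrt(n) for a normal (or large-sample) mean."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)   # e.g. 1.96 for 95%
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

lo, hi = z_interval(253.4, 5, 10, 0.95)   # biscuits, sigma known
print(round(lo, 1), round(hi, 1))          # 250.3 256.5
lo, hi = z_interval(510, 6, 64, 0.90)     # cornflakes, sigma estimated by S = 6
print(round(lo, 1), round(hi, 1))          # 508.8 511.2
```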

Significance testing: variance of population known

Mean of normal distribution
Example: A machine, when correctly set, is known to produce ball bearings with a mean weight of 84 grams and a standard deviation of 5 grams. The production manager decides to test whether the machine is working correctly and takes a sample of 120 ball bearings. The sample has mean weight 83.2 grams. Would you advise the production manager to alter the setting of his machine? Use a 5% significance level.
Solution:
1) $H_0: \mu = 84$ grams
2) $H_1: \mu \neq 84$ grams, 2-tail test (note that the machine is not working correctly if the test result is too high or too low)
3) 5% significance level
4) The test
We assume that the machine is still working with a standard deviation of $\sigma = 5$ g. From $H_0$, the mean weight of all ball bearings is assumed to be $\mu = 84$ g. These are the parameters for the population of all ball bearings.
We want to test a sample mean, and therefore need the mean and standard deviation of the population of sample means (the sampling distribution of the sample mean, $\bar{X}$).
Expected mean of the sample means $= \mu = 84$ g, and expected standard deviation of the sample means = standard error $= \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{120}} = 0.4564$.
We have an observed mean of 83.2. For a two-tailed test at 5%, we take 2.5% at each end.
$P(\bar{X} < 83.2) = \Phi\left(\frac{83.2 - 84}{0.4564}\right) = \Phi(-1.7527) = 1 - \Phi(1.75) = 1 - 0.9599 = 0.0401 = 4.01\% > 2.5\%$
and so the result is not significant at the 5% level.
5) Conclusion
Do not reject $H_0$ at the 5% level. Advise the production manager that there is no evidence that the machine is working incorrectly, so he should not change the setting.
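The test above can be sketched in Python. The function name is hypothetical. It returns the two-tailed p-value, which is twice the single tail found above. So "one tail > 2.5%" is the same decision as "p > 5%".

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(xbar, mu0, sigma, n):
    """Test statistic and two-tailed p-value for H0: mu = mu0 with sigma known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

z, p = z_test_mean(83.2, 84, 5, 120)
print(round(z, 4), round(p, 4))   # z is about -1.7527; p is about 0.08, so not significant at 5%
```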

Difference between means of normal distributions
Suppose that $X$ and $Y$ are two independent random variables from different normal distributions, $X \sim N(\mu_x, \sigma_x^2)$ and $Y \sim N(\mu_y, \sigma_y^2)$.
If independent samples of sizes $n_x$ and $n_y$ are drawn from these populations, then the distributions of the sample means, $\bar{X}$ and $\bar{Y}$, will be normal:
$\bar{X} \sim N\left(\mu_x, \frac{\sigma_x^2}{n_x}\right)$ and $\bar{Y} \sim N\left(\mu_y, \frac{\sigma_y^2}{n_y}\right)$
$E[\bar{X} - \bar{Y}] = E[\bar{X}] - E[\bar{Y}]$ and $\mathrm{Var}[\bar{X} - \bar{Y}] = \mathrm{Var}[\bar{X}] + \mathrm{Var}[\bar{Y}]$
$\Rightarrow$ the differences of the sample means, $\bar{X} - \bar{Y}$, will be normal:
$\bar{X} - \bar{Y} \sim N\left(\mu_x - \mu_y,\ \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}\right)$
Example: The weights of chocolate bars produced by two machines, A and B, are known to be normally distributed with variances $\sigma_A^2 = 4$ and $\sigma_B^2 = 3$ grams². Samples of sizes $n_A = 25$ and $n_B = 16$ are taken from the two machines, and have means $\bar{X}_A = 13.1$ and $\bar{X}_B = 14.4$ grams. Is there any evidence at the 5% significance level that the bars produced by machine B are heavier than the bars produced by machine A?
Solution: Suppose that the mean weights for all bars from the two machines are $\mu_A$ and $\mu_B$.
$H_0: \mu_A = \mu_B$
$H_1: \mu_B > \mu_A$, one-tail test at 5% level
The test statistic is the observed difference between sample means, $\bar{X}_B - \bar{X}_A = 14.4 - 13.1 = 1.3$. We must find the variance of this population of differences of sample means (the sampling distribution of differences of sample means).
Consider the population of differences of sample means, $\bar{X}_B - \bar{X}_A$.
Firstly, for the population of sample means for machine B, expected variance $\mathrm{Var}[\bar{X}_B] = \frac{\sigma_B^2}{n_B} = \frac{3}{16}$

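and secondly, for the population of sample means for machine A, expected variance $\mathrm{Var}[\bar{X}_A] = \frac{\sigma_A^2}{n_A} = \frac{4}{25}$
and so, for the population of differences of sample means:
expected mean $= E[\bar{X}_B - \bar{X}_A] = \mu_B - \mu_A = 0$ (from $H_0$)
and $\mathrm{Var}[\bar{X}_B - \bar{X}_A] = \mathrm{Var}[\bar{X}_B] + \mathrm{Var}[\bar{X}_A] = \frac{\sigma_B^2}{n_B} + \frac{\sigma_A^2}{n_A} = \frac{3}{16} + \frac{4}{25} = 0.3475$
The observed difference, the test statistic, is 1.3 and the standard error is $\sqrt{0.3475} = 0.5895$.
Since both populations are normal, the differences of sample means are normally distributed, so
$P(\text{difference} > 1.3) = 1 - \Phi\left(\frac{1.3 - 0}{0.5895}\right) = 1 - \Phi(2.2053) \approx 1 - 0.9863 = 0.0137 = 1.37\% < 5\%$
$\Rightarrow$ significant at the 5% level, so reject $H_0$ and conclude that there is evidence that machine B is producing bars of chocolate with a heavier mean weight than machine A.
Fortunately (!) the formula for testing the difference between sample means,
$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}}$,
is in your formula booklet.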
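The formula-booklet statistic can be sketched as a small Python function. The function name is hypothetical. It returns $z$ and the upper-tail probability for $H_1: \mu_1 > \mu_2$. Machine B's figures are passed first because the alternative hypothesis is that B is heavier.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, var1, n1, mean2, var2, n2):
    """Z for H0: mu1 = mu2 (known variances), and the upper-tail p-value for H1: mu1 > mu2."""
    se = sqrt(var1 / n1 + var2 / n2)
    z = (mean1 - mean2) / se
    return z, 1 - NormalDist().cdf(z)

z, p = two_sample_z(14.4, 3, 16, 13.1, 4, 25)   # machine B first, then machine A
print(round(z, 4), round(p, 4))                  # z is about 2.2053; p is about 0.0137
```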

Significance testing: variance of population NOT known, large sample
When the variance of the population, $\sigma^2$, is not known and the sample is large, we assume that the variance of the sample (meaning the unbiased estimate of $\sigma^2$), $S^2$, is the variance of the population, $\sigma^2$. As the sample is large, the error introduced is small.

Mean of normal distribution
Example: A machine usually produces steel rods with a mean length of 25.4 cm. The production manager wants to test 80 rods to see whether the machine is working correctly. The sample has mean 25.31 cm and variance $0.33^2$ cm². Advise the production manager, using a 5% level of significance.

Important assumption
The sample variance, $S^2$, is taken as $\hat{\sigma}^2$, the unbiased estimate of the variance of the population, $\sigma^2$. We then assume that the population variance equals the unbiased estimate.

Solution: $H_0: \mu = 25.4$, $H_1: \mu \neq 25.4$; two-tail test, 2.5% in each tail.
We assume that the population variance $\sigma^2$ equals the sample variance, $S^2 = 0.33^2 \Rightarrow \sigma = 0.33$.
For the population of sample means (the sampling distribution of the sample means), expected mean = 25.4 (from the hypothesis) and standard error $= \frac{\sigma}{\sqrt{n}} = \frac{0.33}{\sqrt{80}} = 0.03690$.
The observed sample mean is 25.31, and for a two-tail test at 5% we consider
$P(\bar{X} < 25.31) = \Phi\left(\frac{25.31 - 25.4}{0.03690}\right) = \Phi(-2.4393) = 1 - \Phi(2.44) = 1 - 0.9927 = 0.0073 < 2.5\%$
$\Rightarrow$ reject $H_0$ and conclude that there is evidence that the machine is not producing rods of mean length 25.4 cm.
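The only change from the known-variance test is that $S^2$ is plugged in for $\sigma^2$. The short Python sketch below makes that substitution explicit for the steel-rod example. It is valid only under the large-sample assumption stated above.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s2 = 80, 25.31, 0.33 ** 2   # sample size, mean, unbiased variance S^2
mu0 = 25.4                           # mean under H0

se = sqrt(s2 / n)                    # S^2 used in place of the unknown sigma^2
z = (xbar - mu0) / se
p_lower = NormalDist().cdf(z)        # lower-tail probability, compared with 2.5%
print(round(z, 4), p_lower < 0.025)  # z is about -2.44; the lower tail is below 2.5%
```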

Difference between means
Example: A firm has two machines, A and B, which make steel cable. 40 cables produced by machine A have a mean breaking strain of 1728 N and a variance of $75^2$ N², whereas 65 cables produced by machine B have a mean breaking strain of 1757 N and a variance of $63^2$ N². Is there any evidence, at the 10% level, to suggest that machine B is producing stronger cables than machine A?
Solution: Let $\mu_A$ and $\mu_B$ be the mean breaking strengths of all cables produced by machines A and B.
1) $H_0: \mu_A = \mu_B$
2) $H_1: \mu_B > \mu_A$, 1-tail test
3) Significance level 10%
4) The test
For machine A: we assume that the population variance, $\sigma_A^2$, equals the sample variance, $S^2 = 75^2$
$\Rightarrow$ variance of sample means $\mathrm{Var}[\bar{X}_A] = \frac{\sigma_A^2}{n_A} = \frac{75^2}{40} = 140.625$
For machine B: we assume that the population variance, $\sigma_B^2$, equals the sample variance, $S^2 = 63^2$
$\Rightarrow$ variance of sample means $\mathrm{Var}[\bar{X}_B] = \frac{\sigma_B^2}{n_B} = \frac{63^2}{65} = 61.06$
For differences in sample means, $\bar{X}_B - \bar{X}_A$:
expected mean = 0 (from the hypothesis)
expected variance $= \mathrm{Var}[\bar{X}_B - \bar{X}_A] = \mathrm{Var}[\bar{X}_B] + \mathrm{Var}[\bar{X}_A] = 61.06 + 140.625 = 201.69$
$\Rightarrow$ standard deviation, or standard error, $= \sqrt{201.69} = 14.20$
We have an observed difference in means (the test statistic) of $\bar{X}_B - \bar{X}_A = 1757 - 1728 = 29$. For a 1-tail test that B is stronger, we need the area to the right of 29. (The mean is treated as continuous, so do not use 28.5.)
$P(\bar{X}_B - \bar{X}_A > 29) = 1 - \Phi\left(\frac{29 - 0}{14.20}\right) = 1 - \Phi(2.04) = 1 - 0.9793 = 0.0207 < 10\%$, which is significant at 10%.
5) Conclusion
Reject $H_0$ at the 10% level and conclude that there is evidence that machine B produces cables with a greater mean strength than machine A.
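For two large samples, the same two-sample statistic is used with each $S^2$ standing in for its $\sigma^2$. A minimal Python sketch for the cable data:

```python
from math import sqrt
from statistics import NormalDist

# Large samples: the unbiased estimates S^2 are used in place of the unknown variances
n_a, mean_a, s2_a = 40, 1728, 75 ** 2
n_b, mean_b, s2_b = 65, 1757, 63 ** 2

se = sqrt(s2_a / n_a + s2_b / n_b)
z = (mean_b - mean_a) / se                # H1: mu_B > mu_A, so an upper-tail test
p = 1 - NormalDist().cdf(z)
print(round(se, 2), round(z, 3), round(p, 4))   # p is about 0.02, well below 10%
```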

5 Goodness of fit, $\chi^2$ test

General points
The $\chi^2$ test can only be used to test two lists of frequencies: the observed frequencies, and the expected frequencies calculated from the hypothesis.
The expected frequencies do not need to be integers (give 2 D.P.).
$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ and $E_i$ are the observed and expected frequencies.
If the expected frequency for a class is less than 5, then you must group this class with the next class (or two).
The number of degrees of freedom, $\nu$, is the number of cells (after grouping, if necessary) minus the number of linear equations connecting the frequencies.

Discrete uniform distribution
Example: A die is rolled 300 times and the frequency of each score (1 to 6) is recorded. Test whether the die is fair at the 5% level of significance.
Solution:
$H_0$: The die is fair; the probability of each score is $\frac{1}{6}$.
$H_1$: The die is not fair; the probability of each score is not $\frac{1}{6}$.
The expected frequencies are all $\frac{1}{6} \times 300 = 50$. We tabulate, for each score, the observed frequency, the expected frequency and $\frac{(O_i - E_i)^2}{E_i}$, and total the last column.
$\chi^2 = 3.04$ and $\nu$ = number of degrees of freedom $= n - 1 = 6 - 1 = 5$, since the total is a linear equation connecting the frequencies and is fixed.
From tables, $\chi^2_5(5\%) = 11.07 > 3.04$, so our observed result is not significant. We do not reject $H_0$, and conclude that there is no evidence that the die is not fair.
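The statistic and the degrees-of-freedom rule above can be written in a few lines of Python. The die frequencies below are hypothetical and are not the data from the example. The critical value 11.07 is $\chi^2_5(5\%)$, copied from tables.

```python
def chi_squared(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical die data (not the data from the example): 300 rolls
observed = [45, 55, 50, 48, 52, 50]
expected = [300 / 6] * 6                 # 50 for each score under H0
stat = chi_squared(observed, expected)
nu = len(observed) - 1                   # one linear restriction: the total
critical = 11.07                         # chi^2_5(5%) from tables
print(round(stat, 2), nu, stat > critical)  # 1.16 5 False
```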

Continuous uniform distribution
This is very similar to the discrete uniform distribution: pay attention to the class boundaries and find the expected frequencies.

In the following, $n$ is the number of cells after grouping.

Binomial distribution
For $H_0$: "The Binomial distribution is a good fit", we use the mean of the observed frequencies to calculate the expected frequencies. So both $O_i$ and $E_i$ give the same mean and total: thus there are 2 linear equations connecting the frequencies and $\nu = n - 2$,
but
for $H_0$: "The Binomial distribution, $B(30, 0.3)$, is a good fit", the means using $O_i$ and $E_i$ will be different: thus there is only 1 linear equation, the total, connecting the frequencies, and so $\nu = n - 1$.

Poisson distribution
For $H_0$: "The Poisson distribution is a good fit", we use the mean of the observed frequencies to calculate the expected frequencies. So both $O_i$ and $E_i$ give the same mean and total: thus there are 2 linear equations connecting the frequencies and $\nu = n - 2$,
but
for $H_0$: "The Poisson distribution, $Po(3)$, is a good fit", the means using $O_i$ and $E_i$ will be different: thus there is only 1 linear equation, the total, connecting the frequencies, and so $\nu = n - 1$.
Example: A switchboard operator records the number of new calls (0, 1, 2, 3, 4, 5, ≥ 6) in each of 69 consecutive one-minute periods.
a) Say why you think that a Poisson distribution might be suitable.
b) Find the mean and variance of this distribution. Do these figures support the view that they might form a Poisson distribution?
c) Test the goodness of fit of a Poisson distribution at the 5% level.
Solution:
a) Telephone calls are likely to occur singly, randomly, independently and uniformly, which are the conditions for a Poisson distribution.
b) Treating "≥ 6" as 7, we calculate the mean and variance from a table with columns $x$, $f$, $xf$ and $x^2 f$.

mean $= \frac{215}{69} = 3.12$ and variance $= \frac{915}{69} - \left(\frac{215}{69}\right)^2 = 3.55$
From these figures we can see that the mean and variance are approximately equal. Since the mean and variance of a Poisson distribution are equal, this supports the view that the distribution could be Poisson.
c) $H_0$: The Poisson distribution is a suitable model.
$H_1$: The Poisson distribution is not a suitable model.
The Poisson probabilities can be calculated from $P(r) = \frac{e^{-\lambda}\lambda^r}{r!}$, where $\lambda = \frac{215}{69} \approx 3.12$. The expected frequencies are then found by multiplying by $N = 69$.
Note that the probability for "≥ 6" is found by adding the other probabilities and subtracting from 1.
The table has columns $x$, $O$, $p$, $E$, $O$ (grouped), $E$ (grouped) and $\frac{(O - E)^2}{E}$.
The expected frequency for $x = 0$ is 3.06 < 5, so it has been grouped with $x = 1$.
Thus we have $7 - 1 = 6$ classes (after grouping) and $\nu = 6 - 2 = 4$ (the total and the mean are both fixed), and $\chi^2_4(5\%) = 9.488$.
We have calculated $\chi^2 = 1.9 < 9.488$, which is not significant. We do not reject $H_0$, and conclude that the Poisson distribution is a suitable model.
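The expected frequencies for part (c) can be sketched in Python using $N = 69$ and $\lambda = \frac{215}{69}$, the estimated mean. The grouping step is hard-coded for this data: only the $x = 0$ class falls below 5, so it is merged with $x = 1$. A general program would need to check every class.

```python
from math import exp, factorial

N, lam = 69, 215 / 69    # number of periods and the estimated mean

# Expected frequencies for x = 0..5; the last class ">= 6" takes the remaining probability
probs = [exp(-lam) * lam ** r / factorial(r) for r in range(6)]
probs.append(1 - sum(probs))
expected = [N * p for p in probs]
print([round(e, 2) for e in expected])

# Group the x = 0 class (expected frequency below 5) with x = 1
grouped = [expected[0] + expected[1]] + expected[2:]
nu = len(grouped) - 2    # restrictions: the total and the estimated mean
print(len(grouped), nu)  # 6 4
```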

The normal distribution
For $H_0$: "The Normal distribution is a good fit", we use the mean and variance of the observed frequencies to calculate the expected frequencies. So both $O_i$ and $E_i$ give the same mean, variance and total: thus there are 3 linear equations connecting the frequencies and $\nu = n - 3$,
but
for $H_0$: "The Normal distribution, $N(14, 3^2)$, is a good fit", the means and variances using $O_i$ and $E_i$ will be different: thus there is only 1 linear equation, the total, connecting the frequencies, and so $\nu = n - 1$.
Example: The sizes of men's shoes purchased from a shoe shop in one week are recorded, giving the number of pairs of each size. Is the manager's assumption that the normal distribution is a suitable model justified at the 5% level?
Solution:
$H_0$: The normal distribution is a suitable model.
$H_1$: The normal distribution is not a suitable model.
The total number of pairs is 175, and the mean, $m$, and standard deviation, $s$, are calculated from the data (taking the open-ended end classes at representative sizes).
Remembering that size 8 means from 7.5 to 8.5, we need to find the area between 7.5 and 8.5 and multiply by 175 to find the expected frequency for size 8, and similarly for the other sizes.
The table has columns $x$, $z = \frac{x - m}{s}$, $\Phi(z)$, class area $= p$, $E = 175p$, $O$ and $\frac{(O - E)^2}{E}$, with an open-ended class at each end.
There are 7 classes and 3 linear equations connecting the frequencies ($N$, $m$, $s$), so $\nu = 7 - 3 = 4$.
$\chi^2_4(5\%) = 9.488$, and we have calculated $\chi^2 = 5.34 < 9.488$. So we do not reject $H_0$, and therefore conclude that the normal distribution is a suitable model.
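The expected-frequency column for a normal model can be sketched with `statistics.NormalDist`. The function name is hypothetical. The class boundaries 7.5, 8.5, …, 12.5 (seven classes, open-ended at each end) and the mean 9.8 and standard deviation 1.5 are assumed values for illustration. The example's own figures are not given here.

```python
from statistics import NormalDist

def normal_expected(boundaries, total, mean, sd):
    """Expected frequencies for classes split at the given boundaries,
    with open-ended classes below the first and above the last boundary."""
    cdf = NormalDist(mean, sd).cdf
    cuts = [cdf(b) for b in boundaries]
    areas = [cuts[0]] + [b - a for a, b in zip(cuts, cuts[1:])] + [1 - cuts[-1]]
    return [total * p for p in areas]

# Hypothetical mean and sd; shoe size 8 covers 7.5 to 8.5, and so on
expected = normal_expected([7.5, 8.5, 9.5, 10.5, 11.5, 12.5], 175, 9.8, 1.5)
nu = len(expected) - 3      # restrictions: total, mean and sd all taken from the data
print([round(e, 2) for e in expected], nu)
```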

Contingency tables

For a 5 × 4 table in which the totals of each row and column are fixed, the cells marked ? represent the degrees of freedom, since once the values in the ? cells are known, the frequencies in all the other cells can be calculated.

            A    B    C    D    E    totals
W           ?    ?    ?    ?
X           ?    ?    ?    ?
Y           ?    ?    ?    ?
Z
totals

Thus there are (5 − 1) × (4 − 1) = 12 degrees of freedom. Generalising, we can see that for an m × n table the number of degrees of freedom is (m − 1)(n − 1).

Example: Natives of England, Africa and China were classified according to blood group, giving the following table.

Table columns: O, A, B, AB; rows: English, African, Chinese

Is there any evidence at the 5% level that there is a connection between blood group and nationality?

Solution: H₀: There is no connection between blood group and nationality.
H₁: There is a connection between blood group and nationality.

First redraw the table showing the totals of each row and column.

Table columns: O, A, B, AB, totals; rows: English, African, Chinese, totals

Now we need to calculate the expected frequency for English and group O. There are 609 English people and 1335 people altogether, so 609/1335 of the people are English. From H₀ we know that there is no connection between blood group and nationality, so 609/1335 of those with group O should also be English. Hence the expected frequency for English and group O is

$$\frac{609}{1335} \times 544 = \frac{609 \times 544}{1335} = 248.2 \text{ (1 d.p.)}$$

This can become automatic if you notice that you just multiply the totals for the row and column concerned and divide by the total number.

Table of expected frequencies: columns O, A, B, AB, totals; rows English, African, Chinese, totals

The value of χ² is calculated below.

Table columns: Observed frequency, Expected frequency, (O − E)²/E

We have ν = (4 − 1)(3 − 1) = 6 degrees of freedom and χ²₆(5%) = 12.592. We have calculated χ² = 8.41 < 12.592, so we do not reject H₀ and therefore conclude that there is no evidence of a connection between nationality and blood group.
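The "row total × column total ÷ grand total" rule and the (m − 1)(n − 1) degrees of freedom can be sketched as follows. The function names and the small tables used here are hypothetical; critical values such as χ²₆(5%) = 12.592 still come from tables.

```python
def expected_table(observed):
    """E_ij = (row i total) * (column j total) / (grand total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def contingency_test(observed):
    """Return the chi-squared statistic and the degrees of freedom (m - 1)(n - 1)."""
    expected = expected_table(observed)
    stat = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    nu = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, nu


# Hypothetical 2 x 2 table, used only to show the mechanics.
stat, nu = contingency_test([[30, 10], [10, 30]])
print(stat, nu)  # 20.0 1
```

Because the expected table is built from the observed margins, its row and column totals always match the observed ones, which is exactly why only (m − 1)(n − 1) cells are free.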

6 Regression and correlation

Spearman's rank correlation coefficient

Ranking and equal ranks

Ranking is putting a list of figures in order and giving each one its position or rank. Equal numbers are given the average of the ranks they would have had if all had been different.

Example: Rank the following numbers: 45, 65, 76, 56, 34, 45, 3, 67, 65, 45, 81, 3.

Solution: First put the numbers in order (largest first) and give ranks as if all were different, then give the average rank to those which are equal.

Numbers: 81, 76, 67, 65, 65, 56, 45, 45, 45, 34, …
Actual rank: 1, 2, 3, 4=, 4=, 6, 7=, 7=, 7=, 10, …
Rank (if all different): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, …
Average for equal ranks: (4 + 5)/2 = 4½ and (7 + 8 + 9)/3 = 24/3 = 8
Modified rank: 1, 2, 3, 4½, 4½, 6, 8, 8, 8, 10, …

When there are tied ranks you must now calculate the PMCC, not Spearman's coefficient, using the modified ranks.

Spearman's rank correlation coefficient

To compare two sets of rankings for the same n items, first find the difference, d, between each pair of ranks and then calculate Spearman's rank correlation coefficient

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

This is the same as the product moment correlation coefficient of the two sets of ranks (when there are no tied ranks), and so
r_s = +1 means the rankings are in perfect agreement,
r_s = −1 means the rankings are in exact reverse order,
r_s = 0 means that there is no correlation between the rankings.
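A minimal sketch of ranking with equal ranks, using the ten values from the example that survived transcription (the two smallest values are left out). Ranks run from the largest value (rank 1) downwards as in the example, and tied values receive the mean of the positions they occupy; the function name is hypothetical.

```python
def average_ranks(values, largest_first=True):
    """Rank the values (rank 1 = largest by default); equal values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=largest_first)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run of equal values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # positions start..end hold ranks start+1 .. end+1; give each their average
        shared = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


numbers = [45, 65, 76, 56, 34, 45, 67, 65, 45, 81]
print(average_ranks(numbers))
# [8.0, 4.5, 2.0, 6.0, 10.0, 8.0, 3.0, 4.5, 8.0, 1.0]
```

The shared ranks 4½ and 8 agree with the modified ranks in the example; these are the values on which the PMCC would then be calculated.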

Example: Ten varieties of coffee labelled A, B, C, ..., J were tasted by a man and a woman. Each ranked the coffees from best to worst as shown.

Man: G H C D A E B J I F
Woman: C B H G J D I E F A

Find Spearman's rank correlation coefficient.

Solution: Find the rank given by each person, then d, and then r_s.

Coffee: A, B, C, D, E, F, G, H, I, J
Man: 5, 7, 3, 4, 6, 10, 1, 2, 9, 8
Woman: 10, 2, 1, 6, 8, 9, 4, 3, 7, 5
d: −5, 5, 2, −2, −2, 1, −3, −1, 2, 3
d²: 25, 25, 4, 4, 4, 1, 9, 1, 4, 9

$$\sum d^2 = 86, \qquad r_s = 1 - \frac{6 \times 86}{10(10^2 - 1)} = 1 - \frac{516}{990} = 0.479 \text{ to 3 s.f.}$$

Spearman or PMCC

Use of Spearman's rank correlation coefficient
(i) Use when one, or both, sets of data are not from a normal population.
(ii) Use when the data are not measured on scales or in units (so probably not normal).
(iii) Use when the data are subjective, e.g. judgements in order of preference (not normal).
(iv) Can be used if the scatter graph indicates a non-linear (but monotonic) relationship between the variables, since the PMCC measures only linear correlation.
(v) Do not use for tied ranks (the Spearman formula depends on there being no tied ranks).

Use of the product moment correlation coefficient
(i) Use when ranks are tied (see above): modify the ranks and then use the PMCC on the modified ranks.
(ii) Use when both sets of figures are normally distributed (this will not be the case when using ranks).
(iii) Use when the scatter diagram indicates a linear relationship between the variables, i.e. when the points lie close to a straight line.
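The coffee example can be checked mechanically: the sketch below converts each best-to-worst ordering into ranks, forms Σd², and applies the Spearman formula. It assumes no tied ranks, as the formula requires; the function name is illustrative.

```python
def spearman_from_orders(order_a, order_b):
    """Spearman's r_s for two best-to-worst orderings of the same n items (no tied ranks)."""
    rank_a = {item: r for r, item in enumerate(order_a, start=1)}
    rank_b = {item: r for r, item in enumerate(order_b, start=1)}
    n = len(order_a)
    sum_d2 = sum((rank_a[item] - rank_b[item]) ** 2 for item in rank_a)
    r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return sum_d2, r_s


man = "G H C D A E B J I F".split()
woman = "C B H G J D I E F A".split()
sum_d2, r_s = spearman_from_orders(man, woman)
print(sum_d2, round(r_s, 3))  # 86 0.479
```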

Testing for zero correlation

N.B. the tables give figures for a ONE-TAIL test.

Product moment correlation coefficient

The PMCC tests to see if there is a linear connection between the variables. For strong correlation, the points on a scatter graph will lie close to a straight line.

Reminder:

$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}, \quad S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}, \quad S_{yy} = \sum y_i^2 - \frac{(\sum y_i)^2}{n}, \quad S_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$$

Example: The product moment correlation coefficient between 40 pairs of values is … . Is there any evidence of correlation between the pairs at the 5% level?

Solution: H₀: There is no correlation between the pairs, ρ = 0.
H₁: There is correlation, positive or negative, between the pairs, ρ ≠ 0 (two-tail test).

The tables for n = 40 give one-tail figures, so we must look at the 2.5% column, and the critical values are ±0.3120. The calculated figure is 0.5 > 0.3120 and so is significant: we reject H₀ and conclude that there is some correlation (positive or negative) between the pairs.

Spearman's rank correlation coefficient

Spearman's test looks for a connection (or correlation) between the ranks.

Example: It is believed that a person who absorbs a drug well on one occasion will also absorb a drug well on another occasion. Tests on ten patients to find the percentage of drug absorbed gave the following value for Spearman's rank correlation coefficient: r_s = … . Is there any evidence at the 5% level of a positive correlation between the two sets of results?

Solution: H₀: There is no correlation between the two sets of results, ρ_s = 0.
H₁: There is positive correlation between the two sets of results, ρ_s > 0 (one-tail test).

From the tables for n = 10 and a one-tail test, the critical value for 5% is 0.5636. The calculated value exceeds 0.5636, which is significant: reject H₀ and conclude that there is evidence of positive correlation between the two sets of results. Note that this shows correlation between the ranks of the two sets of results.
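The summation formulas for S_xx, S_yy and S_xy translate directly into code. This hypothetical helper only computes r; deciding significance still means comparing r with the tabulated critical value for the given n and number of tails.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: S_xx = 5, S_yy = 5, S_xy = 4, so r = 0.8.
print(pmcc([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```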

Comparison between PMCC and Spearman

Example: A random sample of 8 students sat examinations in Geography and Statistics. The product moment correlation coefficient between their results was 0.57 and the Spearman rank correlation coefficient was … .
(a) Test both of these values for positive correlation. Use a 5% level of significance.
(b) Comment on your results.

Solution:
(a) H₀: ρ = 0; H₁: ρ > 0.
For the PMCC the 5% critical value is 0.6215; 0.57 < 0.6215, so the result is not significant at 5%: there is insufficient evidence of positive correlation.
For Spearman's rank correlation coefficient the 5% critical value is 0.6429; r_s > 0.6429, so the result is significant at 5%: there is evidence of positive correlation.
(b) From the PMCC there is not enough evidence to conclude that as Statistics marks increased, Geography marks also increased, i.e. we cannot conclude that the points on a scatter diagram lie close to a straight line.
From Spearman's rank correlation coefficient there is evidence that students ranked highly in Statistics were also ranked highly in Geography, i.e. people with high scores in Statistics also tended to have high scores in Geography.

7 Appendix

Combining random variables

Let X and Y be random variables with probability distributions $X \in \{x_1, x_2, x_3, \dots, x_n\}$ with probabilities $(p_1, p_2, p_3, \dots, p_n)$, and $Y \in \{y_1, y_2, y_3, \dots, y_m\}$ with probabilities $(q_1, q_2, q_3, \dots, q_m)$.

Then the random variable X + Y takes all possible values $x_i + y_j$ as i varies from 1 to n and j varies from 1 to m. Let $r_{ij} = P(X = x_i \text{ and } Y = y_j)$. Notice that

$$\sum_{j=1}^{m} r_{ij} = r_{i1} + r_{i2} + r_{i3} + \dots + r_{im} = p_i \quad\text{and, similarly,}\quad \sum_{i=1}^{n} r_{ij} = q_j.$$

E[X + Y] = E[X] + E[Y]

$$\begin{aligned}
E[X+Y] &= \sum_{i=1}^{n}\sum_{j=1}^{m} (x_i + y_j)\, r_{ij} \\
&= \sum_{i} x_i \sum_{j} r_{ij} + \sum_{j} y_j \sum_{i} r_{ij} \\
&= \sum_{i=1}^{n} x_i p_i + \sum_{j=1}^{m} y_j q_j \\
&= E[X] + E[Y].
\end{aligned}$$

Note that this result does not require X and Y to be independent.
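The derivation of E[X + Y] = E[X] + E[Y] only uses the marginal sums Σⱼ r_ij = p_i and Σᵢ r_ij = q_j, so it holds even when X and Y are dependent. The sketch below checks this exactly with Python fractions on a small joint distribution invented for illustration, in which X and Y are deliberately not independent.

```python
from fractions import Fraction as F

# Invented joint distribution r[(x, y)] = P(X = x and Y = y).
# X and Y are dependent: P(X = 0 and Y = 1) = 0, but P(X = 0) * P(Y = 1) = 1/8.
joint = {(0, 0): F(1, 2), (0, 1): F(0), (1, 0): F(1, 4), (1, 1): F(1, 4)}

p, q = {}, {}
for (x, y), r in joint.items():
    p[x] = p.get(x, F(0)) + r  # p_i = sum over j of r_ij
    q[y] = q.get(y, F(0)) + r  # q_j = sum over i of r_ij

e_sum = sum((x + y) * r for (x, y), r in joint.items())
e_x = sum(x * px for x, px in p.items())
e_y = sum(y * qy for y, qy in q.items())
print(e_sum, e_x + e_y)  # 3/4 3/4
```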

Var[X + Y] = Var[X] + Var[Y]

In this case we take X and Y to be independent, so $r_{ij} = P(x_i \text{ and } y_j) = P(x_i)P(y_j) = p_i q_j$. Also notice that $\sum_i p_i = \sum_j q_j = 1$.

$$\begin{aligned}
\operatorname{Var}[X+Y] &= E[(X+Y)^2] - (E[X+Y])^2 \\
&= \sum_i\sum_j (x_i + y_j)^2 p_i q_j - (E[X] + E[Y])^2 \\
&= \sum_i\sum_j x_i^2 p_i q_j + 2\sum_i\sum_j x_i y_j p_i q_j + \sum_i\sum_j y_j^2 p_i q_j - \left((E[X])^2 + 2E[X]E[Y] + (E[Y])^2\right) \\
&= \sum_i x_i^2 p_i \sum_j q_j + 2\sum_i x_i p_i \sum_j y_j q_j + \sum_j y_j^2 q_j \sum_i p_i - (E[X])^2 - 2E[X]E[Y] - (E[Y])^2 \\
&= \sum_i x_i^2 p_i + 2E[X]E[Y] + \sum_j y_j^2 q_j - (E[X])^2 - 2E[X]E[Y] - (E[Y])^2 \\
&= E[X^2] - (E[X])^2 + E[Y^2] - (E[Y])^2
\end{aligned}$$

⇒ Var[X + Y] = Var[X] + Var[Y], if X and Y are independent.

Unbiased & biased estimators

Unbiased estimators

An estimator λ̂ for a parameter λ is said to be unbiased if E[λ̂] = λ.

Example: A bag has 468 beads of two colours, white and green. 20 beads are taken at random, with replacement, and the number, i, of green beads in the sample is counted. To estimate the true number of green beads, g, in the bag, we calculate

$$\hat{g} = \frac{i}{20} \times 468.$$

If g is the true number of green beads in the bag, then the probability of drawing a green bead in a single trial is p = g/468, and drawing n = 20 beads with replacement gives a binomial distribution B(n, p). Thus

$$\mu = E[i] = np = 20 \times \frac{g}{468}.$$

We do not actually know the number of green beads, and want to estimate this number after taking one sample: the estimate is ĝ = (i/20) × 468.

We now find the expected value of this estimate:

$$E[\hat{g}] = E\left[\frac{468\, i}{20}\right] = \frac{468}{20}E[i] = \frac{468}{20} \times \frac{20g}{468} = g, \text{ the true number}$$

⇒ the expected value of the estimator ĝ is equal to the true value, g ⇒ the estimator ĝ is unbiased.

Biased estimators

An estimator λ̂ for a parameter λ is said to be biased if E[λ̂] ≠ λ.

Example: A naturalist wishes to estimate the number of squirrels in a wood. He first catches 50 squirrels, marks them and then releases them. Later he catches 30 squirrels and counts the number, i, which have been marked. The true number in the population, n, is then estimated as n̂ from the equation

$$\frac{50}{\hat n} = \frac{i}{30} \quad\Rightarrow\quad \hat n = \frac{1500}{i}.$$

Now

$$E[\hat n] = \sum_{i=0}^{30} \frac{1500}{i}\, p_i.$$

But it is possible that i = 0, in which case the estimate is infinite. Since P(i = 0) > 0, E[n̂] is also infinite and so cannot be equal to the true value n ⇒ in this case the estimator n̂ = 1500/i is biased.

Unbiased estimates of population mean and variance

Let X be a random variable drawn from a population with mean µ and variance σ²; then E[X] = µ and Var[X] = σ². A random sample, X₁, X₂, X₃, …, X_n, of size n is taken from the population. The sample mean is

$$\bar X = \frac{1}{n}(X_1 + X_2 + X_3 + \dots + X_n), \qquad E[X_i] = \mu \text{ and } \operatorname{Var}[X_i] = \sigma^2 \text{ for } i = 1, 2, 3, \dots, n.$$

Unbiased estimate of the mean

$$E[\bar X] = E\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right] = \frac{1}{n}(E[X_1] + E[X_2] + \dots + E[X_n]) = \frac{1}{n}(\mu + \mu + \dots + \mu) = \mu$$

⇒ E[X̄] = µ, the true value of the mean ⇒ X̄ is an unbiased estimator of the mean of the population.
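The unbiasedness of ĝ in the green-beads example can be confirmed by summing over the whole binomial distribution of i with exact fractions, assuming (as in the example) that the 20 beads are drawn with replacement. The function name and the particular values of g tried are illustrative.

```python
from fractions import Fraction
from math import comb

BEADS, DRAWN = 468, 20


def expected_g_hat(g):
    """Exact E[g_hat], where g_hat = 468 * i / 20 and i ~ B(20, g / 468)."""
    p = Fraction(g, BEADS)
    return sum(Fraction(BEADS * i, DRAWN) * comb(DRAWN, i) * p ** i * (1 - p) ** (DRAWN - i)
               for i in range(DRAWN + 1))


for g in (0, 50, 117, 468):
    print(g, expected_g_hat(g))  # each line shows E[g_hat] equal to g
```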

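Unbiased estimate of the variance of the population

Preliminary results

(i) $\operatorname{Var}[X] = E[X^2] - (E[X])^2 = E[X^2] - \mu^2 \;\Rightarrow\; E[X^2] = \operatorname{Var}[X] + \mu^2 = \sigma^2 + \mu^2$   (I)

(ii) $\operatorname{Var}[\bar X] = E[\bar X^2] - (E[\bar X])^2 = E[\bar X^2] - \mu^2$, so, since the $X_i$ are independent,

$$\begin{aligned}
E[\bar X^2] &= \operatorname{Var}[\bar X] + \mu^2 = \operatorname{Var}\left[\tfrac{1}{n}(X_1 + X_2 + \dots + X_n)\right] + \mu^2 \\
&= \frac{1}{n^2}\left(\operatorname{Var}[X_1] + \operatorname{Var}[X_2] + \dots + \operatorname{Var}[X_n]\right) + \mu^2 \\
&= \frac{1}{n^2}(\sigma^2 + \sigma^2 + \dots + \sigma^2) + \mu^2 = \frac{\sigma^2}{n} + \mu^2 \qquad \text{(II)}
\end{aligned}$$

Proof

The variance of X₁, X₂, X₃, …, X_n is defined to be

$$(\text{s.d.})^2 = \frac{1}{n}\sum (X_i - \bar X)^2 = \frac{1}{n}\sum X_i^2 - \bar X^2.$$

$$\begin{aligned}
E[(\text{s.d.})^2] &= \frac{1}{n}E\left[\sum X_i^2\right] - E[\bar X^2] \\
&= \frac{1}{n}\, n(\sigma^2 + \mu^2) - \left(\frac{\sigma^2}{n} + \mu^2\right) \qquad \text{from (I) and (II)} \\
&= \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\sigma^2.
\end{aligned}$$

Thus E[(s.d.)²] is not equal to the true value, and so (s.d.)² is a biased estimator of σ². But multiplying both sides by n/(n − 1), we can see that

$$\frac{n}{n-1}(\text{s.d.})^2 = \frac{1}{n-1}\sum (X_i - \bar X)^2$$

is an unbiased estimator of σ².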
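The result E[(s.d.)²] = ((n − 1)/n)σ² can be checked by brute force on a small population. This sketch assumes a population in which each value is equally likely (a fair die is used purely as an example) and enumerates every ordered sample of size n drawn with replacement, so the Xᵢ are independent as the proof requires.

```python
from fractions import Fraction
from itertools import product


def expected_sample_variances(values, n):
    """Exact E[(s.d.)^2] (divisor n) and E[s^2] (divisor n - 1) over every ordered sample of size n
    drawn with replacement from a population in which each listed value is equally likely."""
    samples = list(product(values, repeat=n))
    total_sd2 = total_s2 = Fraction(0)
    for sample in samples:
        mean = Fraction(sum(sample), n)
        ss = sum((x - mean) ** 2 for x in sample)  # sum of (X_i - X_bar)^2
        total_sd2 += ss / n
        total_s2 += ss / (n - 1)
    return total_sd2 / len(samples), total_s2 / len(samples)


die = range(1, 7)
mu = Fraction(sum(die), 6)
sigma2 = sum((x - mu) ** 2 for x in die) / 6  # population variance, 35/12
e_sd2, e_s2 = expected_sample_variances(die, 2)
print(sigma2, e_sd2, e_s2)  # 35/12 35/24 35/12
```

With n = 2 the divisor-n estimator averages to exactly half of σ², while the divisor-(n − 1) version recovers σ².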

Bias

Example: A large bag contains counters: 60% have the number 0, and 40% have the number 1.
(a) Find the mean, µ, and variance, σ².
A simple random sample of size 3 is drawn.
(b) List all possible samples.
(c) Find the sampling distribution for the mean X̄ = (X₁ + X₂ + X₃)/3.
(d) Use your answers to part (c) to find E[X̄] and Var[X̄].
(e) Find the sampling distribution for the mode M.
(f) Use your answers to part (e) to find E[M] and Var[M].

Solution:
(a) µ = Σ xᵢpᵢ = 0 × 0.6 + 1 × 0.4 = 0.4
σ² = Σ xᵢ²pᵢ − µ² = (0² × 0.6 + 1² × 0.4) − 0.4² = 0.24
(b) The possible samples are
(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)
(c) From (b) we can find the sampling distribution of the mean:
x̄: 0, 1/3, 2/3, 1
p: 0.216, 0.432, 0.288, 0.064
(d) E[X̄] = 0 × 0.216 + (1/3) × 0.432 + (2/3) × 0.288 + 1 × 0.064 = 0.4
Var[X̄] = (0² × 0.216 + (1/3)² × 0.432 + (2/3)² × 0.288 + 1² × 0.064) − 0.4² = 0.24 − 0.16 = 0.08
(e) From (b) we can find the sampling distribution of the mode:
m: 0, 1
p: 0.648, 0.352
(f) E[M] = 0 × 0.648 + 1 × 0.352 = 0.352
Var[M] = (0² × 0.648 + 1² × 0.352) − 0.352² = 0.228 (3 s.f.)

Thus the sample mean is an unbiased estimator of the mean of the population, since E[X̄] = 0.4 = µ, the true value, but the sample mode is a biased estimator of the mode of the population: E[M] = 0.352, but the true value of the mode of the population is 0.

In general, if λ̂ is a biased estimator of the parameter λ, then the bias is defined to be

$$\text{bias} = E[\hat\lambda] - \lambda.$$

In the above example, the bias in estimating the mode from the sample is bias = E[M] − true value = 0.352 − 0 = 0.352.
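The counters example can be reproduced by enumerating all eight samples with exact probabilities. This is a sketch using Python's `fractions` and `itertools`; the mode is taken as the more frequent value, which is always well defined here because a sample of three 0s and 1s cannot be tied.

```python
from fractions import Fraction
from itertools import product

# Population from the example: P(0) = 0.6 and P(1) = 0.4; samples of size 3.
population = {0: Fraction(3, 5), 1: Fraction(2, 5)}

mean_dist, mode_dist = {}, {}
for sample in product(population, repeat=3):
    prob = Fraction(1)
    for x in sample:
        prob *= population[x]  # independent draws from a large bag
    x_bar = Fraction(sum(sample), 3)
    mode = max(set(sample), key=sample.count)  # never tied for three 0s/1s
    mean_dist[x_bar] = mean_dist.get(x_bar, 0) + prob
    mode_dist[mode] = mode_dist.get(mode, 0) + prob


def mean_and_variance(dist):
    """E and Var of a discrete distribution given as {value: probability}."""
    m = sum(v * p for v, p in dist.items())
    return m, sum(v * v * p for v, p in dist.items()) - m * m


e_xbar, var_xbar = mean_and_variance(mean_dist)
e_mode, var_mode = mean_and_variance(mode_dist)
print(float(e_xbar), float(var_xbar))  # 0.4 0.08
print(float(e_mode), float(var_mode))  # 0.352 0.228096
print("bias of the sample mode:", float(e_mode - 0))  # true population mode is 0
```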

Probability generating functions

Probability generating functions are a neat idea, and are useful for finding the expected mean and variance of distributions which have a probability generating function that is easy to differentiate.

If X is a random variable on the set {0, 1, 2, …, n}, then

$$G(t) = p_0 + p_1 t + p_2 t^2 + \dots + p_n t^n$$

is a probability generating function, p.g.f., if (i) Σ pᵢ = 1 and (ii) pᵢ ≥ 0 for all i, where P(X = i) = pᵢ, the coefficient of tⁱ.

The probability generating function can be thought of as a probability labelling function, where tⁱ acts as a label for the probability that X = i.

Expected mean and variance for a p.g.f.

We know that

$$E[X] = \sum i p_i = 0p_0 + 1p_1 + 2p_2 + \dots + np_n \quad\text{and}\quad \operatorname{Var}[X] = E[X^2] - (E[X])^2 = \sum i^2 p_i - \left(\sum i p_i\right)^2.$$

Notice that

$$G'(t) = p_1 + 2p_2 t + 3p_3 t^2 + \dots + np_n t^{n-1} \;\Rightarrow\; G'(1) = p_1 + 2p_2 + 3p_3 + \dots + np_n = \sum i p_i$$

⇒ Expected mean = E[X] = G'(1), and

$$G''(t) = 2\cdot 1\, p_2 + 3\cdot 2\, p_3 t + \dots + n(n-1) p_n t^{n-2} \;\Rightarrow\; G''(1) = \sum i(i-1)p_i = \sum i^2 p_i - \sum i p_i$$

$$\Rightarrow \sum i^2 p_i = G''(1) + G'(1) \;\Rightarrow\; \operatorname{Var}[X] = \sum i^2 p_i - \left(\sum i p_i\right)^2 = G''(1) + G'(1) - (G'(1))^2.$$

Thus for a probability generating function G(t) = p₀ + p₁t + p₂t² + … + pₙtⁿ,
E[X] = G'(1) and Var[X] = G''(1) + G'(1) − (G'(1))².
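Since G'(1) = Σ i pᵢ and G''(1) = Σ i(i − 1) pᵢ, the p.g.f. formulas can be evaluated directly from the list of coefficients. The sketch below is illustrative (the function name is hypothetical) and uses a fair die, G(t) = (t + t² + … + t⁶)/6, as a test case.

```python
from fractions import Fraction


def pgf_mean_var(coeffs):
    """coeffs[i] = P(X = i), the coefficient of t^i in G(t).

    Uses E[X] = G'(1) and Var[X] = G''(1) + G'(1) - (G'(1))^2, where
    G'(1) = sum of i * p_i and G''(1) = sum of i(i - 1) * p_i.
    """
    g1 = sum(i * p for i, p in enumerate(coeffs))
    g2 = sum(i * (i - 1) * p for i, p in enumerate(coeffs))
    return g1, g2 + g1 - g1 ** 2


# Fair die: G(t) = (t + t^2 + ... + t^6) / 6, so p_0 = 0 and p_1 = ... = p_6 = 1/6.
die = [Fraction(0)] + [Fraction(1, 6)] * 6
mean, var = pgf_mean_var(die)
print(mean, var)  # 7/2 35/12
```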

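Mean and variance of a binomial distribution

If X ~ B(n, p) then $P(X = i) = \binom{n}{i} p^i q^{n-i}$, where p + q = 1. These probabilities are the coefficients of tⁱ in the expansion of (q + pt)ⁿ ⇒ the p.g.f. for the binomial distribution B(n, p) is G(t) = (q + pt)ⁿ.

$$G'(t) = np(q + pt)^{n-1} \quad\text{and}\quad G''(t) = n(n-1)p^2(q + pt)^{n-2}$$

$$\Rightarrow \mu = E[X] = G'(1) = np, \text{ since } p + q = 1,$$

and

$$\begin{aligned}
\sigma^2 = \operatorname{Var}[X] &= G''(1) + G'(1) - (G'(1))^2 = n(n-1)p^2 + np - (np)^2 \\
&= n^2p^2 - np^2 + np - n^2p^2 = np - np^2
\end{aligned}$$

⇒ σ² = Var[X] = np(1 − p) or npq.

Mean and variance of a Poisson distribution

If X ~ Po(λ) then, in a given interval, $P(X = i) = \dfrac{\lambda^i e^{-\lambda}}{i!}$, where λ is the mean number of occurrences in an interval of the same length, i = 0, 1, 2, 3, …

$$G(t) = \sum_{i=0}^{\infty} \frac{\lambda^i e^{-\lambda}}{i!}\, t^i = e^{-\lambda}\sum_{i=0}^{\infty}\frac{(\lambda t)^i}{i!} = e^{-\lambda}e^{\lambda t}$$

$$G'(t) = \lambda e^{-\lambda}e^{\lambda t} \quad\text{and}\quad G''(t) = \lambda^2 e^{-\lambda}e^{\lambda t}$$

$$\Rightarrow \mu = E[X] = G'(1) = \lambda e^{-\lambda}e^{\lambda} = \lambda$$

and

$$\sigma^2 = \operatorname{Var}[X] = G''(1) + G'(1) - (G'(1))^2 = \lambda^2 + \lambda - \lambda^2 = \lambda, \text{ since } e^{-\lambda}e^{\lambda} = 1$$

⇒ σ² = Var[X] = λ.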
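As a numerical check of the two results above, the sketch builds the coefficients of (q + pt)ⁿ and a truncated Poisson series, then applies E[X] = G'(1) and Var[X] = G''(1) + G'(1) − (G'(1))². The parameters n = 10, p = 0.3 and λ = 3.12 are arbitrary, and truncating the Poisson series at 100 terms is an assumption that is harmless for moderate λ.

```python
import math


def mean_var_from_coefficients(probs):
    """probs[i] = coefficient of t^i in G(t); returns G'(1) and G''(1) + G'(1) - G'(1)^2."""
    g1 = sum(i * p for i, p in enumerate(probs))            # G'(1)
    g2 = sum(i * (i - 1) * p for i, p in enumerate(probs))  # G''(1)
    return g1, g2 + g1 - g1 ** 2


def binomial_coefficients(n, p):
    """Coefficients of t^i in (q + p t)^n, with q = 1 - p."""
    q = 1 - p
    return [math.comb(n, i) * p ** i * q ** (n - i) for i in range(n + 1)]


def poisson_coefficients(lam, terms=100):
    """First `terms` coefficients of e^(-lam) e^(lam t), i.e. lam^i e^(-lam) / i!."""
    coeffs, term = [], math.exp(-lam)
    for i in range(terms):
        coeffs.append(term)
        term *= lam / (i + 1)  # next coefficient from the previous one
    return coeffs


print(mean_var_from_coefficients(binomial_coefficients(10, 0.3)))  # approximately (3.0, 2.1)
print(mean_var_from_coefficients(poisson_coefficients(3.12)))      # approximately (3.12, 3.12)
```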

Index
χ² test: binomial distribution; continuous uniform distribution; degrees of freedom; discrete uniform distribution; general points; normal distribution; Poisson distribution
Bias
Biased estimators: bias; examples
Binomial distribution: p.g.f., expected mean and variance
Census
Central limit theorem
Combinations of random variables: expected mean of X ± Y; expected variance of X ± Y; independent normal variables; Var[X + Y]; E[X + Y]
Confidence intervals
Contingency tables: χ² test; degrees of freedom
Data: primary data; secondary data
Estimators: population mean; population variance
Lottery sampling
PMCC: comparison with Spearman
Poisson distribution: p.g.f., expected mean and variance
Probability generating functions: expected mean and variance
Random number tables
Ranks: equal ranks
Sample variance: estimator of population variance
Sampling: quota sampling; sample means; simple random sampling; stratified sampling; systematic sampling; with and without replacement
Significance test: zero correlation
Significance test, variance of population known: difference between means; mean of normal distribution
Significance test, variance of population NOT known: difference between means; mean of normal distribution
Spearman: comparison with PMCC
Spearman's rank correlation coefficient: when to use
Standard error
Unbiased estimators: examples; of the population mean; of the population variance
Variance of (X + Y)
When to use (Spearman or PMCC)


More information

Problems from 9th edition of Probability and Statistical Inference by Hogg, Tanis and Zimmerman:

Problems from 9th edition of Probability and Statistical Inference by Hogg, Tanis and Zimmerman: Math 224 Fall 2017 Homework 4 Drew Armstrog Problems from 9th editio of Probability ad Statistical Iferece by Hogg, Tais ad Zimmerma: Sectio 2.3, Exercises 16(a,d),18. Sectio 2.4, Exercises 13, 14. Sectio

More information

Simulation. Two Rule For Inverting A Distribution Function

Simulation. Two Rule For Inverting A Distribution Function Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Aalysis ad Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasii/teachig.html Suhasii Subba Rao Review of testig: Example The admistrator of a ursig home wats to do a time ad motio

More information

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen) Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................

More information

Statistical Properties of OLS estimators

Statistical Properties of OLS estimators 1 Statistical Properties of OLS estimators Liear Model: Y i = β 0 + β 1 X i + u i OLS estimators: β 0 = Y β 1X β 1 = Best Liear Ubiased Estimator (BLUE) Liear Estimator: β 0 ad β 1 are liear fuctio of

More information

Mathematical Notation Math Introduction to Applied Statistics

Mathematical Notation Math Introduction to Applied Statistics Mathematical Notatio Math 113 - Itroductio to Applied Statistics Name : Use Word or WordPerfect to recreate the followig documets. Each article is worth 10 poits ad ca be prited ad give to the istructor

More information

Agreement of CI and HT. Lecture 13 - Tests of Proportions. Example - Waiting Times

Agreement of CI and HT. Lecture 13 - Tests of Proportions. Example - Waiting Times Sigificace level vs. cofidece level Agreemet of CI ad HT Lecture 13 - Tests of Proportios Sta102 / BME102 Coli Rudel October 15, 2014 Cofidece itervals ad hypothesis tests (almost) always agree, as log

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE IN STATISTICS, 017 MODULE 4 : Liear models Time allowed: Oe ad a half hours Cadidates should aswer THREE questios. Each questio carries

More information

Final Review for MATH 3510

Final Review for MATH 3510 Fial Review for MATH 50 Calculatio 5 Give a fairly simple probability mass fuctio or probability desity fuctio of a radom variable, you should be able to compute the expected value ad variace of the variable

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

- E < p. ˆ p q ˆ E = q ˆ = 1 - p ˆ = sample proportion of x failures in a sample size of n. where. x n sample proportion. population proportion

- E < p. ˆ p q ˆ E = q ˆ = 1 - p ˆ = sample proportion of x failures in a sample size of n. where. x n sample proportion. population proportion 1 Chapter 7 ad 8 Review for Exam Chapter 7 Estimates ad Sample Sizes 2 Defiitio Cofidece Iterval (or Iterval Estimate) a rage (or a iterval) of values used to estimate the true value of the populatio parameter

More information

Worksheet 23 ( ) Introduction to Simple Linear Regression (continued)

Worksheet 23 ( ) Introduction to Simple Linear Regression (continued) Worksheet 3 ( 11.5-11.8) Itroductio to Simple Liear Regressio (cotiued) This worksheet is a cotiuatio of Discussio Sheet 3; please complete that discussio sheet first if you have ot already doe so. This

More information

Understanding Samples

Understanding Samples 1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We

More information

Common Large/Small Sample Tests 1/55

Common Large/Small Sample Tests 1/55 Commo Large/Small Sample Tests 1/55 Test of Hypothesis for the Mea (σ Kow) Covert sample result ( x) to a z value Hypothesis Tests for µ Cosider the test H :μ = μ H 1 :μ > μ σ Kow (Assume the populatio

More information

Topic 10: Introduction to Estimation

Topic 10: Introduction to Estimation Topic 0: Itroductio to Estimatio Jue, 0 Itroductio I the simplest possible terms, the goal of estimatio theory is to aswer the questio: What is that umber? What is the legth, the reactio rate, the fractio

More information

CS284A: Representations and Algorithms in Molecular Biology

CS284A: Representations and Algorithms in Molecular Biology CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9 Hypothesis testig PSYCHOLOGICAL RESEARCH (PYC 34-C Lecture 9 Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I

More information

Class 27. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

Class 27. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700 Class 7 Daiel B. Rowe, Ph.D. Departmet of Mathematics, Statistics, ad Computer Sciece Copyright 013 by D.B. Rowe 1 Ageda: Skip Recap Chapter 10.5 ad 10.6 Lecture Chapter 11.1-11. Review Chapters 9 ad 10

More information

Chapter 12 Correlation

Chapter 12 Correlation Chapter Correlatio Correlatio is very similar to regressio with oe very importat differece. Regressio is used to explore the relatioship betwee a idepedet variable ad a depedet variable, whereas correlatio

More information

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015 ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 016 MODULE : Statistical Iferece Time allowed: Three hours Cadidates should aswer FIVE questios. All questios carry equal marks. The umber

More information

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam. Probability ad Statistics FS 07 Secod Sessio Exam 09.0.08 Time Limit: 80 Miutes Name: Studet ID: This exam cotais 9 pages (icludig this cover page) ad 0 questios. A Formulae sheet is provided with the

More information

This is an introductory course in Analysis of Variance and Design of Experiments.

This is an introductory course in Analysis of Variance and Design of Experiments. 1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class

More information

HYPOTHESIS TESTS FOR ONE POPULATION MEAN WORKSHEET MTH 1210, FALL 2018

HYPOTHESIS TESTS FOR ONE POPULATION MEAN WORKSHEET MTH 1210, FALL 2018 HYPOTHESIS TESTS FOR ONE POPULATION MEAN WORKSHEET MTH 1210, FALL 2018 We are resposible for 2 types of hypothesis tests that produce ifereces about the ukow populatio mea, µ, each of which has 3 possible

More information

Table 12.1: Contingency table. Feature b. 1 N 11 N 12 N 1b 2 N 21 N 22 N 2b. ... a N a1 N a2 N ab

Table 12.1: Contingency table. Feature b. 1 N 11 N 12 N 1b 2 N 21 N 22 N 2b. ... a N a1 N a2 N ab Sectio 12 Tests of idepedece ad homogeeity I this lecture we will cosider a situatio whe our observatios are classified by two differet features ad we would like to test if these features are idepedet

More information

MATH/STAT 352: Lecture 15

MATH/STAT 352: Lecture 15 MATH/STAT 352: Lecture 15 Sectios 5.2 ad 5.3. Large sample CI for a proportio ad small sample CI for a mea. 1 5.2: Cofidece Iterval for a Proportio Estimatig proportio of successes i a biomial experimet

More information

Describing the Relation between Two Variables

Describing the Relation between Two Variables Copyright 010 Pearso Educatio, Ic. Tables ad Formulas for Sulliva, Statistics: Iformed Decisios Usig Data 010 Pearso Educatio, Ic Chapter Orgaizig ad Summarizig Data Relative frequecy = frequecy sum of

More information

Direction: This test is worth 150 points. You are required to complete this test within 55 minutes.

Direction: This test is worth 150 points. You are required to complete this test within 55 minutes. Term Test 3 (Part A) November 1, 004 Name Math 6 Studet Number Directio: This test is worth 10 poits. You are required to complete this test withi miutes. I order to receive full credit, aswer each problem

More information

Statistics Revision Solutions

Statistics Revision Solutions Statistics Revisio Solutios (i) H ~N (00, ) ad W ~N (7, 9 ) P ( 7. 0) 0. 978 P (iii) H + W ~N (7, ) P ( H + W > A) > 0.9 P( H + W < A) < 0.0 A< ivnorm(0.0,

More information

Direction: This test is worth 250 points. You are required to complete this test within 50 minutes.

Direction: This test is worth 250 points. You are required to complete this test within 50 minutes. Term Test October 3, 003 Name Math 56 Studet Number Directio: This test is worth 50 poits. You are required to complete this test withi 50 miutes. I order to receive full credit, aswer each problem completely

More information

Pearson Edexcel Level 3 Advanced Subsidiary and Advanced GCE in Statistics

Pearson Edexcel Level 3 Advanced Subsidiary and Advanced GCE in Statistics Pearso Edecel Level 3 Advaced Subsidiary ad Advaced GCE i Statistics Statistical formulae ad tables For first certificatio from Jue 018 for: Advaced Subsidiary GCE i Statistics (8ST0) For first certificatio

More information

CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS. 8.1 Random Sampling. 8.2 Some Important Statistics

CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS. 8.1 Random Sampling. 8.2 Some Important Statistics CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS 8.1 Radom Samplig The basic idea of the statistical iferece is that we are allowed to draw ifereces or coclusios about a populatio based

More information

2 1. The r.s., of size n2, from population 2 will be. 2 and 2. 2) The two populations are independent. This implies that all of the n1 n2

2 1. The r.s., of size n2, from population 2 will be. 2 and 2. 2) The two populations are independent. This implies that all of the n1 n2 Chapter 8 Comparig Two Treatmets Iferece about Two Populatio Meas We wat to compare the meas of two populatios to see whether they differ. There are two situatios to cosider, as show i the followig examples:

More information

PRACTICE PROBLEMS FOR THE FINAL

PRACTICE PROBLEMS FOR THE FINAL PRACTICE PROBLEMS FOR THE FINAL Math 36Q Sprig 25 Professor Hoh Below is a list of practice questios for the Fial Exam. I would suggest also goig over the practice problems ad exams for Exam ad Exam 2

More information

MA238 Assignment 4 Solutions (part a)

MA238 Assignment 4 Solutions (part a) (i) Sigle sample tests. Questio. MA38 Assigmet 4 Solutios (part a) (a) (b) (c) H 0 : = 50 sq. ft H A : < 50 sq. ft H 0 : = 3 mpg H A : > 3 mpg H 0 : = 5 mm H A : 5mm Questio. (i) What are the ull ad alterative

More information

Linear Regression Demystified

Linear Regression Demystified Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to

More information

1 Models for Matched Pairs

1 Models for Matched Pairs 1 Models for Matched Pairs Matched pairs occur whe we aalyse samples such that for each measuremet i oe of the samples there is a measuremet i the other sample that directly relates to the measuremet i

More information

Confidence Intervals for the Population Proportion p

Confidence Intervals for the Population Proportion p Cofidece Itervals for the Populatio Proportio p The cocept of cofidece itervals for the populatio proportio p is the same as the oe for, the samplig distributio of the mea, x. The structure is idetical:

More information

Introduction to Econometrics (3 rd Updated Edition) Solutions to Odd- Numbered End- of- Chapter Exercises: Chapter 3

Introduction to Econometrics (3 rd Updated Edition) Solutions to Odd- Numbered End- of- Chapter Exercises: Chapter 3 Itroductio to Ecoometrics (3 rd Updated Editio) by James H. Stock ad Mark W. Watso Solutios to Odd- Numbered Ed- of- Chapter Exercises: Chapter 3 (This versio August 17, 014) 015 Pearso Educatio, Ic. Stock/Watso

More information

(6) Fundamental Sampling Distribution and Data Discription

(6) Fundamental Sampling Distribution and Data Discription 34 Stat Lecture Notes (6) Fudametal Samplig Distributio ad Data Discriptio ( Book*: Chapter 8,pg5) Probability& Statistics for Egieers & Scietists By Walpole, Myers, Myers, Ye 8.1 Radom Samplig: Populatio:

More information

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6)

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6) STAT 350 Hadout 9 Samplig Distributio, Cetral Limit Theorem (6.6) A radom sample is a sequece of radom variables X, X 2,, X that are idepedet ad idetically distributed. o This property is ofte abbreviated

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information

Randomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018)

Randomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018) Radomized Algorithms I, Sprig 08, Departmet of Computer Sciece, Uiversity of Helsiki Homework : Solutios Discussed Jauary 5, 08). Exercise.: Cosider the followig balls-ad-bi game. We start with oe black

More information

Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,

More information

Combining. random variables

Combining. random variables 7 Combiig radom variables If you kow the average height of a brick, the it is fairly easy to guess the average height of two bricks, or the average height of half of a brick. What is less obvious is the

More information

1036: Probability & Statistics

1036: Probability & Statistics 036: Probability & Statistics Lecture 0 Oe- ad Two-Sample Tests of Hypotheses 0- Statistical Hypotheses Decisio based o experimetal evidece whether Coffee drikig icreases the risk of cacer i humas. A perso

More information

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01 ENGI 44 Cofidece Itervals (Two Samples) Page -0 Two Sample Cofidece Iterval for a Differece i Populatio Meas [Navidi sectios 5.4-5.7; Devore chapter 9] From the cetral limit theorem, we kow that, for sufficietly

More information

TAMS24: Notations and Formulas

TAMS24: Notations and Formulas TAMS4: Notatios ad Formulas Basic otatios ad defiitios X: radom variable stokastiska variabel Mea Vätevärde: µ = X = by Xiagfeg Yag kpx k, if X is discrete, xf Xxdx, if X is cotiuous Variace Varias: =

More information

Open book and notes. 120 minutes. Cover page and six pages of exam. No calculators.

Open book and notes. 120 minutes. Cover page and six pages of exam. No calculators. IE 330 Seat # Ope book ad otes 120 miutes Cover page ad six pages of exam No calculators Score Fial Exam (example) Schmeiser Ope book ad otes No calculator 120 miutes 1 True or false (for each, 2 poits

More information

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2. SAMPLE STATISTICS A radom sample x 1,x,,x from a distributio f(x) is a set of idepedetly ad idetically variables with x i f(x) for all i Their joit pdf is f(x 1,x,,x )=f(x 1 )f(x ) f(x )= f(x i ) The sample

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information