STATISTICAL INFERENCE INTRODUCTION Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I oesample testig, we essetially have to verify whether a populatio parameter could be equal to a certai proposed value. Sice othig is kow about the parameter, iformatio has to be gathered from the populatio by selectig a sample which is as ubiased as possible. This sample should, as far as possible, cotai most, if ot all, the characteristics of its paret populatio. I other words, it must be the best possible represetative of the populatio from which it comes. From samplig theory, we lear that the most appropriate method of samplig depeds etirely o the structure of the populatio. We will assume that, to the best of our ability, we choose the most ubiased sample. It is worth metioig, however, that it is very hard, if ot impossible, to obtai the ideal sample because we ca ever elimiate samplig errors completely. ONE-SAMPLE TESTING If, for istace, we were required to test whether the populatio mea μ could be equal to a certai value μ, it would be quite atural to select a sample ad determie the value of the poit estimate of the populatio mea, that is, the sample mea x, so as to have a approximate idea of the value of the populatio mea (read the chapter o Estimatio. The whole problem the boils dow to checkig how far x is from μ. Far is subjective, that is, if μ 5, the someoe may fid that x 47 is far from 5 but someoe else may fid that 47 is relatively close to 5. We have to remember that if x 47, we caot automatically coclude that the populatio mea ca defiitely ot be equal to 5 just because 47 is ot umerically equal to 5. There exist samplig errors which could have caused the sample mea to deviate from the true value of the populatio mea. The factor which determies how far or ear x is from μ is the sigificace level α of the test. Before goig ito the details of the problem, let us first become familiar with some terms ad their respective otatios. Whe we have to test whether a parameter could be equal to a proposed value, the proposal is formulated as a hypothesis kow as the ull hypothesis deoted by. For example, if we have to test whether the populatio mea is equal to 5, we would write : μ 5. Thus, a ull hypothesis is just a formal statemet where the parameter is equated to the proposed value.
I ay testig procedure, the priciple is to assume that the ull hypothesis is true. O the basis of iformatio obtaied from a sample, we shall later o decide to accept or reject it. I the case of rejectio, it is importat that we give a more precise aswer tha the populatio mea is ot equal to 5. Statistically speakig, oe will be more satisfied if the aswer is the populatio mea is less tha (or more tha 5 i the case whe is rejected. This is the reaso why every ull hypothesis should always be accompaied by a alterative hypothesis, deoted by. It must be esured that ad be mutually exclusive so that the acceptace of oe implies the automatic rejectio of the other. To every ull hypothesis, there exist three possible alteratives. For example, if : μ 5, the. : μ < 5. : μ > 5 3. : μ 5 are the possible alteratives. The choice of the correct alterative is made accordig to the formulatio of the problem. The sigificace level α of the test is also kow as the critical regio or the regio of rejectio of ad its locatio depeds o the choice of. I most cases, we select large samples for the sake of accuracy i our estimatio of the populatio parameter. Cosequetly, we may make a extesive use of the ormal distributio theory as postulated by the Cetral Limit Theorem. I the above example, alteratives ( ad ( are kow as oe-sided or oetailed alteratives whereas the third oe is called a two-sided or a two-tailed alterative. The term tailed obviously comes from the lower ad upper tails of the ormal distributio. We ow show how the locatio of the critical regio fluctuates with the choice of the alterative hypothesis. : μ : μ < 5 5 α μ Fig..
: μ 5 : μ 5 > α μ Fig.. : μ 5 : μ 5 α α μ Fig..3 I the followig diagram, the acceptace ad rejectio regios are show for a oe-tailed alterative to the right. Note that the critical value is foud at the boudary of these two regios. For a two-tailed alterative, there will be two critical regios ad hece two critical values. α (critical regio or regio of rejectio of Accept μ 5 : μ 5 : μ 5 > Fig..4 critical value 3
. Testig procedure The followig steps may be used as a guidelie durig ay testig procedure. owever, we have to bear i mid that differet sample statistics are required whe testig for differet populatio parameters.. Formulate the ull ad alterative hypotheses. Depedig o the alterative hypothesis ad the sigificace level, defie the 3. Perform the test-statistic, icludig ay other relevat calculatios 4. Compare the test-statistic value with the critical value(s i order to decide whether to accept or reject the ull hypothesis. Write dow a coclusio i the cotext of the problem. The test-statistic varies accordig to the parameter for which we are testig. I geeral, it is iterestig to kow that, wheever we use the z-test, that is, the ormal distributio, the test-statistic is of the form z X E[ X ] var[ X ] where X is the ubiased poit estimator of the parameter uder ivestigatio.. Testig for the populatio mea the z-test Whe testig for the populatio mea μ, we first have to check whether the populatio variace is kow. This is because the test-statistic depeds o this vital factor. If is kow, the, o matter how large the sample size is, we use the z-test. If is ukow, we have to take the sample size ito cosideratio. If is large (greater tha 3, we still use the z-test. The flowchart i Fig... below illustrates the procedure o how to use the appropriate test-statistic for a give situatio i oe-sample mea testig. Is kow? Y x μ z N Is large? Y x μ z ˆ Fig... 4
Example (Populatio variace kow The legth of strigs i the balls of strig made by a particular maufacturer has mea μ m ad variace 7.4 m. The maufacturer claims that μ 3. A radom sample of balls of strig is take ad the sample mea is foud to be 99. m. Test whether this provides sigificat evidece, at the 3% level, that the maufacturer s claim overstates the value of μ. Solutio The claim is that the populatio mea is 3. Thus, we start by formulatig our ull hypothesis accordig to the maufacturer s claim. Now, we have to check whether this is a overstatemet, that is, whether the true value of the mea is i fact less tha what is stated i the claim. ece, we are testig : μ 3 agaist : μ 3 < This is a oe-tailed alterative to the left ad the sigificace level is 3%. Sice the populatio variace is kow, we will use the z-test..3.88.58 3 Fig... The diagram above shows the critical regio (shaded uder the ull hypothesis that the populatio mea is 3. The critical z-value is.88 as obtaied from the stadard ormal table. We kow that the sample size is ad that the sample mea x is 99. m. By usig the test-statistic x μ 99. 3 z, we obtai z. 58. 7.4 Sice.58 >.88 (idicated by a ecircled cross i Fig..., we accept ad coclude that the maufacturer was ot overstatig the value of the populatio mea legth of strig. 5
Example (Populatio variace ukow large sample A supermarket maager ivestigated the legths of time that customers spet shoppig i the store. The time, x miutes, spet by each of a radom sample of 5 customers was measured, ad it was foud that x 87 ad x 69. Test, at the 5% level of sigificace, the hypothesis that the mea time spet shoppig by customers is miutes agaist the alterative that it is less tha this. Solutio The statemet is that the populatio mea is miutes ad we have bee give the alterative hypothesis that the mea is less tha miutes. ece, we are testig : μ agaist : μ < This is a oe-tailed alterative to the left ad the sigificace level is 5%. Sice the populatio variace is ukow, we have to check the size of the sample. The value of is 5, which is statistically cosidered to be large ( > 3, so that will use the z-test..5.84.645 Fig...3 The diagram above shows the critical regio (shaded uder the ull hypothesis that the populatio mea is. The critical z-value is.645 as obtaied from the stadard ormal table. We start by calculatig the values of x ad ˆ as required i the teststatistic sice, this time, the populatio variace is ukow. From formulae, 87 x 9.4 ad 5 69 (9.4 ˆ 34. 8. 5 49 5 6
x μ 9.4 Usig the test-statistic z, we obtai z. 84. ˆ 34.8 5 Sice.84 <.645, (idicated by a ecircled cross i Fig...3, we reject ad coclude that the mea time spet shoppig by customers is less tha miutes..3 Testig for the populatio proportio We ofte wat to kow the proportio of idividuals i a populatio which satisfy a certai characteristic. For example, it would be iterestig to kow the percetage of left-haded people i Mauritius or the proportio of books i a library which cotai more tha 5 pages. As usual, it will be assumed that the populatio is ifiite so that iformatio may oly be obtaied by selectig a sample. The populatio proportio is deoted by p. I geeral, whe we select idividuals, they either satisfy or do ot satisfy the characteristic uder ivestigatio. If it ever happes that a idividual falls i both categories simultaeously (for example, someoe ambidextrous, the that idividual is automatically discarded for the sake of calculatios. It is thus quite atural to use the biomial distributio because each idividual will either be labelled as success or failure, depedig o whether it satisfies the characteristic or ot. If we wat to have a idea of the value of p, we select a sample of size ad cout the umber, x, of idividuals satisfyig the required characteristic. We have leart from the chapter o Estimatio that the sample proportio x p ( p is a ubiased estimator of the populatio p ad has variace. By the p( p Cetral Limit Theorem, for large samples, ~ N p,. I this course, we will ot cosider testig for oe-sample proportios by meas of the biomial or Poisso distributios but rather their approximatios by the ormal distributio (without cotiuity correctio. The test-statistic to be used will therefore be z p p( p The testig procedure will be idetical to that used for testig for a oesample mea except, precisely, for the test-statistic. 7
Example I a public opiio poll, radomly chose electors were asked whether they would vote for the Purple Party at the ext electio ad 357 replied Yes. The leader of the Purple Party believes that the true percetage of electors who would vote for his party is.4. Test at the 8% level whether he is overestimatig his support. Solutio The leader s belief is formulated as the ull hypothesis : p. 4. Sice we wat to kow whether he is exaggeratig, we have to check if, i fact, the percetage of the populatio supportig his party is less tha 4%. Thus, the alterative hypothesis is : p. 4. < Sice the sample size is large (, we use the ormal approximatio to the biomial distributio with p.4 4 ad variace p ( p.4.6 4. Furthermore, the sample proportio 357 p ˆ.357..8.743.46.4 Fig..3. p.357.4 The statistic value is z.743. p( p (.4(.6 Sice.743 <.46, reject ad coclude that the Purple Party leader was ideed overestimatig his support. 8
3 TWO-SAMPLE TESTING The approach to two-sample hypothesis testig is idetical to its oesample couterpart except that, i this case, there is o proposed value. Compariso is made directly betwee the values of two populatio parameters - i geeral, we test for equality betwee these parameters (meas or proportios. Oe very importat aspect to be cosidered from the statisticia s poit of view is that, wheever we test for the differece betwee two populatio meas, it is compulsory to test for equality of the populatio variaces first. This is because the choice of the test-statistic depeds o the fact that the variaces are equal or ot. This coditio, however, is ot applicable whe we test for equality of populatio proportios. 3. TESTING FOR EQUALITY BETWEEN MEANS It could sometimes prove to be essetial to compare the meas of two distributios before makig a importat decisio. For example, we might wish to verify whether the mea lifespa of wome is loger tha that of me i geeral. Otherwise, we may be tempted to check whether there has bee a improvemet i the umber of marks of studets who have bee through a itesive traiig for a certai period. As will be see later, differet types of testig, ad hece, test-statistics, would be used for these two cases. But first, let us examie each case i detail ad the illustrate it by meas of a example. Large idepedet samples Whe testig for equality of meas for two idepedet populatios, we start by selectig a sample from each populatio. For the purpose of this course, we will assume that the variaces of the two populatios are equal. It is the just a matter of testig whether the differece betwee the two sample meas is statistically sigificat. If the samples are large ( > 3, the, accordig to the Cetral Limit Theorem, we may use the ormal distributio theory oce more i determiig the test statistic to be used. If X ad X are two idepedet ormal variables such that X ~ N( μ, ad X ~ N( μ,, the, usig the laws of expectatio ad variace, the differece betwee these variables, X X, will also follow a ormal distributio with expectatio ad variace calculated as follows: E [ X μ μ var[. X ] E[ X ] E[ X ] X X ] var[ X ] var[ X ] 9
We thus write ( X X ~ N( μ μ,. Sice the respective sample meas x ad x, accordig to the Cetral Limit Theorem, are ormally distributed such that X ~ N μ, ad X ~ N μ,, followig the same procedure as i oe-sample testig, we deduce that the test-statistic should be ( x z μ ( x ( x x ( μ μ μ where ad are the sample sizes (ot ecessarily of the same size from each populatio respectively. If the populatio variaces ad are ukow, we replace them by their respective ubiased estimates ˆ ad ˆ. Example A compay has two regioal head offices i Machester ad Glasgow. Workers i the Glasgow office claim that they are paid less tha the workers i the Machester office. To test this claim, a researcher takes a radom sample of workers from each office. The followig set of data is recorded: Machester Glasgow Sample size Mea salary 5 7 5 Stadard deviatio Table 3.. Usig a 5% level of sigificace, test the claim that the Glasgow workers are paid lower salaries o average. Solutio Let us deote Machester ad Glasgow as populatios ad respectively, hece the subscripts for meas, variaces ad sample sizes. We formulate the ull ad alterative hypotheses as follows: μ μ : μ μ ( ( μ μ > agaist : μ > μ
Sice the sample sizes ( are statistically cosidered as large, we use the ormal distributio. The critical value correspodig to a sigificace level of.5 is.645 from the stadard ormal table. ****************************************************************** Note that, sice the populatio variaces are ukow, we shall replace them by s their ubiased estimators. It is iterestig to ote that ˆ is equivalet to ˆ s. This relatioship is very useful i the sese that we o more have to compute the ubiased estimates but ca use the sample stadard deviatios themselves directly i the test-statistic. ****************************************************************** Accept.5 Fig. 3...645.4 Usig the iformatio give i Table 3.., the test-statistic value is ( x z x ( μ μ ( x x ( μ μ s s ( ( (57 5 ( ( 99 ( 99.4 Sice.4 >.645 (see Fig. 3.., we reject ad coclude that Machester workers ideed get paid higher salaries tha their Glasgow couterparts.
3. TESTING FOR EQUALITY BETWEEN PROPORTIONS We recall that, for a oe-sample test for the populatio proportio p, its ubiased estimator (sample proportio followed a ormal distributio with mea p ( p p ad variace accordig to the Cetral Limit Theorem. If we exted this theory to two-sample testig, that is, whe testig for equality of two populatio proportios p ad p, a sample will be selected from each populatio ad its sample proportio determied. Assumig that samples of sizes ad are chose from populatios ad ad that x ad x observatios are foud that satisfy the characteristic uder ivestigatio from the respective populatios, the the sample proportios will be x x p ˆ ad p ˆ. Thus, p p( p ˆ ~ N p, ad p ( p p ˆ ~ N p, accordig to the Cetral Limit Theorem. Sice the liear combiatio of two ormal distributios is also a ormal distributio, we determie the distributio of ˆ as follows: ( p E[ ˆ p ] p p ad p( p var[ ˆ p ] p ( p Therefore, ( p ˆ ~ N p p, p( p p ( p The test-statistic for two-sample testig for equality for proportios is z ( ( p ˆ ( p p ( ( ˆ ( p ( sice. The ull hypothesis is give by p p ( p p :. The values of p ad p are ukow, hece estimatios beig used for the variaces.
Example To verify the percetages of me ad wome who are IV positive i a certai commuity, two samples were selected ad the followig iformatio was recorded: Me Wome Sample size 55 7 Number of IV positive 86 396 Table 3.. Ca we coclude, at the 5% sigificace level, that the proportios of IV positive me ad wome i the commuity are equal? Solutio Sice we are oly testig whether the proportios are equal, there is o specific directio, that is, we use a two-tailed alterative hypothesis. : p p ( p p p p ( p p : Deotig the me ad wome populatios by ad respectively, the recorded data may be summarised as 55 x 86 86. 5 p ˆ 55 7 396 396. 55 p ˆ 7.5.5.96.6.96 Fig. 3.. 3
The test-statistic value is z ( ( ˆ p ( (.5.55 (.5(.48 55 (.55(.45 7.6 Sice.96 <.6 <.96 (see Fig. 3.. above, we do ot have eough evidece to reject. We coclude that the proportios of IV positive me ad wome i that commuity are equal. 4